The purpose of this project is to capture and analyse MIDI data in real time and use it to enhance live improvised performance. The data includes note-on/note-off events as well as any continuous changes taking place in MIDI-enabled effects units (for example, values coming from an LFO filter knob on a synth, or a level change on a guitar pedal). All data is recorded, timestamped, and analysed using real-time deep learning models. These models are used to create new MIDI data which can be fed back to performers as musical stimulus during improvisation.
The goal is not to automate performance or simulate human improvisers. Instead, it is to enhance the improvisational environment by allowing performers to interact with latent musical structures as they emerge — a real-time feedback loop between musician, data, and sound.
This is a multidisciplinary project combining improvisation practice, live sound design, computational modelling, and symbolic music representation. It explores how performers can engage with high-resolution, continuously evolving musical data, creating a form of deep listening in which the system becomes a responsive participant in the musical dialogue.
The name Deep Listening references Pauline Oliveros’ lifelong work exploring expanded listening practices and the creative, relational potential of sound. This project extends that lineage into real-time algorithmic improvisation.
Below is a consolidated selection of foundational texts and contemporary research relevant to the Deep Listening project. These works span deep listening practice, computational creativity, real-time MIR, and live human–machine improvisation.
Oliveros, P. (2005). Deep Listening: A Composer’s Sound Practice. iUniverse.
Oliveros provides the foundational philosophical framework for this project: listening as an active, relational, and ethical practice rather than passive hearing. Her emphasis on attentiveness, reciprocity, and co-presence directly informs how algorithmic agents in performance should behave—when to respond, when to hold back, and how to maintain musical agency and respect for human partners.
Fiebrink, R. (2011). Real-Time Human Interaction with Supervised Learning Algorithms for Music Composition and Performance. Princeton University.
Fiebrink’s thesis is a landmark in interactive machine learning for musicians. The work demonstrates how performers can actively train, refine, and reshape ML models during performance, enabling adaptive mappings and personalised behaviours. This directly underpins the Deep Listening system’s design ethos: systems should learn with the performer, not just before the performance.
Lartillot, O., Toiviainen, P., & Eerola, T. (2008). A MATLAB Toolbox for Music Information Retrieval. Data Analysis, Machine Learning and Applications.
Offers the conceptual foundation for MIR feature extraction. Although this system uses real-time implementations, the vocabulary of features (onsets, chroma, spectral flux, centroids) and their musical interpretations trace back to this lineage.
Bown, O., & Martin, A. (2019). Machine Listening: Improvisation with Algorithms. Organised Sound.
Explores algorithmic improvisers as co-creative agents, offering concrete design considerations for making systems “feel musical.” Emphasises evaluation, responsiveness, and human-centred interaction—all central to Deep Listening.
Collins, N. (2008). The Analysis of Generative Music Programs. Organised Sound, 13(3).
Provides taxonomy and critical frameworks for generative systems, including autonomy levels, randomness control, and interpretive transparency. Helps contextualise how the system should generate events, respond to performers, and expose control to users.
Hawthorne, C., et al. (2018). Onsets and Frames: Dual-Objective Piano Transcription. ISMIR / arXiv.
Demonstrates high-fidelity symbolic representation derived from audio, forming the conceptual basis for real-time symbolic interaction. Onsets–Frames models are useful for detecting performer intent, expressive gesture, and timing information that can feed predictive models.
Choi, K., Fazekas, G., & Sandler, M. (2017). Towards Music Embeddings: Learning with Triplet Loss. ISMIR.
Shows how embedding spaces can encode short musical phrases for similarity, retrieval, or conditioning. Embeddings become essential tools in this system for:
• Real-time motif matching
• Predictive modelling
• Corpus-driven improvisation
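The idea behind real-time motif matching can be sketched without any ML machinery at all. The snippet below is illustrative only (all function names are hypothetical): it embeds a short pitch sequence as a normalised interval vector, so that transpositions of a motif map to the same point, and compares motifs by cosine similarity. In the real system a learned embedding such as the triplet-loss model discussed above would replace `embed_motif`.

```python
import math

def embed_motif(pitches, dim=8):
    """Embed a short pitch sequence as a fixed-length, unit-norm interval vector.

    Illustrative stand-in for a learned (e.g. triplet-loss) embedding.
    """
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    vec = (intervals + [0] * dim)[:dim]           # pad/truncate to fixed size
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]                # unit length -> dot = cosine

def cosine(u, v):
    return sum(a * b for a, b in zip(u, v))

# A motif and its transposition share the same interval profile:
theme      = embed_motif([60, 62, 64, 67])        # C D E G
transposed = embed_motif([65, 67, 69, 72])        # F G A C (same contour)
contrast   = embed_motif([60, 59, 55, 53])        # descending line

print(round(cosine(theme, transposed), 3))        # 1.0 — identical contour
print(cosine(theme, contrast) < 0)                # True — opposite direction
```

Because the vectors are unit-normalised, a nearest-neighbour lookup over a corpus of such embeddings reduces to a dot product, which is cheap enough for per-phrase matching at performance time.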
Agres, K., & Herremans, D. (2020). Music and Artificial Intelligence: From Composition to Performance. Frontiers in Artificial Intelligence.
A state-of-the-art overview of AI-supported creative systems. This serves as the larger academic context in which the Deep Listening system situates itself—connecting ML, improvisation, and performance research.
Lewis, G. E. (2000). Too Many Notes: Computers, Complexity, and Culture in Voyager. Leonardo Music Journal.
The Voyager system is the historical prototype for fully autonomous improvising agents. Lewis’s framing—non-hierarchical human–machine interplay, distributed agency, and computational “musical personalities”—is essential theoretical grounding for any modern interactive system.
Wessel, D., & Wright, M. (2002). Problems and Prospects for Intimate Musical Control of Computers. Computer Music Journal.
Focuses on latency, gesture mapping, and multimodal control — topics central to designing responsive, expressive live systems. Their insights directly shape this system’s technical constraints (e.g., keeping round-trip latency below ~15–30 ms for ensemble-tight interactions).
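As a rough sanity check against that budget, the stdlib-only sketch below measures UDP round-trip time on loopback. This is not the project's measurement tooling, just an illustration: the ~15–30 ms figure applies to the full gesture-to-sound path (capture, inference, synthesis), of which the network hop is normally a tiny fraction.

```python
import socket
import time

# "Server" socket standing in for the responder (e.g. the ML process).
echo = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
echo.bind(("127.0.0.1", 0))                  # OS picks a free port
port = echo.getsockname()[1]

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

t0 = time.perf_counter()
client.sendto(b"ping", ("127.0.0.1", port))  # send...
data, addr = echo.recvfrom(64)               # ...receive on the far side...
echo.sendto(data, addr)                      # ...and echo straight back
reply, _ = client.recvfrom(64)
rtt_ms = (time.perf_counter() - t0) * 1000

print(f"loopback round trip: {rtt_ms:.3f} ms")  # typically well under 1 ms
echo.close()
client.close()
```

Instrumenting each stage of the real pipeline this way (timestamps at capture, after inference, at synthesis) shows where the budget is actually being spent.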
Wright, M., & Freed, A. (1997). Open Sound Control: A New Protocol for Communicating with Sound Synthesizers. ICMC.
Outlines OSC’s advantages over MIDI: higher resolution, network friendliness, address hierarchies, and time-tagged bundles. These are essential in this setup, where MIDI controllers, a Raspberry Pi, SuperCollider, and Python ML models communicate in real time.
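To make the wire format concrete, the sketch below hand-encodes a single-float OSC message per the OSC 1.0 specification: the address and type-tag strings are null-terminated and padded to 4-byte boundaries, followed by a big-endian float32. In practice a library such as python-osc would handle this; the `/filter/cutoff` address here is purely illustrative.

```python
import struct

def osc_pad(b: bytes) -> bytes:
    """Null-terminate and pad to a 4-byte boundary, per the OSC 1.0 spec."""
    return b + b"\x00" * (4 - len(b) % 4)

def osc_message(address: str, value: float) -> bytes:
    """Encode an OSC message carrying one float argument.

    Layout: padded address string, padded type-tag string ",f",
    then the argument as a big-endian 32-bit float.
    """
    return (osc_pad(address.encode("ascii"))
            + osc_pad(b",f")
            + struct.pack(">f", value))

packet = osc_message("/filter/cutoff", 0.5)
print(len(packet))  # 24 — every OSC packet is a multiple of 4 bytes
```

The hierarchical address (`/filter/cutoff` rather than an opaque CC number) is exactly the readability advantage Wright & Freed describe; time-tagged bundles extend the same layout with an NTP-style timestamp so receivers can schedule events sample-accurately.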
The diagram below shows one possible MIDI and audio routing setup used in this project. Many variations are possible; this is simply the working environment used for data collection and live experimentation.
This project is supported by several interconnected codebases that work together to collect data, process it in real time, generate responsive musical output, and visualise emerging patterns.
1. Data collection and analysis (Python)
Link to related code
Python scripts using mido and python-rtmidi capture MIDI data with microsecond precision into an SQLite database. Deep learning models (PyTorch/TensorFlow) process the incoming streams to generate predictions, embeddings, or new musical material during live performance.
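A minimal sketch of that logging path, assuming a schema along these lines (the table and column names are illustrative, not the project's actual schema). The mido capture loop appears only in comments so the example runs without MIDI hardware attached.

```python
import sqlite3
import time

# In the live system, events come from a mido input port, roughly:
#   for msg in mido.open_input(port_name):
#       log_event(conn, msg.type, msg.channel, msg.note, msg.velocity)
# Here we insert fake events so the sketch is self-contained.

SCHEMA = """CREATE TABLE IF NOT EXISTS midi_events (
    t_ns     INTEGER,   -- monotonic timestamp, nanoseconds
    type     TEXT,      -- note_on, note_off, control_change, ...
    channel  INTEGER,
    data1    INTEGER,   -- note number or CC number
    data2    INTEGER    -- velocity or CC value
)"""

def log_event(conn, type_, channel, data1, data2):
    """Timestamp an incoming MIDI event and append it to the database."""
    conn.execute("INSERT INTO midi_events VALUES (?,?,?,?,?)",
                 (time.monotonic_ns(), type_, channel, data1, data2))

conn = sqlite3.connect(":memory:")            # a file path in the real system
conn.execute(SCHEMA)
log_event(conn, "note_on", 0, 60, 100)        # middle C, forte
log_event(conn, "control_change", 0, 74, 42)  # e.g. a filter-cutoff CC sweep
count = conn.execute("SELECT COUNT(*) FROM midi_events").fetchone()[0]
print(count)  # 2
```

Storing a raw monotonic timestamp per event, rather than a beat position, keeps the log player-agnostic: tempo and phrase structure are recovered later by the analysis models rather than assumed at capture time.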
2. Real-time performance templates (SuperCollider)
Link to related code
SuperCollider receives interpreted data and controls synthesis, gesture responses, and algorithmic textures, enabling real-time interaction between performer and system.
3. Visualisation (Node.js)
Link to related code
JavaScript visualisers (D3.js, Three.js) display rhythmic lattices, gesture trajectories, and emergent structures. These connect directly to the Python processes for live rendering.
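One simple way to wire the Python and JavaScript sides together is newline-delimited JSON, which Node.js can consume line by line from a TCP or WebSocket stream. The field names below are hypothetical, not the project's actual protocol; the sketch only shows the shape of such a frame on the Python side.

```python
import json
import time

def event_frame(kind, **payload):
    """Serialise one analysis event as a single NDJSON line.

    Hypothetical wire format: each line is a self-describing JSON object
    that a D3.js/Three.js visualiser can render as it arrives.
    """
    return json.dumps({"t": time.time(), "kind": kind, **payload})

line = event_frame("onset", pitch=64, velocity=88, cluster="motif-A")
print(line)
```

One-object-per-line framing keeps the consumer trivial (split on newlines, `JSON.parse` each piece) and degrades gracefully: a dropped line loses one event, not the stream.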
4. Jamulus (Network Collaboration)
Link to related code
Jamulus configurations enable distributed improvisation with low-latency feedback, integrating Deep Listening tools into remote performance setups.
Coming soon