The purpose of this project is to capture and analyse MIDI data in real time and use it to enhance live improvised performance. The data includes note-on/note-off events as well as any continuous changes taking place in MIDI-enabled effects units (for example, values coming from an LFO filter knob on a synth, or a level change on a guitar pedal). All data is recorded, timestamped, and analysed using real-time deep learning models. These models are used to create new MIDI data which can be fed back to performers as musical stimulus during improvisation.
The goal is not to automate performance or simulate human improvisers. Instead, it is to enhance the improvisational environment by allowing performers to interact with latent musical structures as they emerge — a real-time feedback loop between musician, data, and sound.
This is a multidisciplinary project combining improvisation practice, live sound design, computational modelling, and symbolic music representation. It explores how performers can engage with high-resolution, continuously evolving musical data, creating a form of deep listening in which the system becomes a responsive participant in the musical dialogue.
The name Deep Listening references Pauline Oliveros’ lifelong work exploring expanded listening practices and the creative, relational potential of sound. This project extends that lineage into real-time algorithmic improvisation.
Below is a consolidated selection of foundational texts and contemporary research relevant to the Deep Listening project. These works span deep listening practice, computational creativity, real-time MIR, and live human–machine improvisation.
Oliveros, P. (2005). Deep Listening: A Composer’s Sound Practice. iUniverse.
Oliveros provides the foundational philosophical framework for this project: listening as an active, relational, and ethical practice rather than passive hearing. Her emphasis on attentiveness, reciprocity, and co-presence directly informs how algorithmic agents in performance should behave—when to respond, when to hold back, and how to maintain musical agency and respect for human partners.
Fiebrink, R. (2011). Real-Time Human Interaction with Supervised Learning Algorithms for Music Composition and Performance. Princeton University.
Fiebrink’s thesis is a landmark in interactive machine learning for musicians. The work demonstrates how performers can actively train, refine, and reshape ML models during performance, enabling adaptive mappings and personalised behaviours. This directly underpins the Deep Listening system’s design ethos: systems should learn with the performer, not just before the performance.
Lartillot, O., Toiviainen, P., & Eerola, T. (2008). A MATLAB Toolbox for Music Information Retrieval. Data Analysis, Machine Learning and Applications.
Offers the conceptual foundation for MIR feature extraction. Although this system uses real-time implementations, the vocabulary of features (onsets, chroma, spectral flux, centroids) and their musical interpretations trace back to this lineage.
Bown, O., & Martin, A. (2019). Machine Listening: Improvisation with Algorithms. Organised Sound.
Explores algorithmic improvisers as co-creative agents, offering concrete design considerations for making systems “feel musical.” Emphasises evaluation, responsiveness, and human-centred interaction—all central to Deep Listening.
Collins, N. (2008). The Analysis of Generative Music Programs. Organised Sound, 13(3).
Provides taxonomy and critical frameworks for generative systems, including autonomy levels, randomness control, and interpretive transparency. Helps contextualise how the system should generate events, respond to performers, and expose control to users.
Hawthorne, C., et al. (2018). Onsets and Frames: Dual-Objective Piano Transcription. ISMIR / arXiv.
Demonstrates high-fidelity symbolic representation derived from audio, forming the conceptual basis for real-time symbolic interaction. Onsets–Frames models are useful for detecting performer intent, expressive gesture, and timing information that can feed predictive models.
Choi, K., Fazekas, G., & Sandler, M. (2017). Towards Music Embeddings: Learning with Triplet Loss. ISMIR.
Shows how embedding spaces can encode short musical phrases for similarity, retrieval, or conditioning. Embeddings become essential tools in this system for:
• Real-time motif matching
• Predictive modelling
• Corpus-driven improvisation
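The idea behind real-time motif matching can be sketched without any ML machinery at all. The snippet below is illustrative only (all function names are hypothetical): it embeds a short pitch sequence as a normalised interval vector, so that transpositions of a motif map to the same point, and compares motifs by cosine similarity. In the real system a learned embedding such as the triplet-loss model discussed above would replace `embed_motif`.

```python
import math

def embed_motif(pitches, dim=8):
    """Embed a short pitch sequence as a fixed-length, unit-norm interval vector.

    Illustrative stand-in for a learned (e.g. triplet-loss) embedding.
    """
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    vec = (intervals + [0] * dim)[:dim]           # pad/truncate to fixed size
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]                # unit length -> dot = cosine

def cosine(u, v):
    return sum(a * b for a, b in zip(u, v))

# A motif and its transposition share the same interval profile:
theme      = embed_motif([60, 62, 64, 67])        # C D E G
transposed = embed_motif([65, 67, 69, 72])        # F G A C (same contour)
contrast   = embed_motif([60, 59, 55, 53])        # descending line

print(round(cosine(theme, transposed), 3))        # 1.0 — identical contour
print(cosine(theme, contrast) < 0)                # True — opposite direction
```

Because the vectors are unit-normalised, a nearest-neighbour lookup over a corpus of such embeddings reduces to a dot product, which is cheap enough for per-phrase matching at performance time.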
Agres, K., & Herremans, D. (2020). Music and Artificial Intelligence: From Composition to Performance. Frontiers in Artificial Intelligence.
A state-of-the-art overview of AI-supported creative systems. This serves as the larger academic context in which the Deep Listening system situates itself—connecting ML, improvisation, and performance research.
Lewis, G. E. (2000). Too Many Notes: Computers, Complexity, and Culture in Voyager. Leonardo Music Journal.
The Voyager system is the historical prototype for fully autonomous improvising agents. Lewis’s framing—non-hierarchical human–machine interplay, distributed agency, and computational “musical personalities”—is essential theoretical grounding for any modern interactive system.
Wessel, D., & Wright, M. (2002). Problems and Prospects for Intimate Musical Control of Computers. Computer Music Journal.
Focuses on latency, gesture mapping, and multimodal control — topics central to designing responsive, expressive live systems. Their insights directly shape this system’s technical constraints (e.g., keeping round-trip latency below ~15–30 ms for ensemble-tight interactions).
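As a rough sanity check against that budget, the stdlib-only sketch below measures UDP round-trip time on loopback. This is not the project's measurement tooling, just an illustration: the ~15–30 ms figure applies to the full gesture-to-sound path (capture, inference, synthesis), of which the network hop is normally a tiny fraction.

```python
import socket
import time

# "Server" socket standing in for the responder (e.g. the ML process).
echo = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
echo.bind(("127.0.0.1", 0))                  # OS picks a free port
port = echo.getsockname()[1]

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

t0 = time.perf_counter()
client.sendto(b"ping", ("127.0.0.1", port))  # send...
data, addr = echo.recvfrom(64)               # ...receive on the far side...
echo.sendto(data, addr)                      # ...and echo straight back
reply, _ = client.recvfrom(64)
rtt_ms = (time.perf_counter() - t0) * 1000

print(f"loopback round trip: {rtt_ms:.3f} ms")  # typically well under 1 ms
echo.close()
client.close()
```

Instrumenting each stage of the real pipeline this way (timestamps at capture, after inference, at synthesis) shows where the budget is actually being spent.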
Wright, M., & Freed, A. (1997). Open Sound Control: A New Protocol for Communicating with Sound Synthesizers. ICMC.
Outlines OSC’s advantages over MIDI: higher resolution, network friendliness, address hierarchies, and time-tagged bundles. These are essential in this setup, where MIDI controllers, a Raspberry Pi, SuperCollider, and Python ML models communicate in real time.
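To make the wire format concrete, the sketch below hand-encodes a single-float OSC message per the OSC 1.0 specification: the address and type-tag strings are null-terminated and padded to 4-byte boundaries, followed by a big-endian float32. In practice a library such as python-osc would handle this; the `/filter/cutoff` address here is purely illustrative.

```python
import struct

def osc_pad(b: bytes) -> bytes:
    """Null-terminate and pad to a 4-byte boundary, per the OSC 1.0 spec."""
    return b + b"\x00" * (4 - len(b) % 4)

def osc_message(address: str, value: float) -> bytes:
    """Encode an OSC message carrying one float argument.

    Layout: padded address string, padded type-tag string ",f",
    then the argument as a big-endian 32-bit float.
    """
    return (osc_pad(address.encode("ascii"))
            + osc_pad(b",f")
            + struct.pack(">f", value))

packet = osc_message("/filter/cutoff", 0.5)
print(len(packet))  # 24 — every OSC packet is a multiple of 4 bytes
```

The hierarchical address (`/filter/cutoff` rather than an opaque CC number) is exactly the readability advantage Wright & Freed describe; time-tagged bundles extend the same layout with an NTP-style timestamp so receivers can schedule events sample-accurately.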
The diagram below shows one possible MIDI and audio routing setup used in this project. Many variations are possible; this is simply the working environment used for data collection and live experimentation.
This project is supported by several interconnected codebases that work together to collect data, process it in real time, generate responsive musical output, and visualise emerging patterns.
1. Data collection and analysis (Python)
Link to related code
Python scripts using mido and python-rtmidi capture MIDI data with microsecond precision into an SQLite database. Deep learning models (PyTorch/TensorFlow) process the incoming streams to generate predictions, embeddings, or new musical material during live performance.
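A minimal sketch of that logging path, assuming a schema along these lines (the table and column names are illustrative, not the project's actual schema). The mido capture loop appears only in comments so the example runs without MIDI hardware attached.

```python
import sqlite3
import time

# In the live system, events come from a mido input port, roughly:
#   for msg in mido.open_input(port_name):
#       log_event(conn, msg.type, msg.channel, msg.note, msg.velocity)
# Here we insert fake events so the sketch is self-contained.

SCHEMA = """CREATE TABLE IF NOT EXISTS midi_events (
    t_ns     INTEGER,   -- monotonic timestamp, nanoseconds
    type     TEXT,      -- note_on, note_off, control_change, ...
    channel  INTEGER,
    data1    INTEGER,   -- note number or CC number
    data2    INTEGER    -- velocity or CC value
)"""

def log_event(conn, type_, channel, data1, data2):
    """Timestamp an incoming MIDI event and append it to the database."""
    conn.execute("INSERT INTO midi_events VALUES (?,?,?,?,?)",
                 (time.monotonic_ns(), type_, channel, data1, data2))

conn = sqlite3.connect(":memory:")            # a file path in the real system
conn.execute(SCHEMA)
log_event(conn, "note_on", 0, 60, 100)        # middle C, forte
log_event(conn, "control_change", 0, 74, 42)  # e.g. a filter-cutoff CC sweep
count = conn.execute("SELECT COUNT(*) FROM midi_events").fetchone()[0]
print(count)  # 2
```

Storing a raw monotonic timestamp per event, rather than a beat position, keeps the log player-agnostic: tempo and phrase structure are recovered later by the analysis models rather than assumed at capture time.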
2. Real-time performance templates (SuperCollider)
Link to related code
SuperCollider receives interpreted data and controls synthesis, gesture responses, and algorithmic textures, enabling real-time interaction between performer and system.
3. Visualisation (Node.js)
Link to related code
JavaScript visualisers (D3.js, Three.js) display rhythmic lattices, gesture trajectories, and emergent structures. These connect directly to the Python processes for live rendering.
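One simple way to wire the Python and JavaScript sides together is newline-delimited JSON, which Node.js can consume line by line from a TCP or WebSocket stream. The field names below are hypothetical, not the project's actual protocol; the sketch only shows the shape of such a frame on the Python side.

```python
import json
import time

def event_frame(kind, **payload):
    """Serialise one analysis event as a single NDJSON line.

    Hypothetical wire format: each line is a self-describing JSON object
    that a D3.js/Three.js visualiser can render as it arrives.
    """
    return json.dumps({"t": time.time(), "kind": kind, **payload})

line = event_frame("onset", pitch=64, velocity=88, cluster="motif-A")
print(line)
```

One-object-per-line framing keeps the consumer trivial (split on newlines, `JSON.parse` each piece) and degrades gracefully: a dropped line loses one event, not the stream.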
4. Jamulus (Network Collaboration)
Link to related code
Jamulus configurations enable distributed improvisation with low-latency feedback, integrating Deep Listening tools into remote performance setups.
Coming soon