Introduction
As a high school student, I discovered my passion for DJing by creating mixtapes for my friends using keyboard hotkeys on my bulky 20-pound laptop. Fast-forward 12 years: I’ve honed my skills in both digital DJing and vinyl turntablism, performed at countless events (weddings, bar mitzvahs, street-dance competitions, you name it), started a mix collective with friends (shoutout After School Marine Club), and opened for touring artists in downtown Philly. Throughout my journey, I’ve come to realize that DJing is a heuristic-based, almost algorithmic form of storytelling through music.
A simplified version of my mental DJ "algorithm" looks like this (with a toy code sketch after the list):
- Identify the audience and purpose of the event
- Initialize the first song based on the audience and previous experience
- Choose the next track based on rhythmic and tonal criteria (e.g., BPM, pitch, genre)
- Observe the crowd and adapt the setlist based on its feedback
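To make the loop concrete, here is a toy Python sketch of steps 2–4. Everything in it is illustrative: the `Track` fields, the Camelot-style key encoding, and the scoring weights are stand-ins for DJ intuition, not tuned values from this project.

```python
from dataclasses import dataclass

@dataclass
class Track:
    name: str
    bpm: float
    camelot: int   # key as a Camelot wheel number, 1-12 (toy simplification)
    energy: float  # 0-1

def compatibility(current: Track, candidate: Track) -> float:
    """Toy version of step 3: reward close tempo, a neighboring key on
    the Camelot wheel, and a similar energy level. Weights are made up."""
    bpm_score = 1.0 - min(abs(current.bpm - candidate.bpm) / current.bpm, 1.0)
    key_dist = min(abs(current.camelot - candidate.camelot),
                   12 - abs(current.camelot - candidate.camelot))
    key_score = 1.0 if key_dist <= 1 else 0.25
    energy_score = 1.0 - abs(current.energy - candidate.energy)
    return 0.5 * bpm_score + 0.3 * key_score + 0.2 * energy_score

def build_setlist(library: list[Track], opener: Track, n_tracks: int) -> list[Track]:
    """Greedy version of the loop: start from the opener (step 2) and
    repeatedly take the most compatible remaining track (step 3)."""
    setlist = [opener]
    remaining = [t for t in library if t is not opener]
    while remaining and len(setlist) < n_tracks:
        nxt = max(remaining, key=lambda t: compatibility(setlist[-1], t))
        setlist.append(nxt)
        remaining.remove(nxt)
    return setlist
```

The greedy `max` is the simplest possible stand-in for step 3; in a live set, step 4 (reading the crowd) would re-weight that scoring on the fly.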
These steps make it clear that the art of DJing is much like a social science. By treating music as data, DJing becomes an application of data science in the domains of Music Information Retrieval (MIR) and Digital Signal Processing (DSP). Through this scientific, empirically motivated lens, we can study the patterns, structures, and emotional impact of songs, enabling us to create more engaging and meaningful experiences for listeners. My rationale for creating an AI DJ is not to put DJs out of business (although the thought of DJ.py headlining at Coachella is entertaining, to say the least). Rather, this project is my most recent form of curiosity, creative expression, and learning.
The Process
Existing forms of automated DJing are closer to “playlist DJing”: they typically stitch the beginnings and endings of songs together by synchronizing downbeats. I am aiming for a system that can generate mixes like those of a live-performance DJ, where songs are strategically segmented and mixed to form a condensed, hedonic auditory experience (i.e., a good old-fashioned mixtape).
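For reference, here is roughly what that beat-synchronized stitching looks like with librosa. The filenames are placeholders, and `beat_track` detects beats rather than true downbeats, so treat this as a simplification of what commercial auto-mix features do:

```python
import librosa
import numpy as np

# Placeholder filenames; any two tracks will do.
y_a, sr = librosa.load("outgoing.mp3")
y_b, _ = librosa.load("incoming.mp3", sr=sr)

tempo_a, beats_a = librosa.beat.beat_track(y=y_a, sr=sr)
tempo_b, beats_b = librosa.beat.beat_track(y=y_b, sr=sr)

# Time-stretch the incoming track so its tempo matches the outgoing one.
y_b_sync = librosa.effects.time_stretch(y_b, rate=float(tempo_a / tempo_b))

# Crude splice: cut the outgoing track at its last detected beat and start
# the incoming track at its first, so the beat grids line up at the seam.
cut_a = librosa.frames_to_samples(beats_a[-1])
start_b = librosa.frames_to_samples(
    librosa.beat.beat_track(y=y_b_sync, sr=sr)[1][0])
mix = np.concatenate([y_a[:cut_a], y_b_sync[start_b:]])
```

This produces a serviceable end-to-beginning join, but nothing more: no segmentation, no phrasing, no narrative arc. That gap is what the rest of this project is about.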
I’ve broken this process down into three parts:
- Automated Chorus Identification: Identifying the most engaging parts of songs.
- Algorithmic Song Selection: Using content-based and collaborative filtering to select songs based on their audio features and compatibility with the song on deck (see the first sketch after this list).
- Transition Engineering: Applying audio signal processing, ML, and music theory to define and optimally select among various song transitions (e.g., a high-pass filter fade over an 8-bar phrase; see the second sketch after this list).
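As a sketch of the content-based half of the selection step, here is cosine-similarity ranking over a feature vector. The feature fields are Spotify-style assumptions, and the collaborative-filtering half (learning from what listeners and other DJs pair together) isn't shown:

```python
import numpy as np

def feature_vector(track: dict) -> np.ndarray:
    """Pack a few (assumed) normalized audio features into a vector.
    Spotify-style fields like danceability and valence are placeholders."""
    return np.array([track["bpm"] / 200.0,  # rough tempo normalization
                     track["danceability"],
                     track["energy"],
                     track["valence"]])

def rank_candidates(on_deck: dict, library: list[dict]) -> list[dict]:
    """Content-based filtering: rank the library by cosine similarity
    to the song currently on deck."""
    v = feature_vector(on_deck)

    def sim(t: dict) -> float:
        w = feature_vector(t)
        return float(np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w)))

    return sorted(library, key=sim, reverse=True)
```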
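And for the transition example above, here is a rough sketch of a high-pass filter fade over an 8-bar phrase, assuming 4/4 time. The block-wise filtering is a simplification of a true time-varying filter, and every parameter here is illustrative:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def highpass_fade(y: np.ndarray, sr: int, bpm: float, bars: int = 8,
                  f_start: float = 20.0, f_end: float = 800.0) -> np.ndarray:
    """Sweep a high-pass cutoff upward over an N-bar phrase (4/4 assumed),
    thinning out the outgoing track's low end. Filtering in fixed blocks
    resets filter state at each boundary -- a shortcut, not studio-grade DSP."""
    phrase = min(int(sr * bars * 4 * 60.0 / bpm), len(y))  # samples in phrase
    tail = y[-phrase:].copy()
    n_blocks = 32
    cutoffs = np.geomspace(f_start, f_end, n_blocks)       # exponential sweep
    for idx, fc in zip(np.array_split(np.arange(len(tail)), n_blocks), cutoffs):
        sos = butter(2, fc, btype="highpass", fs=sr, output="sos")
        tail[idx] = sosfilt(sos, tail[idx])
    out = y.copy()
    out[-phrase:] = tail
    return out
```

In an actual transition, this sweep would run on the outgoing track while the incoming track's volume ramps up underneath it.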
Dataset and Progress
My dataset consists of 332 songs compiled from various EDM playlists on Spotify, with chorus start and end locations hand-labeled. Details about the labeling process can be found in the Mixin Data Annotation Guide. I’m nearing the end of the first stage, automated chorus identification, for which I’ve developed a Convolutional Recurrent Neural Network (CRNN) that detects choruses in songs. I’m now fine-tuning the model and experimenting with different ways to further improve the precision of its predicted chorus locations.
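The exact architecture is a topic for the next post, but a generic CRNN for this task looks roughly like the sketch below, assuming mel-spectrogram input and frame-wise chorus/not-chorus labels. All layer sizes here are illustrative, not Mixin's actual configuration:

```python
import torch
import torch.nn as nn

class ChorusCRNN(nn.Module):
    """Minimal CRNN sketch for frame-wise chorus detection: conv layers
    learn local spectral patterns from a mel spectrogram, a bidirectional
    GRU models longer-range song structure, and a sigmoid head emits a
    per-frame chorus probability."""
    def __init__(self, n_mels: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),   # pool frequency only, keep time resolution
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        self.gru = nn.GRU(32 * (n_mels // 4), 64,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 64, 1)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, 1, n_mels, n_frames)
        x = self.conv(mel)                    # (B, 32, n_mels // 4, T)
        x = x.permute(0, 3, 1, 2).flatten(2)  # (B, T, 32 * n_mels // 4)
        x, _ = self.gru(x)                    # (B, T, 128)
        return torch.sigmoid(self.head(x)).squeeze(-1)  # (B, T) probabilities
```

Thresholding the per-frame probabilities then yields candidate chorus start and end locations to compare against the hand labels.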
Stay tuned for more updates on the development of Mixin, including a detailed look at the automated chorus identification process in the next post!