Master's final project
The extraction of musical instruments from bachata songs
using semantic segmentation
for audio source separation
CNN / Semantic Segmentation / Meta-learning / Signal Processing / STFT / PyTorch
Could we teach a machine to separate the individual instruments within a bachata song (bachata being a genre of Latin music), given the genre's typical instruments, polyrhythms and section structure?
This is what we attempted to do in this project. To test our hypothesis and model idea, we first studied a much simpler problem using the MNIST dataset: extract one digit (e.g. 5), given that it is present in a mixture of digits (e.g. 5, 1, 3 and 9 drawn on top of one another).
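As a rough illustration, a digit mixture of this kind could be built directly from MNIST; the overlaying rule (pixel-wise maximum) and the number of digits below are assumptions for the sketch, not the project's exact setup.

```python
# Sketch: build an "overlapping digits" mixture from MNIST (assumed setup).
import numpy as np
from torchvision import datasets, transforms

mnist = datasets.MNIST(root="data", train=True, download=True,
                       transform=transforms.ToTensor())

def make_mixture(dataset, n_digits=4, rng=np.random.default_rng(0)):
    """Overlay n_digits random MNIST images by taking the pixel-wise maximum."""
    idx = rng.integers(0, len(dataset), size=n_digits)
    images = [dataset[int(i)][0].numpy()[0] for i in idx]   # 28x28 arrays in [0, 1]
    labels = [dataset[int(i)][1] for i in idx]
    mixture = np.maximum.reduce(images)                     # digits drawn on top of one another
    return mixture, images, labels

mixture, components, labels = make_mixture(mnist)
print(labels)   # e.g. [5, 1, 3, 9]
```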
After solving this simpler problem with high accuracy using a U-Net architecture and semantic segmentation, we transferred the same model structure to our original, more complex but closely related problem in the audio domain.
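A minimal U-Net-like network of the kind described could look as follows; the depth, channel widths and the two-channel input (mixture plus target reference stacked as channels) are assumptions rather than the project's exact architecture.

```python
# Sketch: a small U-Net-like CNN for semantic segmentation (assumed layout).
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(),
    )

class UNet(nn.Module):
    def __init__(self, in_ch=2, out_ch=1):
        super().__init__()
        self.enc1 = conv_block(in_ch, 32)
        self.enc2 = conv_block(32, 64)
        self.bottleneck = conv_block(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = conv_block(128, 64)
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)
        self.head = nn.Conv2d(32, out_ch, 1)

    def forward(self, x):
        e1 = self.enc1(x)                       # skip connection 1
        e2 = self.enc2(self.pool(e1))           # skip connection 2
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)                    # segmentation map / extracted pattern
```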
Our dataset was a large collection of 4- and 8-beat loops of individual instruments, played as they appear in particular sections of bachata songs (e.g. intro, chorus, verse, bridge, outro).
The preprocessing and data preparation phase included selecting, sorting and organising the right loops, followed by resampling and normalising the clips: each loop was shrunk or expanded using the Fast Fourier Transform so that all clips share a universal length and tempo (BPM) and can be used for machine learning.
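A minimal sketch of this normalisation step, assuming a common sample rate of 22050 Hz, a universal tempo of 130 BPM and 4-beat loops (all illustrative values); scipy.signal.resample performs the FFT-based shrinking/expanding, and the file path is hypothetical.

```python
# Sketch: bring every loop to a universal length/tempo via FFT-based resampling.
import librosa
import numpy as np
from scipy.signal import resample

SR = 22050          # common sample rate (assumption)
TARGET_BPM = 130    # universal tempo all loops are stretched/shrunk to (assumption)
BEATS = 4           # loops are 4 (or 8) beats long

def normalise_loop(path, beats=BEATS):
    y, _ = librosa.load(path, sr=SR)                     # resample to the common rate
    target_len = int(round(SR * beats * 60.0 / TARGET_BPM))
    y = resample(y, target_len)                          # FFT-based shrink/expand to a fixed length
    return y / (np.max(np.abs(y)) + 1e-8)                # peak-normalise the clip

loop = normalise_loop("loops/bongo/verse_01.wav")        # hypothetical file path
```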
Each training example had three clips: a mixture, the ground truth (the wanted instrument's part actually present in the mixture), and the target (a clip of the same instrument, but different from the ground truth). Their spectrograms were produced with the STFT, converted to heatmaps and fed into our U-Net-like CNN.
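A sketch of how such STFT heatmaps might be produced and stacked into the network input; the FFT size, hop length, dB scaling, fixed spectrogram size and file paths are all assumptions for illustration.

```python
# Sketch: STFT -> dB-scaled heatmap, then stack mixture + target as input channels.
import librosa
import numpy as np
import torch

SR, N_FFT, HOP = 22050, 1024, 256   # analysis settings (assumptions)

def heatmap(y, frames=256):
    """Magnitude STFT converted to a dB-scaled heatmap, cropped/padded to a fixed size."""
    S = np.abs(librosa.stft(y, n_fft=N_FFT, hop_length=HOP))[:512]   # keep 512 frequency rows
    S = librosa.util.fix_length(S, size=frames, axis=1)              # pad/crop to a fixed frame count
    S_db = librosa.amplitude_to_db(S, ref=np.max)                    # log-compress the magnitudes
    return (S_db - S_db.min()) / (S_db.max() - S_db.min() + 1e-8)

# hypothetical, already-normalised clips (see the resampling sketch above)
mixture_clip, _      = librosa.load("clips/mixture_01.wav", sr=SR)
target_clip, _       = librosa.load("clips/bongo_other_loop.wav", sr=SR)
ground_truth_clip, _ = librosa.load("clips/bongo_in_mixture_01.wav", sr=SR)

# the mixture and the target reference are stacked as two input channels
x = torch.from_numpy(np.stack([heatmap(mixture_clip), heatmap(target_clip)])).unsqueeze(0).float()
y_true = torch.from_numpy(heatmap(ground_truth_clip)).unsqueeze(0).unsqueeze(0).float()
```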
Essentially our goal was to find a target-like pattern in the mixture and compute
the loss by comparing it to the ground truth pattern.
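Continuing the sketches above, the training objective could then look roughly like this; L1 loss is an assumed stand-in for whatever pixel-wise loss the project actually used.

```python
# Sketch: one training step comparing the extracted pattern to the ground truth.
import torch
import torch.nn.functional as F

model = UNet(in_ch=2, out_ch=1)            # the U-Net-like CNN sketched earlier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

optimizer.zero_grad()
y_pred = model(x)                          # the target-like pattern found in the mixture
loss = F.l1_loss(y_pred, y_true)           # compare against the ground-truth heatmap
loss.backward()
optimizer.step()
```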
Because the mixture, target and ground truth clips were chosen at random, we could build our training data on the fly. Once the model achieved high enough performance, we moved on to testing it on real bachata songs.
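A sketch of such on-the-fly example generation as a PyTorch Dataset, reusing the hypothetical heatmap helper from above; the instrument list, per-instrument loop pools and equal-gain mixing are assumptions.

```python
# Sketch: generate mixture / target / ground-truth examples on the fly.
import random
import numpy as np
from torch.utils.data import Dataset

INSTRUMENTS = ["bongo", "guira", "bass", "guitar"]

class OnTheFlyLoops(Dataset):
    """Builds training examples from pools of normalised loops, one per request."""

    def __init__(self, loops, length=10_000):
        self.loops = loops        # dict: instrument name -> list of equal-length 1-D clips
        self.length = length      # virtual epoch size; every item is generated on demand

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        chosen = {inst: random.choice(self.loops[inst]) for inst in INSTRUMENTS}
        mixture = sum(chosen.values()) / len(INSTRUMENTS)    # one random loop per instrument
        inst = random.choice(INSTRUMENTS)                    # instrument to extract this time
        target = random.choice(self.loops[inst])             # same instrument, different clip
        x = np.stack([heatmap(mixture), heatmap(target)])    # 2-channel input spectrograms
        y = heatmap(chosen[inst])[None]                      # ground truth: that instrument in the mix
        return x.astype(np.float32), y.astype(np.float32)
```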
Real songs needed careful preprocessing before they could be fed to the model: beat tracking to split a song into 4-beat clips, resampling, stretching/shrinking, and an STFT plus heatmap conversion to produce the input spectrograms.
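A sketch of this song preprocessing using librosa's beat tracker; the sample rate and universal tempo are the same illustrative values as in the earlier loop-normalisation sketch.

```python
# Sketch: beat-track a song, cut it into 4-beat clips, bring them to the universal tempo.
import librosa
import numpy as np
from scipy.signal import resample

SR, TARGET_BPM = 22050, 130    # same universal settings as for the training loops (assumptions)

def song_to_clips(path, beats_per_clip=4):
    y, sr = librosa.load(path, sr=SR)
    _, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_samples = librosa.frames_to_samples(beat_frames)
    target_len = int(round(SR * beats_per_clip * 60.0 / TARGET_BPM))
    clips, original_lengths = [], []
    for i in range(0, len(beat_samples) - beats_per_clip, beats_per_clip):
        clip = y[beat_samples[i]:beat_samples[i + beats_per_clip]]
        original_lengths.append(len(clip))           # remember the length for reconstruction later
        clips.append(resample(clip, target_len))     # FFT-based stretch/shrink to the universal tempo
    return clips, original_lengths
```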
The predicted clips were then stretched/shrunk back to their original tempo and concatenated to form the full-song-length extracted instrument.
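A sketch of this reconstruction step, assuming the network's outputs have already been mapped back to linear-magnitude spectrograms; Griffin-Lim phase reconstruction is an assumed stand-in for whatever inversion the project used.

```python
# Sketch: per-clip spectrograms -> audio -> original tempo -> one full-length track.
import librosa
import numpy as np
from scipy.signal import resample

N_FFT, HOP = 1024, 256    # must match the STFT settings used for the inputs

def reassemble(predicted_magnitudes, original_lengths):
    pieces = []
    for mag, n in zip(predicted_magnitudes, original_lengths):
        mag = np.pad(mag, ((0, 1), (0, 0)))                           # restore the dropped top bin
        audio = librosa.griffinlim(mag, n_fft=N_FFT, hop_length=HOP)  # phase reconstruction
        pieces.append(resample(audio, n))     # stretch/shrink back to the clip's original tempo
    return np.concatenate(pieces)             # full-song-length extracted instrument
```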
Further improvements to be made: implement a GAN for more defined predictions, add residual blocks to the U-Net, train the model on original audio quality, examine the louder starting beat in the audio clips, and estimate downbeat locations instead of standard beat locations to resolve issues on songs that have an intro.
RESULTS: NUMBER MODEL
RESULTS: AUDIO MODEL
| examples | mixture | bongo: ground truth | bongo: prediction | guira: ground truth | guira: prediction | bass: ground truth | bass: prediction | guitar: ground truth | guitar: prediction |
|---|---|---|---|---|---|---|---|---|---|
| example 1 | | | | | | | | | |
| example 2 | | | | | | | | | |
| example 3 | | | | | | | | | |
| example 4 | | | | | | | | | |
| example 5 | | | | | | | | | |
RESULTS ON BACHATA SONGS
| artist & song | original | extracted bongo | extracted guira | extracted bass | extracted guitar | relative quality |
|---|---|---|---|---|---|---|
| Prophex: Vanidosa | | | | | | good |
| Lucho Panic: El Guardaespaldas | | | | | | good |
| Prince Royce: Te robare | | | | | | fair |
| Kiko Rodriguez: Vagabundo, borracho & Loco | | | | | | poor |