
This project forked from vskadandale/instrument-recognition-polyphonic


Implementations for the master's thesis "Musical Instrument Recognition in Multi-Instrument Audio Contexts" with MedleyDB.

License: GNU General Public License v3.0



instrument-recognition-polyphonic

This code is published as part of my master's thesis for the Sound and Music Computing (SMC) program at UPF Barcelona. The manuscript is also available here for more information.

Requirements:

For the deep learning part:

  • GPU
  • keras 1.2.2
  • librosa
  • numpy
  • pandas
  • sklearn

Organization of the repository:

The source code is organized as follows:

./settings.py contains all the paths to the folders relevant to the experiments; these must be set before running anything. The data_prep folder contains the code required to preprocess the MedleyDB dataset. The models folder contains subfolders for the deep-learning and traditional machine-learning code.
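
To make that setup step concrete, here is a minimal sketch of the kind of path variables ./settings.py is expected to define; the variable names below are illustrative, not necessarily the repository's actual ones.

```python
# Hypothetical sketch of ./settings.py: absolute paths the experiments rely on.
# The variable names are illustrative, not the repository's actual ones.
import os

MEDLEYDB_PATH = '/path/to/MedleyDB'                  # raw MedleyDB dataset
DATA_PATH = '/path/to/preprocessed_data'             # output of the data_prep step
WAV_PATH = os.path.join(DATA_PATH, 'wav')            # windowed wav samples
FEATURES_PATH = os.path.join(DATA_PATH, 'features')  # extracted feature files
MODELS_PATH = '/path/to/trained_models'              # persisted models and results
```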

Sequence of execution, along with a brief description of each file:

Data Preprocessing

The source code for data preprocessing borrows heavily from Li et al. 2015. Please refer to their paper for more details on data preprocessing, including the steps to create the train/test split.

  1. ./data_prep/data_prep.py
     This file contains all the scripts necessary for preparing the data. Usage:

python data_prep.py -c window_configuration

The window_configurations follow the pattern _{window_size}_h{percentage_hop}; datasets have already been extracted for these configurations. For example,

python data_prep.py -c _5s_h50
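
To illustrate what a configuration such as _5s_h50 encodes (5-second windows with a 50% hop), the sketch below slices a signal accordingly; it is only an interpretation of the naming scheme, not the repository's actual preprocessing code.

```python
# Illustrative windowing for a configuration like _5s_h50: 5 s windows, 50% hop.
import librosa
import numpy as np

def slice_windows(audio_path, window_seconds=5.0, hop_percent=50):
    y, sr = librosa.load(audio_path, sr=None)      # keep the native sample rate
    win = int(window_seconds * sr)                 # window length in samples
    hop = int(win * hop_percent / 100)             # hop length in samples
    windows = [y[s:s + win] for s in range(0, len(y) - win + 1, hop)]
    return np.stack(windows) if windows else np.empty((0, win))
```
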
  2. ./data_prep/gen_split.py
     This file splits the group of .mat files generated by data_prep.py into 5 sets, each containing 20% of the samples. Four of these sets are used as the training set and the remaining one as the test set. In this work we did not perform cross-validation, but since there are 5 such splits, it could be done.

Usage:

python gen_split.py -c window_configuration

The window_configurations follow the pattern _{window_size}_h{percentage_hop}; datasets have already been extracted for these configurations. For example,

python gen_split.py -c _5s_h50
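
A minimal way to produce such a 5-way, 20%-each split is sketched below; the file location and the random seed are assumptions, and any track-level constraints used in the actual script are not modelled.

```python
# Minimal sketch of a 5-way split (20% each) over the generated .mat files.
# The glob pattern and random seed are illustrative only.
import glob
import numpy as np

mat_files = sorted(glob.glob('/path/to/mat_files/*.mat'))
rng = np.random.RandomState(42)         # reproducible shuffle
order = rng.permutation(len(mat_files))
splits = np.array_split(order, 5)       # five roughly equal subsets

train_idx = np.concatenate(splits[:4])  # sets 0-3 -> training set
test_idx = splits[4]                    # set 4   -> test set
```

Keeping all five index sets around is what would make the cross-validation mentioned above possible later on.
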
  3. ./data_prep/wav_generator.py
     This code generates the wav files for each of the samples split using gen_split.py. Usage:

python wav_generator.py -c window_configuration

The window_configurations follow the pattern _{window_size}_h{percentage_hop}; datasets have already been extracted for these configurations. For example,

python wav_generator.py -c _5s_h50
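
Writing one wav file per windowed sample could look like the sketch below; it uses soundfile, which is installed alongside librosa, and the output naming scheme is hypothetical.

```python
# Sketch: write one wav file per windowed sample. soundfile ships as a librosa
# dependency; the output naming scheme here is hypothetical.
import os
import soundfile as sf

def write_windows(windows, sr, out_dir, track_id):
    os.makedirs(out_dir, exist_ok=True)
    for i, window in enumerate(windows):
        sf.write(os.path.join(out_dir, f'{track_id}_{i:04d}.wav'), window, sr)
```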

  4. ./data_prep/audio_transformer.py
     This code splits an audio file into its harmonic and residual components using librosa and stores them separately. Usage:

python audio_transformer.py -c window_configuration

The window_configurations follow the pattern _{window_size}_h{percentage_hop}; datasets have already been extracted for these configurations. For example,

python audio_transformer.py -c _5s_h50
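
A minimal harmonic/residual decomposition with librosa might look like this, under the assumption that the residual is simply the original signal minus its harmonic component:

```python
# Minimal sketch: harmonic component via librosa's HPSS, residual as the remainder.
import librosa
import soundfile as sf

y, sr = librosa.load('sample.wav', sr=None)  # illustrative input file
y_harmonic = librosa.effects.harmonic(y)     # harmonic part of the signal
y_residual = y - y_harmonic                  # everything that is not harmonic

sf.write('sample_harmonic.wav', y_harmonic, sr)
sf.write('sample_residual.wav', y_residual, sr)
```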

For the traditional method:

1) ./models/traditional/feature_extractor.py
This code uses Essentia's music extractor to extract temporal, spectral and cepstral features from the wav files and persists them. Usage:

python feature_extractor.py -c window_configuration -t dataset_type

The window_configurations follow the pattern _{window_size}_h{percentage_hop}; datasets have already been extracted for these configurations. The dataset_types are {original, harmonic, residual}. For example,

python feature_extractor.py -c _5s_h50 -t original
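
Calling Essentia's MusicExtractor on a single wav file looks roughly like this; which statistics are computed and how the resulting pool is persisted are assumptions, not the repository's actual code.

```python
# Sketch of Essentia's MusicExtractor on one wav file; the chosen statistics
# and the way results are persisted are assumptions.
import essentia.standard as es

extractor = es.MusicExtractor(lowlevelStats=['mean', 'stdev'])
features, features_frames = extractor('sample_0000.wav')  # illustrative file name

# The result is an essentia Pool; descriptors are addressed by name.
mfcc_mean = features['lowlevel.mfcc.mean']
print(sorted(features.descriptorNames())[:10])            # peek at what is available
```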

2) ./models/traditional/train_accumulator.py
This code aggregates datasets {0,1,2,3} and their respective labels into the training set. Usage:

python train_accumulator.py -c window_configuration -t dataset_type

The window_configurations follow the pattern _{window_size}_h{percentage_hop}; datasets have already been extracted for these configurations. The dataset_types are {original, harmonic, residual}. For example,

python train_accumulator.py -c _5s_h50 -t original
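
Assuming the per-split features and labels are stored as one array per set, the accumulation boils down to stacking sets 0-3; the file names below are hypothetical. The test accumulator in the next step does the same for set 4 alone.

```python
# Sketch: stack the feature/label arrays of sets {0,1,2,3} into the training set.
# The .npy file names are hypothetical.
import numpy as np

X_train = np.vstack([np.load(f'features_set{i}.npy') for i in range(4)])
y_train = np.vstack([np.load(f'labels_set{i}.npy') for i in range(4)])
np.save('X_train.npy', X_train)
np.save('y_train.npy', y_train)
```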

3) ./models/traditional/test_accumulator.py
This code aggregates dataset_4 and its respective labels into the test set. Usage:

python test_accumulator.py -c window_configuration -t dataset_type

The window_configurations follow the pattern _{window_size}_h{percentage_hop}; datasets have already been extracted for these configurations. The dataset_types are {original, harmonic, residual}. For example,

python test_accumulator.py -c _5s_h50 -t original

4) ./models/traditional/regressor.py
This code fits a regressor on the training set and then predicts the instrument annotations for the test set. Usage:

python regressor.py -c window_configuration -t dataset_type

The window_configurations follow the pattern _{window_size}_h{percentage_hop}; datasets have already been extracted for these configurations. The dataset_types are {original, harmonic, residual}. For example,

python regressor.py -c _5s_h50 -t original
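
The fit/predict step with scikit-learn could be sketched as below; the specific regressor (a random forest here) and the 0.5 threshold used to turn continuous activations into labels are assumptions.

```python
# Sketch of fitting a multi-output regressor and predicting instrument
# activations; the regressor choice and the thresholding are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X_train, y_train = np.load('X_train.npy'), np.load('y_train.npy')
X_test = np.load('X_test.npy')

reg = RandomForestRegressor(n_estimators=100, n_jobs=-1)
reg.fit(X_train, y_train)              # y_train has one column per instrument
y_pred = reg.predict(X_test)           # continuous activation estimates
labels = (y_pred >= 0.5).astype(int)   # optional binarisation for evaluation
```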

For deep learning:

Refer to ./models/deep-learning/README.md
