Could you elaborate on how you got the dataset into the desired folder structure (as suggested in settings.py)?
The MedleyDB dataset does not ship with a train/test split, but settings.py defines paths for both
MEDLEY_TRAIN_FEATURE_BASEPATH and MEDLEY_TEST_FEATURE_BASEPATH.
The dataset I downloaded from the MedleyDB website has the following structure:
V1
V1/sound_track_name
V1/sound_track_name/sound_track_name_RAW
V1/sound_track_name/sound_track_name_RAW/{multiple_RAW files}
V1/sound_track_name/sound_track_name_STEMS
V1/sound_track_name/sound_track_name_STEMS/{multiple_STEM files}
V1/sound_track_name/sound_track_name_MIX.wav
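For reference, this is roughly how I'm inspecting the downloaded layout (the `root` path and the helper name are just my own, not anything from settings.py):

```python
import os

def list_medleydb_tracks(root):
    """Walk the MedleyDB V1 folder and report which expected pieces
    (_MIX.wav file, _STEMS dir, _RAW dir) exist for each track."""
    tracks = {}
    for name in sorted(os.listdir(root)):
        track_dir = os.path.join(root, name)
        if not os.path.isdir(track_dir):
            continue
        tracks[name] = {
            "mix": os.path.isfile(os.path.join(track_dir, name + "_MIX.wav")),
            "stems": os.path.isdir(os.path.join(track_dir, name + "_STEMS")),
            "raw": os.path.isdir(os.path.join(track_dir, name + "_RAW")),
        }
    return tracks
```

Every track I checked this way has exactly the three entries above and nothing else.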
So, as you can see, I only have the _RAW, _STEMS, and _MIX wav files. MedleyDB suggests installing a Python library called 'medleydb' for annotations and metadata.
Could you tell me whether we need to set the paths in settings.py before running data_prep.py?
If so, how do we get the dataset into the desired structure?
If not, then I think data_prep.py expects a different dataset structure.
In short, I want to know what needs to be done between downloading the dataset and running data_prep.py.
Hi,
Nice work you've done.
I am trying to reproduce some of the experiments, also in conjunction with the paper you mention in your work.
I have some questions about the window size.
In your thesis you say a window size of 3 seconds is preferred, although you only experiment with sizes up to 5 seconds. Were your experiments with durations > 5 seconds much worse, or did you simply decide not to try such a configuration?
Another question, about the threshold for the activation data: I am currently using a threshold of 0.5 to assign the label 1 to each instrument, as suggested in the paper.
Based on your work, would you say that for a 5 s window a threshold of 0.4 or 0.45 would be more appropriate?
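To make the question concrete, this is a sketch of how I'm currently binarizing the activations for one window (averaging the per-frame confidences over the window is my own choice; only the 0.5 threshold comes from the paper):

```python
import numpy as np

def window_labels(activations, threshold=0.5):
    """Binarize per-instrument activation confidences for one window.

    activations: array of shape (frames, instruments), values in [0, 1].
    Returns a 0/1 label per instrument: 1 if the mean activation over
    the window reaches the threshold (mean aggregation is an assumption).
    """
    mean_act = np.asarray(activations).mean(axis=0)
    return (mean_act >= threshold).astype(int)
```

With a lower threshold like 0.4, instruments whose mean activation sits between 0.4 and 0.5 would flip from label 0 to label 1, which is why I'm asking whether that's what you'd recommend for longer windows.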