This is a PyTorch port of Learning from Between-class Examples for Deep Sound Recognition. The dataset generation code is taken from the original repository.
Implementation of Learning from Between-class Examples for Deep Sound Recognition by Yuji Tokozume, Yoshitaka Ushiku, and Tatsuya Harada (ICLR 2018).
This repository also contains training code for EnvNet, from Learning Environmental Sounds with End-to-end Convolutional Neural Network (Yuji Tokozume and Tatsuya Harada, ICASSP 2017).
- Between-class (BC) learning
- We generate between-class examples by mixing two training examples belonging to different classes with a random ratio.
- We then input the mixed data to the model and train the model to output the mixing ratio.
- Training of EnvNet on ESC-50, ESC-10 [1], and UrbanSound8K [2] datasets
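The two BC learning steps above can be sketched in a few lines. This is a minimal illustration using plain linear mixing on Python lists, not the code in this repository; the function names `mix_between_class` and `kl_divergence` are hypothetical, and the actual implementation may mix waveforms differently (for example, by accounting for loudness) and operates on tensors. The sketch assumes both inputs have the same length.

```python
import math
import random


def mix_between_class(x1, y1, x2, y2, num_classes, rng=random):
    """Mix two examples of different classes with a random ratio r.

    Returns the mixed signal and a soft label that stores the mixing
    ratio: r for class y1 and (1 - r) for class y2.
    """
    if y1 == y2:
        raise ValueError("between-class mixing expects two different classes")
    r = rng.random()  # mixing ratio drawn uniformly from [0, 1)
    # Simplest possible mixing: a per-sample linear combination (assumption).
    mixed = [r * a + (1.0 - r) * b for a, b in zip(x1, x2)]
    target = [0.0] * num_classes
    target[y1] = r
    target[y2] = 1.0 - r
    return mixed, target


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) between two discrete distributions.

    One way to train a model to output the mixing ratio is to minimise
    this quantity between the soft label and the model's softmax output.
    """
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(target, predicted)
        if t > 0
    )


# Usage: mix a class-0 example with a class-2 example (3 classes in total).
rng = random.Random(0)
mixed, target = mix_between_class([1.0, 0.0], 0, [0.0, 1.0], 2, num_classes=3, rng=rng)
loss_if_perfect = kl_divergence(target, target)
```

With these toy inputs, `mixed` equals `[r, 1 - r]`, so the mixed signal directly shows the ratio that the soft label asks the model to predict.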
- Template:
python main.py --dataset [esc50, esc10, or urbansound8k] --netType [envnet or envnetv2] --data path/to/dataset/directory/ (--BC) (--strongAugment)
- Recipes:
- Standard learning of EnvNet on ESC-50 (around 29% error):
python main.py --dataset esc50 --netType envnet --data path/to/dataset/directory/
- Notes:
- Please check opts.py for other command line arguments.
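To make the command template concrete, here is a hypothetical `argparse` sketch covering only the flags shown in the template. It is not the repository's `opts.py`, which defines further options; the `required` settings and help strings are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Illustrative parser for the flags in the template (not opts.py)."""
    parser = argparse.ArgumentParser(description="Illustrative option parser")
    parser.add_argument("--dataset", required=True,
                        choices=["esc50", "esc10", "urbansound8k"])
    parser.add_argument("--netType", required=True,
                        choices=["envnet", "envnetv2"])
    parser.add_argument("--data", required=True,
                        help="path to the dataset directory")
    # Optional switches, written as (--BC) and (--strongAugment) in the template.
    parser.add_argument("--BC", action="store_true",
                        help="enable between-class learning")
    parser.add_argument("--strongAugment", action="store_true",
                        help="enable the strong augmentation option")
    return parser


# Usage: parse an example command line that turns on BC learning.
args = build_parser().parse_args([
    "--dataset", "esc50",
    "--netType", "envnetv2",
    "--data", "path/to/dataset/directory/",
    "--BC",
])
```

Modelling the parenthesised flags as boolean switches matches how they appear in the template: present to enable, absent to disable.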
Between-class Learning for Image Classification (GitHub)
[1] Karol J. Piczak. ESC: Dataset for environmental sound classification. In ACM Multimedia, 2015.
[2] Justin Salamon, Christopher Jacoby, and Juan Pablo Bello. A dataset and taxonomy for urban sound research. In ACM Multimedia, 2014.