dcase2018task2's Introduction

Freesound-audio-tagging

DCASE2018 Task2 - General-purpose audio tagging of Freesound content with AudioSet labels

Kaggle - Freesound General-Purpose Audio Tagging Challenge

Citing

The article for this method can be downloaded here. Please cite this work in your publications if it helps your research.

@article{xu2018general,
  title={General audio tagging with ensembling convolutional neural network and statistical features},
  author={Xu, Kele and Zhu, Boqing and Kong, Qiuqiang and Mi, Haibo and Ding, Bo and Wang, Dezhi and Wang, Huaimin},
  journal={arXiv preprint arXiv:1810.12832},
  year={2018}
}

What can you get from this repository?

  • A PyTorch-based framework for audio tagging and audio classification.

  • Audio data processing method and feature extraction method.

  • Encapsulation of multiple models for the audio data.

  • Advanced meta-learning method.

Data

The data can be downloaded from the Kaggle competition Freesound Audio Tagging.

Requirements:

python 3.6

pytorch 0.4.0

cuda 9.1

librosa 0.5.1

torchvision 0.2.1

How to run?

Feature extraction.

python data_transform.py

This code can extract three types of features by selecting different functions:

  • Wave
  • Log-Mel
  • MFCC

Note: To extract different features, you need to set different parameters in config.

To speed up extraction, we use parallel computing. You can adjust the number of threads to suit your machine.
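As a rough illustration of the parallel extraction described above, the sketch below maps a per-file function over a list of paths with a thread pool. The names `extract_feature`, `extract_all`, and `num_workers` are hypothetical and are not taken from `data_transform.py`; the extraction function is a placeholder that only returns the path and its length so the sketch runs without audio files.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_feature(path):
    """Hypothetical stand-in for per-file feature extraction.

    A real implementation would load the audio at `path` and return
    a feature array; here we return a tuple so the sketch runs anywhere.
    """
    return (path, len(path))


def extract_all(paths, num_workers=4):
    """Apply extract_feature to every path with a pool of worker threads.

    executor.map returns results in the same order as `paths`.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        return list(executor.map(extract_feature, paths))


features = extract_all(["a.wav", "bb.wav", "ccc.wav"], num_workers=2)
```

Raising or lowering `num_workers` is the knob that corresponds to "the number of threads" here. For heavily CPU-bound extraction, a process pool may scale better than threads.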

For the log-mel and MFCC features, we also compute the delta and delta-delta (acceleration) coefficients. We then concatenate the log-mel or MFCC feature with its delta and delta-delta to form a 3 x 64 x N matrix, where N depends on the length of the audio file.
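The stacking step can be sketched with NumPy alone. This is a minimal illustration, not the repository's code: it assumes the feature is already a 64 x N array, uses random numbers as a placeholder for a real log-mel spectrogram, and approximates the delta with `np.gradient` rather than the smoothed estimate an audio library would typically use. The function names are hypothetical.

```python
import numpy as np


def simple_delta(feature):
    """Approximate a time derivative along the frame axis (last axis).

    np.gradient is a simple stand-in here; audio libraries usually
    compute a smoothed delta over a short window instead.
    """
    return np.gradient(feature, axis=-1)


def stack_with_deltas(feature):
    """Stack static, delta, and delta-delta into shape (3, n_bins, n_frames)."""
    delta = simple_delta(feature)
    delta2 = simple_delta(delta)
    return np.stack([feature, delta, delta2], axis=0)


# Placeholder for a 64-band log-mel spectrogram with N = 100 frames.
log_mel = np.random.randn(64, 100)
stacked = stack_with_deltas(log_mel)
print(stacked.shape)  # (3, 64, 100)
```

The three channels play a role similar to the colour channels of an image, which is what lets image-style convolutional networks consume the result directly.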

Before training, create a directory in which to save the model.
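One way to create that directory from Python is shown below. The name `model_dir` is only a placeholder, since the path the training scripts expect is set in the config.

```python
from pathlib import Path

# "model_dir" is a placeholder; use whatever path your config points at.
model_dir = Path("model_dir")
# parents=True creates missing parent folders; exist_ok=True makes reruns safe.
model_dir.mkdir(parents=True, exist_ok=True)
```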

Train on Wave.

python train_on_wave.py

This trains the network directly on the waveform.

Before running it, instantiate the config class to set the parameters (such as directories, learning rate, batch size, and number of epochs). Make sure the data you are using is the wave feature you extracted earlier.
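The general shape of such a configuration object might look like the sketch below. This is a hypothetical class written for illustration: the attribute names and default values are assumptions, not the ones used in this repository, so check the training script for the actual fields.

```python
class Config:
    """Hypothetical training configuration; not the repository's actual class."""

    def __init__(self, data_dir="wave_features", model_dir="model_dir",
                 lr=0.01, batch_size=32, epochs=50):
        self.data_dir = data_dir      # where the extracted wave features live
        self.model_dir = model_dir    # where checkpoints are written
        self.lr = lr                  # initial learning rate
        self.batch_size = batch_size  # samples per training step
        self.epochs = epochs          # full passes over the training set


# Override only the values that differ from the defaults.
config = Config(lr=0.001, batch_size=64)
```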

Train on Log-Mel

python train_on_logmel.py

This trains the network on log-mel features.

Make sure the data you are using is the log-mel feature you extracted earlier.

Train on MFCC

python train_on_logmel.py

This trains the network on MFCC features using the same script. Make sure the data you are using is the MFCC feature you extracted earlier.

Single Models

Several deep learning networks for sound data are encapsulated in network_*.py, including:

  • ResNet
  • ResNeXt
  • SE-ResNeXt
  • DPN
  • Xception

Also, you can find useful pretrained models in this repository.

To be improved

  • More efficient and high-performance models to be designed.

  • Currently, the models are trained on a single GPU. Multiple GPUs can be used for parallel training to accelerate learning.

Contributors

cocoxili
