
Sound event detection with depthwise separable and dilated convolutions.

Home Page: https://arxiv.org/abs/2002.00476

License: Other

sound-event-detection depthwise-separable-convolutions dilated-convolution depthwiseseparableconvolution dilated-cnn audio-signal-processing machine-listening deep-learning deep-neural-networks machine-learning

dnd-sed's Introduction

Sound event detection with depthwise separable and dilated convolutions


Welcome to the repository of the DnD-SED method.

This is the repository for the method presented in the paper "Sound Event Detection with Depthwise Separable and Dilated Convolutions", by K. Drossos, S. I. Mimilakis, S. Gharib, Y. Li, and T. Virtanen.

Our code is based on the PyTorch framework, and we use the publicly available TUT-SED Synthetic 2016 dataset.

Our paper has been submitted for review to the IEEE World Congress on Computational Intelligence/International Joint Conference on Neural Networks (WCCI/IJCNN).

You can find an online version of our paper at arXiv.

If you use our method, please cite our paper.


Table of Contents

  1. Method introduction
  2. System set-up
  3. Conducting the experiments

Method introduction

Methods for sound event detection (SED) are usually based on a composition of three functions: a feature extractor, an identifier of long temporal context, and a classifier. State-of-the-art SED methods use typical 2D convolutional neural networks (CNNs) as the feature extractor and an RNN for identifying long temporal context (a simple affine transform with a non-linearity is used as the classifier). This set-up can yield a considerable amount of parameters, up to a couple of million (e.g., 4M). Additionally, the use of an RNN impedes the training process and the parallelization of the method.

With our DnD-SED method, we propose replacing the typical 2D CNNs used as the feature extractor with depthwise separable convolutions, and replacing the RNN with dilated convolutions. We compare our method with the widely used CRNN method, using the publicly available TUT-SED Synthetic 2016 dataset. We conduct a series of 10 experiments and report mean values of the time needed for one training epoch, the F1 score, the error rate, and the amount of parameters.
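To make the two building blocks concrete, here is a minimal PyTorch sketch. The channel counts, kernel sizes, and dilation below are made-up illustrations, not the exact hyper-parameters of our models (those are in the settings files and in the paper):

import torch
import torch.nn as nn

# Depthwise separable convolution: a per-channel (depthwise) convolution,
# followed by a 1x1 (pointwise) convolution that mixes the channels.
depthwise_separable = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=5, padding=2, groups=64),  # depthwise
    nn.Conv2d(64, 128, kernel_size=1))                       # pointwise

# Dilated convolution: a large receptive field over time, without an RNN.
dilated = nn.Conv2d(128, 128, kernel_size=3, padding=10, dilation=10)

x = torch.rand(8, 64, 256, 40)  # (batch, channels, time frames, features)
y = dilated(depthwise_separable(x))
print(y.shape)  # torch.Size([8, 128, 256, 40])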

We achieve a considerable decrease in computational complexity and a simultaneous increase in SED performance. Specifically, we achieve a reduction of the amount of parameters and of the mean time needed for one training epoch (reductions of 85% and 72%, respectively). We also achieve an increase of the mean F1 score by 4.6% and a reduction of the mean error rate by 3.8%.

You can find more information in our paper!


System set-up

To run and use our method (or simply repeat the experiments), you need to set up the code and use the specific dataset. We provide the full code used for the method, but you will have to get the audio files and extract the features yourself.

Code set-up

To set up and run our code, you will need to clone this repository and then install the dependencies using your favorite package manager. If you are using Conda, you can do:

$ conda env create --yes --file conda_dependencies.yml

Then, an environment named dnd-sed will be created, using Python 3.7. If you prefer pip, you can do:

$ pip install -r pip_dependencies.txt

And you will be good to go! If anything is not working, please let me know by opening an issue in this repository.

Data set-up

To set up the data, you first have to follow the procedure and download the data from the corresponding web page. Then, you should create your input/output values and use them with our method.

The code in this repository offers data handling functionality. The data_feeders.get_tut_sed_data_loader function returns a PyTorch data loader, using data_feeders.TUTSEDSynthetic2016 as the dataset class.

To use your extracted features with this class, you should have saved the features and the target values as separate files. You can specify the file names and the directory containing these files in the settings files.
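As a rough illustration only (the file names, shapes, and number of classes below are made-up placeholders; the real names are specified in the settings files), saving features and targets as separate files could look like:

import numpy as np

# Hypothetical shapes: 1024 feature frames with 40 mel bands each,
# and per-frame activity targets for 16 event classes.
features = np.random.rand(1024, 40).astype(np.float32)
targets = np.zeros((1024, 16), dtype=np.float32)

# Features and target values go to separate files, as described above.
np.save('features_example.npy', features)
np.save('targets_example.npy', targets)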


Conducting the experiments

In the settings directory you can find all the settings that were used for the results presented in the paper. We used each settings file 10 times and then averaged the results. If you want to reproduce our results, please remember to follow the same procedure.

To run the code you just have to use the main.py script, passing the proper arguments (an example invocation is given after the argument lists below). The required arguments for the main.py script are:

  • The name of the model that will be used, -m. Accepted values are:
    1. baseline -- This is the baseline, CRNN model.
    2. baseline_dilated -- This is the baseline model, but with the RNN replaced by a CNN with dilated convolution.
    3. dessed -- This is the baseline model, but with the CNNs replaced by depth-wise separable convolutions.
    4. dessed_dilated -- This is our proposed model, with depth-wise separable convolutions, followed by dilated convolution.
  • The name of the settings file to be used (without the .yaml extension): -c. For example, if the settings file synthetic_2016_k_55_d_1_1.yaml is to be used, then this argument has to be synthetic_2016_k_55_d_1_1.

There are some optional arguments for the main.py script. These are:

  • The extension of the settings file, -e. Default value is .yaml.
  • The directory where the settings file is, -d. Default value is settings.
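
For example, to train our proposed model with the settings file mentioned above, you can do:

$ python main.py -m dessed_dilated -c synthetic_2016_k_55_d_1_1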

Enjoy!


dnd-sed's Issues

Feature extraction

Hi, I have obtained the TUT-SED Synthetic 2016 database, but I have encountered some problems in the feature extraction step. I would like to know the specific process of audio feature extraction. I would be very grateful if you could show me your code.

No sigmoid on logits?

It seems that the metric functions use the raw logit values from the Linear layer and decide positive/negative by thresholding at 0.5, but I cannot find any sigmoid (or similar variant) that transforms the logits into probability estimates. Am I missing something?
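
(For context on the distinction raised here: thresholding raw logits at 0.5 is not equivalent to thresholding sigmoid probabilities at 0.5; the latter corresponds to thresholding logits at 0. A small PyTorch check, independent of this repository's code:)

import torch

logits = torch.tensor([-0.2, 0.3, 0.7])

# Thresholding raw logits at 0.5 ...
print(logits >= 0.5)                 # tensor([False, False,  True])

# ... differs from thresholding sigmoid probabilities at 0.5,
# which is equivalent to thresholding logits at 0.
print(torch.sigmoid(logits) >= 0.5)  # tensor([False,  True,  True])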

Problems during the experiment

Thank you for providing such good open-source code. I am reproducing your experiment. The TUT-SED Synthetic 2016 link provided on GitHub no longer works, so I downloaded the dataset from the official website. However, the features contain only cpickle files and txt-format annotations, and there are no .npy files. Do I need to convert the cpickle files to .npy? I have been confused by this question for a long time and look forward to your reply!

Using DessedDilated with custom dataset

Hey authors,

I was wondering how I could adapt the DessedDilated model to train in my pipeline with my custom datasets and data loader. I have tried looking through the files in the model directory, the settings files, _process.py, etc., and I seem to be getting somewhere, but it is still not clear to me. What changes do I have to make to simply use the DessedDilated model?

Thanks
