This project forked from wanghelin1997/dcase-2020-task1a-code

A PyTorch implementation of the paper "Acoustic Scene Classification with Multiple Decision Schemes".

DCASE-2020-Task1A-Code

A PyTorch implementation of the paper "Acoustic Scene Classification with Spectrogram Processing Strategies".

Paper results

Note that the test results below were obtained on device A recordings only, using the DCASE2020 Task 1A development set.

| Model | Accuracy | Log loss | Model size |
| --- | --- | --- | --- |
| DCASE2020 Task 1 Baseline | 70.6% | 1.356 | 19.1 MB |
| Log-Mel CNN | 72.1% | 0.879 | 18.9 MB |
| Gamma CNN | 76.1% | 0.762 | 18.9 MB |
| MFCC CNN | 63.6% | 1.029 | 18.9 MB |
| SPSMR | 79.4% | 0.696 | 75.5 MB |
| Log-Mel CNN + SPSMF | 75.5% | 1.135 | 94.4 MB |
| CQT CNN + SPSMF | 74.5% | 1.185 | 94.4 MB |
| Gamma CNN + SPSMF | 78.8% | 1.169 | 94.4 MB |
| MFCC CNN + SPSMF | 60.9% | 1.801 | 94.4 MB |
| SPSMR + SPSMF | 80.9% | 0.737 | 377.6 MB |
| Log-Mel CNN + SPSMT | 74.5% | 0.987 | 18.9 MB |
| CQT CNN + SPSMT | 73.3% | 1.032 | 18.9 MB |
| Gamma CNN + SPSMT | 78.2% | 0.866 | 18.9 MB |
| MFCC CNN + SPSMT | 67.6% | 1.081 | 18.9 MB |
| SPSMR + SPSMT | 79.7% | 0.701 | 75.5 MB |
| SPSMR + SPSMF + SPSMT | 81.8% | 0.694 | 453.1 MB |
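
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how accuracy and multi-class log loss can be computed from per-clip class probabilities. It is not taken from this repository: the function names `accuracy` and `log_loss`, the `eps` clipping value, and the toy probabilities are assumptions for illustration, and the repository's own evaluation code may differ (for example, by averaging accuracy class-wise).

```python
import math


def accuracy(probs, labels):
    """Fraction of clips whose highest-probability class matches the label."""
    correct = 0
    for p, y in zip(probs, labels):
        predicted = max(range(len(p)), key=p.__getitem__)  # argmax over classes
        if predicted == y:
            correct += 1
    return correct / len(labels)


def log_loss(probs, labels, eps=1e-15):
    """Mean negative natural-log probability assigned to the true class."""
    total = 0.0
    for p, y in zip(probs, labels):
        # Clip to [eps, 1] so a zero probability does not produce log(0).
        clipped = min(max(p[y], eps), 1.0)
        total += -math.log(clipped)
    return total / len(labels)


# Toy example (hypothetical values): 3 clips, 3 scene classes.
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.5, 0.3, 0.2],
]
labels = [0, 1, 2]

acc = accuracy(probs, labels)
ll = log_loss(probs, labels)
print(f"accuracy = {acc:.3f}, log loss = {ll:.3f}")
```

Because log loss penalizes confident wrong predictions, a model can have higher accuracy yet a worse log loss than another, as several rows in the results show.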

Citation

If this code is helpful, please feel free to cite the following papers:

@inproceedings{Wang2020,
    author = "Wang, Helin and Zou, Yuexian and Chong, DaDing",
    title = "Acoustic Scene Classification with Spectrogram Processing Strategies",
    booktitle = "Proceedings of the Detection and Classification of Acoustic Scenes and Events 2020 Workshop (DCASE2020)",
    address = "Tokyo, Japan",
    month = "November",
    year = "2020",
    pages = "210--214",
    abstract = "Recently, convolutional neural networks (CNN) have achieved the state-of-the-art performance in acoustic scene classification (ASC) task. The audio data is often transformed into two-dimensional spectrogram representations, which are then fed to the neural networks. In this paper, we study the problem of efficiently taking advantage of different spectrogram representations through discriminative processing strategies. There are two main contributions. The first contribution is exploring the impact of the combination of multiple spectrogram representations at different stages, which provides a meaningful reference for the effective spectrogram fusion. The second contribution is that the processing strategies in multiple frequency bands and multiple temporal frames are proposed to make fully use of a single spectrogram representation. The proposed spectrogram processing strategies can be easily transferred to any network structures. The experiments are carried out on the DCASE 2020 Task1 datasets, and the results show that our method could achieve the accuracy of 81.8\% (official baseline: 54.1\%) and 92.1\% (official baseline: 87.3\%) on the officially provided fold 1 evaluation dataset of Task1A and Task1B, respectively."
}

Acknowledgment

Thanks for the base code provided by https://github.com/qiuqiangkong/dcase2019_task1/.

@article{kong2019cross,
  title={Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems},
  author={Kong, Qiuqiang and Cao, Yin and Iqbal, Turab and Xu, Yong and Wang, Wenwu and Plumbley, Mark D},
  journal={arXiv preprint arXiv:1904.03476},
  year={2019}
}

Contact

If you have any problems with our code, feel free to contact us or describe your problem in Issues.
