
jovan-stojanovic / animal-sound-recognition


Deep learning model for animal sound classification.

License: MIT License

Python 100.00%
deep-learning machine-learning sound-classification convolutional-neural-networks

animal-sound-recognition's Introduction

Animal sound recognition using deep learning techniques

This is a project for the Deep Learning course of the master's degree (TIDE) at Paris I Panthéon-Sorbonne University.

This repo contains Python code that applies the sound recognition techniques from the paper "PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition" [1] to animal sound recognition. The database used is Google AudioSet, a large dataset of labeled audio from the YouTube-8M project, containing "632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos" (see [2]). The idea was to apply these techniques to animal sounds and observe the results.

1. Download dataset

The dataset used for this project contains all the Google AudioSet classes that describe animal sounds. The sounds were then packed into HDF5 format. The data are not included in this repo because of their large size.
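The selection of animal classes can be sketched as a filter over an AudioSet-style segments CSV. This is a minimal illustration, not the repo's own download script: the function name `filter_segments`, the sample rows, and the two label IDs (assumed to be the AudioSet IDs for "Dog" and "Cat") are placeholders, since the project's actual list of animal classes is not given here.

```python
import csv
import io
from typing import Iterable, List, Set

# Illustrative label IDs only (assumed to be AudioSet's "Dog" and "Cat");
# the real list of animal classes used by the project is not given in this README.
ANIMAL_LABELS: Set[str] = {"/m/0bt9lr", "/m/01yrx"}


def filter_segments(lines: Iterable[str], wanted: Set[str]) -> List[List[str]]:
    """Keep rows of an AudioSet-style segments CSV that carry a wanted label."""
    # Header lines in the segments files start with '#'; skip them.
    data_lines = (line for line in lines if not line.startswith("#"))
    # Fields are separated by ", " and the label list is a quoted string.
    reader = csv.reader(data_lines, skipinitialspace=True)
    kept = []
    for row in reader:
        if len(row) < 4:
            continue  # blank or malformed line
        ytid, start, end, labels = row[0], row[1], row[2], row[3]
        if wanted & set(labels.split(",")):
            kept.append([ytid, start, end, labels])
    return kept


# Made-up rows in the segments-file layout: YTID, start, end, quoted label list.
sample = io.StringIO(
    "# YTID, start_seconds, end_seconds, positive_labels\n"
    'aaaaaaaaaaa, 30.000, 40.000, "/m/0bt9lr,/m/09x0r"\n'
    'bbbbbbbbbbb, 0.000, 10.000, "/m/04rlf"\n'
)
animal_rows = filter_segments(sample, ANIMAL_LABELS)
print(animal_rows)
```

A clip is kept as soon as one of its labels is in the wanted set, so clips that mix animal and non-animal sounds stay in the subset.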

2. Train

Three models were trained: CNN14, ResNet38 and Wavegram-Logmel-CNN.

3. Results

We have shown that models trained exclusively on animal sound data give better results than general-purpose models. The best sound prediction model obtained so far on the AudioSet database, the Wavegram-Logmel-CNN, with a mean average precision (mAP) of 0.439, is surpassed by our ResNet38 and CNN14 models trained on animal sounds, with mAPs of 0.551 and 0.561 respectively.


Understandably, these models are trained on different data, so their results, especially the mAP, cannot easily be compared, as shown in the class-wise analysis.


There is a structural class effect that needs to be taken into account: some types of sounds are harder to classify than others, and this is not known in advance. Still, we have shown that the model that performs best for general-purpose training is not always the one that performs best on a specific group of data from the same source.
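To make the mAP figures and the class effect concrete, here is a minimal sketch of how a mean average precision is usually computed: average precision (AP) is calculated for each class separately, then averaged over classes. The class names, labels, and scores below are invented for illustration, and the repo's own evaluation code may differ.

```python
from typing import Dict, Sequence


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AP for one class: mean of the precision measured at each positive clip."""
    # Rank clips from highest to lowest predicted score.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    hits = 0
    precisions = []
    for rank, idx in enumerate(order, start=1):
        if y_true[idx] == 1:
            hits += 1
            precisions.append(hits / rank)
    # AP is undefined for a class without positives; 0.0 is used here by convention.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(per_class_ap: Dict[str, float]) -> float:
    """mAP: unweighted mean of the per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)


# Hypothetical labels (1 = class present in the clip) and model scores for four clips.
labels = {"dog": [1, 0, 1, 0], "owl": [0, 1, 0, 1]}
scores = {"dog": [0.9, 0.2, 0.8, 0.1], "owl": [0.7, 0.6, 0.3, 0.2]}

ap = {name: average_precision(labels[name], scores[name]) for name in labels}
m_ap = mean_average_precision(ap)
print(ap, m_ap)
```

Because the mean is unweighted, a set of classes that happens to contain more hard-to-classify sounds lowers the mAP regardless of the model, which is why mAP values obtained on different class sets are hard to compare.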

Future research may focus on confirming or refuting the above hypothesis about training on specific groups rather than on aggregates. Tuning the parameters and finding the ones that suit this kind of problem would also be important. Other models may be tested, such as MobileNetV1, which, thanks to its lighter network, might be incorporated into a smartphone application.

Environments/Dependencies

The codebase is developed with Python 3.7. Install requirements as follows:

pip install -r requirements.txt

References


  1. Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, Mark D. Plumbley. "PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition." (2019).

  2. Gemmeke et al., "Audio Set: An ontology and human-labeled dataset for audio events," 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 2017, pp. 776-780, doi: 10.1109/ICASSP.2017.7952261.

animal-sound-recognition's People

Contributors

jovan-stojanovic


animal-sound-recognition's Issues

FileNotFoundError: [Errno 2] No such file or directory: '/home/a/pycharmcode/Animal-sound-recognition-main/indices/unbalanced_train_segments_clean.csv'

/home/a/anaconda3/envs/as1/bin/python /home/a/pycharmcode/Animal-sound-recognition-main/dataset/download_and_pack.py
/home/a/anaconda3/envs/as1/lib/python3.7/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
Traceback (most recent call last):
  File "/home/a/pycharmcode/Animal-sound-recognition-main/dataset/download_and_pack.py", line 208, in <module>
    split_unbalanced_csv_to_partial_csvs(args)
  File "/home/a/pycharmcode/Animal-sound-recognition-main/dataset/download_and_pack.py", line 26, in split_unbalanced_csv_to_partial_csvs
    with open(unbalanced_csv_path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/a/pycharmcode/Animal-sound-recognition-main/indices/unbalanced_train_segments_clean.csv'

Process finished with exit code 1
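The traceback shows the script failing in `open()` because the CSV is not at the path it expects. One way to surface this earlier is a pre-flight check of the input files, sketched below. The helper `missing_inputs` is hypothetical and not part of the repo; only the relative path comes from the traceback, and the temporary directory stands in for the project root.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def missing_inputs(workspace: Path, required: Iterable[str]) -> List[Path]:
    """Return the required files that do not exist under the workspace directory."""
    return [workspace / rel for rel in required if not (workspace / rel).is_file()]


# Relative path taken from the traceback in this issue.
REQUIRED = ["indices/unbalanced_train_segments_clean.csv"]

# Demonstration in a temporary directory standing in for the project root.
with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    before = missing_inputs(workspace, REQUIRED)  # file not there yet
    (workspace / "indices").mkdir()
    (workspace / "indices" / "unbalanced_train_segments_clean.csv").write_text("# placeholder\n")
    after = missing_inputs(workspace, REQUIRED)  # file now present

print(len(before), len(after))
```

Running such a check before the packing step would report the missing index file by name instead of stopping inside the CSV-splitting function.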
