
DCASE2019 task4: Sound event detection in domestic environments (DESED dataset and baseline)

You can find discussion about the dcase challenge here: dcase_discussions. For more information about the DCASE 2019 challenge please visit the challenge website.

This task follows dcase2018 task4, you can find an analysis of dcase2018 task4 results here.

Detailed information about the baseline can be found on the dedicated baseline page.

If you use the dataset or the baseline, please cite this paper.

Updates

17th January 2020: added the public evaluation set and a link to DESED, and changed the annotation format from csv to tsv to match the DESED dataset.

6th March 2019: [baseline] added baseline/Logger.py, updated baseline/config.py and updated the README about sending csv files.

2nd May 2019: removed duplicates in dataset/validation/test_dcase2018.csv and dataset/validation/validation.csv, changing event-based results by 0.03%.

19th May 2019: updated eval_dcase2018.csv and validation.csv. A problem in the annotation export meant that some files listed with empty annotations actually had annotations.

28th May 2019: updated the 2019 evaluation dataset.

31st May 2019: updated the link to the evaluation dataset (tar.gz) because of a compression problem on some OSes.

30th June 2019: [baseline] updated get_predictions (+ refactor) to output predictions directly in seconds.

Dependencies

Python >= 3.6, pytorch >= 1.0, cudatoolkit=9.0, pandas >= 0.24.1, scipy >= 1.2.1, pysoundfile >= 0.10.2, librosa >= 0.6.3, youtube-dl >= 2019.4.30, tqdm >= 4.31.1, ffmpeg >= 4.1, dcase_util >= 0.2.5, sed-eval >= 0.2.1

A simplified installation example is provided below for a Python 3.6 Anaconda distribution on a Linux system:

  1. Install Anaconda
  2. Launch conda_create_environment.sh

Note: The baseline and download script have been tested with Python 3.6, on Linux (CentOS 7)

DESED Dataset

The Domestic Environment Sound Event Detection (DESED) dataset is composed of several subsets that can be downloaded independently:

  1. (Real recordings) Launch python download_data.py (in the baseline/ folder).
  2. (Synthetic clips) Download at: synthetic_dataset.
  3. (Public evaluation set: Youtube subset) Download at: evaluation dataset. It contains 692 Youtube files.
  4. (Synthetic evaluation set) See the Desed repo for download information.

It is likely that you'll have download issues with the real recordings. Don't hesitate to relaunch download_data.py once or twice. At the end of the download, please send a mail with the CSV files created in the missing_files directory (preferably to Nicolas Turpault and Romain Serizel).

You should have a development set structured in the following manner:

dataset root
└───metadata			              (directories containing the annotations files)
│   │
│   └───train			              (annotations for the training sets)
│   │     weak.tsv                    (weakly labeled training set list and annotations)
│   │     unlabel_in_domain.tsv       (unlabeled in domain training set list)
│   │     synthetic.tsv               (synthetic data training set list and annotations)
│   │
│   └───validation			          (annotations for the validation sets)
│   │     validation.tsv                (validation set list with strong labels)
│   │     test_2018.tsv                  (test set list with strong labels - DCASE 2018)
│   │     eval_2018.tsv                (eval set list with strong labels - DCASE 2018)
│   │
│   └───eval			              (annotations for the public eval set (Youtube in papers))
│         public.tsv  
└───audio					          (directories where the audio files will be downloaded)
    └───train			              (audio files for the training sets)
    │   └───weak                      (weakly labeled training set)
    │   └───unlabel_in_domain         (unlabeled in domain training set)
    │   └───synthetic                 (synthetic data training set)
    │
    └───validation			                 
    └───eval		
        └───public                            
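After downloading, the metadata layout above can be sanity-checked with a short script. This is only a sketch, not part of the official baseline; the function name and the dataset_root argument are assumptions.

```python
import os

# Metadata files expected by the layout above (hypothetical helper,
# not part of the official baseline).
EXPECTED_METADATA = [
    "metadata/train/weak.tsv",
    "metadata/train/unlabel_in_domain.tsv",
    "metadata/train/synthetic.tsv",
    "metadata/validation/validation.tsv",
    "metadata/validation/test_2018.tsv",
    "metadata/validation/eval_2018.tsv",
    "metadata/eval/public.tsv",
]

def missing_metadata(dataset_root):
    """Return the expected metadata files absent under dataset_root."""
    return [p for p in EXPECTED_METADATA
            if not os.path.isfile(os.path.join(dataset_root, p))]
```

Any paths returned indicate an incomplete download and are worth re-fetching before training.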

Synthetic data (1.8Gb)

Freesound dataset [1,2]: A subset of FSD is used as foreground sound events for the synthetic subset of the DESED dataset. FSD is a large-scale, general-purpose audio dataset composed of Freesound content annotated with labels from the AudioSet Ontology [3].

SINS dataset [4]: The derivative of the SINS dataset used for DCASE2018 task 5 is used as background for the synthetic subset of the dataset for DCASE 2019 task 4. The SINS dataset contains a continuous recording of one person living in a vacation home over a period of one week. It was collected using a network of 13 microphone arrays distributed over the entire home. The microphone array consists of 4 linearly arranged microphones.

The synthetic set is composed of 10-second audio clips generated with Scaper [5]. The foreground events are obtained from FSD. Each event audio clip was verified manually to ensure that the sound quality and the event-to-background ratio were sufficient for it to be used as an isolated event. We also verified that the event was actually dominant in the clip, and checked that the event onset and offset were present in the clip. Each selected clip was then segmented when needed to remove silences before and after the event, and between events when the file contained multiple occurrences of the event class.

License

All sounds coming from FSD are released under Creative Commons licenses. Synthetic sounds can only be used for competition purposes until the full CC license list is made available at the end of the competition.

Real recordings (23.4Gb)

Real recordings are extracted from Audioset [3], which consists of an expanding ontology of 632 sound event classes and a collection of 2 million human-labeled 10-second sound clips (less than 21% are shorter than 10 seconds) drawn from 2 million Youtube videos. The ontology is specified as a hierarchical graph of event categories, covering a wide range of human and animal sounds, musical instruments and genres, and common everyday environmental sounds.

The download/extraction process can take approximately 4 hours. If you experience problems during the download of this subset please contact the task organizers.

Annotation format

Weak annotations

The weak annotations have been verified manually for a small subset of the training set. They are provided in a tab-separated file (.tsv) in the following format:

[filename (string)][tab][event_labels (strings)]

For example:

Y-BJNMHMZDcU_50.000_60.000.wav	Alarm_bell_ringing,Dog
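A file in this format can be parsed with the standard library alone. A minimal sketch, assuming the .tsv files carry a filename/event_labels header row (the in-memory sample and helper name below are illustrative):

```python
import csv
import io

# Hypothetical in-memory sample mirroring the weak annotation format above.
weak_tsv = (
    "filename\tevent_labels\n"
    "Y-BJNMHMZDcU_50.000_60.000.wav\tAlarm_bell_ringing,Dog\n"
)

def parse_weak(fh):
    """Parse a weak-annotation TSV into {filename: [labels]}."""
    reader = csv.DictReader(fh, delimiter="\t")
    return {row["filename"]: row["event_labels"].split(",") for row in reader}

annotations = parse_weak(io.StringIO(weak_tsv))
print(annotations["Y-BJNMHMZDcU_50.000_60.000.wav"])
# ['Alarm_bell_ringing', 'Dog']
```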

Strong annotations

Synthetic subset and validation set have strong annotations.

The minimum length for an event is 250 ms. The minimum duration of the pause between two events from the same class is 150 ms: when the silence between two consecutive events from the same class was less than 150 ms, the events were merged into a single event. The strong annotations are provided in a tab-separated file (.tsv) in the following format:

[filename (string)][tab][event onset time in seconds (float)][tab][event offset time in seconds (float)][tab][event_label (strings)]

For example:

YOTsn73eqbfc_10.000_20.000.wav	0.163	0.665	Alarm_bell_ringing
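The 150 ms merging rule described above can be sketched in a few lines. This is an illustration of the stated rule, not the organizers' actual annotation pipeline; events are (onset, offset) pairs in seconds for a single class.

```python
# Merge same-class events separated by less than 150 ms, per the rule above.
MIN_GAP_S = 0.150

def merge_events(events, min_gap=MIN_GAP_S):
    """events: iterable of (onset_s, offset_s) for one class."""
    merged = []
    for onset, offset in sorted(events):
        if merged and onset - merged[-1][1] < min_gap:
            # Gap shorter than min_gap: extend the previous event.
            merged[-1] = (merged[-1][0], max(merged[-1][1], offset))
        else:
            merged.append((onset, offset))
    return merged

print(merge_events([(0.163, 0.665), (0.700, 1.200), (2.000, 2.500)]))
# [(0.163, 1.2), (2.0, 2.5)]
```

The first two events are merged because their 35 ms gap is below the 150 ms threshold; the third stays separate.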

Description

This task is the follow-up to DCASE 2018 task 4. The task evaluates systems for the large-scale detection of sound events using weakly labeled data (without timestamps). The systems are expected to provide not only the event class but also the event time boundaries, given that multiple events can be present in an audio recording. The challenge of exploiting a large amount of unbalanced and unlabeled training data together with a small weakly annotated training set to improve system performance remains, but an additional training set with strongly annotated synthetic data is provided. The labels in all the annotated subsets are verified and can be considered reliable. An additional scientific question this task aims to investigate is whether we really need real, partially and weakly annotated data, whether synthetic data is sufficient, or whether we need both.

Further information on dcase_website

You can find the detailed results of dcase2018 task 4 on this page and in this paper [6].

Authors

Nicolas Turpault, Romain Serizel, Justin Salamon, Ankit Parag Shah, 2019 -- Present

References

  • [1] F. Font, G. Roma & X. Serra. Freesound technical demo. In Proceedings of the 21st ACM international conference on Multimedia. ACM, 2013.

  • [2] E. Fonseca, J. Pons, X. Favory, F. Font, D. Bogdanov, A. Ferraro, S. Oramas, A. Porter & X. Serra. Freesound Datasets: A Platform for the Creation of Open Audio Datasets. In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017.

  • [3] Jort F. Gemmeke and Daniel P. W. Ellis and Dylan Freedman and Aren Jansen and Wade Lawrence and R. Channing Moore and Manoj Plakal and Marvin Ritter. Audio Set: An ontology and human-labeled dataset for audio events. In Proceedings IEEE ICASSP 2017, New Orleans, LA, 2017.

  • [4] Gert Dekkers, Steven Lauwereins, Bart Thoen, Mulu Weldegebreal Adhana, Henk Brouckxon, Toon van Waterschoot, Bart Vanrumste, Marian Verhelst, and Peter Karsmakers. The SINS database for detection of daily activities in a home environment using an acoustic sensor network. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), 32–36. November 2017.

  • [5] J. Salamon, D. MacConnell, M. Cartwright, P. Li, and J. P. Bello. Scaper: A library for soundscape synthesis and augmentation In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, Oct. 2017.

  • [6] Romain Serizel, Nicolas Turpault. Sound Event Detection from Partially Annotated Data: Trends and Challenges. IcETRAN conference, Srebrno Jezero, Serbia, June 2019.

dcase2019_task4's People

Contributors: oplatek, rserizel, tdiethe, turpaultn

dcase2019_task4's Issues

About Code

I'm sorry, I am not familiar with PyTorch, so I have a small issue I'd like to ask for your help with.
In train.py, the function train contains the line for i, (batch_input, ema_batch_input, target) in enumerate(train_loader):. I don't know how to interpret (batch_input, ema_batch_input, target); what kind of data do they stand for?

About consistency loss

Hi,
I'm confused about the comments in main.py at lines 129 and 139. They mention that the consistency loss only considers weak and unlabeled data. However, I can't figure out which code excludes strong-data consistency. Besides, I find that the reference paper also takes strong data into consideration when computing the consistency loss. There is another question I'm not sure about: is unlabeled data used when training the student model? Thanks!

audioread.NoBackendError

I've set up the environment as mentioned in the README and installed all the dependencies, but when I run download_data.py, an error, audioread.NoBackendError, occurs.

(dcase2019) root@e34e9af4fcc5:~/DCASE2019_task4/baseline# python download_data.py
 INFO - Download_data
 INFO -

Once database is downloaded, do not forget to check your missing_files


 INFO - You can change N_JOBS and CHUNK_SIZE to increase the download with more processes.
 INFO - Validation data
  4%|####2                                                                                                    | 21/516 [00:04<17:28,  2.12s/it] INFO - YU5udL6UCigk_100.000_110.000.wav
 INFO - list index out of range
WARNING: Unknown codec unknown
WARNING: Unknown codec unknown
WARNING: Unknown codec unknown
WARNING: Unknown codec unknown
  6%|######3                                                                                                  | 31/516 [00:08<12:58,  1.60s/it]multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/root/anaconda3/envs/dcase2019/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/root/anaconda3/envs/dcase2019/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "download_data.py", line 79, in download_file
    start=float(segment_start), stop=float(segment_end))
  File "/root/anaconda3/envs/dcase2019/lib/python3.6/site-packages/dcase_util/containers/audio.py", line 775, in load
    duration=duration
  File "/root/anaconda3/envs/dcase2019/lib/python3.6/site-packages/librosa/core/audio.py", line 119, in load
    with audioread.audio_open(os.path.realpath(path)) as input_file:
  File "/root/anaconda3/envs/dcase2019/lib/python3.6/site-packages/audioread/__init__.py", line 116, in audio_open
    raise NoBackendError()
audioread.NoBackendError

I have tried on Windows 10 and Ubuntu 16.04.5 LTS, but I got the same error.
I have installed ffmpeg, and I can use librosa to load my own .wav files with no error, so I think the error may be caused by corrupted files.

Hope someone help me, please!

DATASET

I am sorry to say that the download progress is always zero, so I'd like to ask for your help. Is there any other way to download the dataset except running the .py file? Thank you.

Memory leak, possibly AudioContainer

Hey guys,
as with last year's script, I can't download the dataset, since this script uses more and more memory while downloading, independently of the number of threads.
I have a 16GB RAM machine here; the download script starts at ~2% memory usage, goes up (after 300 utterances) to ~10%, and later just crashes the computer.
The memory slowly fills up while downloading, meaning that in order to download the dataset I need to run the script many times over.
It looks like the culprit is AudioContainer. For each download you create an AudioContainer object. It could be that internally some reference within AudioContainer is never freed, so the memory is never released. Moreover, using this object in this context (just for copying and resampling a file) is maybe a bit overkill. A simple change to librosa.load / librosa.output.write_wav solves the entire problem.

Other things that I noticed along the way in order to resolve this problem are:

  1. Missing files are (with errors) kept in memory until everything finishes. This shouldn't be a big deal, but it could fill up memory if too many errors happen.
  2. Lists are returned even though they are never used again. I would recommend using tuples, which saves quite some memory for a lot of data.

Can I submit a pull request (changing to librosa) for this issue?

EDIT:
It's not AudioContainer or any variable within the download function. After using memory_profiler for some time, I still don't have a clue. Maybe one of the logs is filling up.
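For reference, the "load a segment, write it back out" flow suggested in this issue can be approximated with nothing but the standard library. This sketch uses the wave module in place of librosa (so it copies frames without resampling); the paths and timing arguments are illustrative. Each call holds only one buffer at a time, so memory is released between downloads.

```python
import wave

def trim_wav(src_path, dst_path, start_s, stop_s):
    """Copy the [start_s, stop_s) segment of a wav file to dst_path.

    Stdlib stand-in for the librosa.load / write_wav flow suggested
    above; no resampling is performed, and the helper name and
    arguments are illustrative, not part of the baseline.
    """
    with wave.open(src_path, "rb") as src:
        sr = src.getframerate()
        src.setpos(int(start_s * sr))
        frames = src.readframes(int((stop_s - start_s) * sr))
        params = src.getparams()
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # header's frame count is fixed on close
        dst.writeframes(frames)
```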

Youtube-dl error: "token" parameter not in video info for unknown reason

If you have this error many times in your "missing_files_[dataset].csv" files, please upgrade youtube-dl to version 2019.4.30.

ERROR example:
Y0_K6OKtoBBU_30.000_40.000.wav "ERROR: 0_K6OKtoBBU: ""token"" parameter not in video info for unknown reason; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output."

About train

If I want to train from scratch, what should the 'subpart_data' parameter in main.py be set to? Thank you

Error when launching baseline/download_data.py

There seems to be a problem in download_data.py: after launching it from the baseline folder, the following error appears:

INFO - Download_data
 INFO -
Once database is downloaded, do not forget to check your missing_files

INFO - You can change N_JOBS and CHUNK_SIZE to increase the download with more processes.
INFO - Validation data
 0%|                                                                                                                                                               | 0/1168 [00:00<?, ?it/s]/home/a3lab/anaconda3/envs/dcase2019/lib/python3.6/site-packages/librosa/core/audio.py:161: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn('PySoundFile failed. Trying audioread instead.')
[E] AudioContainer: Stop parameter exceeds file length [tmp/0lNm9pXXHxA.m4a]    (audio.py:779)
NoneType: None
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/a3lab/anaconda3/envs/dcase2019/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/a3lab/anaconda3/envs/dcase2019/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "download_data.py", line 79, in download_file
    start=float(segment_start), stop=float(segment_end))
  File "/home/a3lab/anaconda3/envs/dcase2019/lib/python3.6/site-packages/dcase_util/containers/audio.py", line 780, in load
    raise IOError(message)
OSError: AudioContainer: Stop parameter exceeds file length [tmp/0lNm9pXXHxA.m4a]
"""[D] Y1easzswv0OM_130.000_140.000.wav

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "download_data.py", line 215, in <module>
    base_dir_missing_files=base_missing_files_folder)
  File "download_data.py", line 155, in download
    for val in tqdm(p.imap_unordered(download_file_alias, filenames, chunk_size), total=len(filenames)):
  File "/home/a3lab/anaconda3/envs/dcase2019/lib/python3.6/site-packages/tqdm/std.py", line 1108, in __iter__
    for obj in iterable:
  File "/home/a3lab/anaconda3/envs/dcase2019/lib/python3.6/multiprocessing/pool.py", line 347, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/home/a3lab/anaconda3/envs/dcase2019/lib/python3.6/multiprocessing/pool.py", line 735, in next
    raise value
OSError: AudioContainer: Stop parameter exceeds file length [tmp/0lNm9pXXHxA.m4a]
  0%|                                                                                                                                                               | 0/1168 [00:01<?, ?it/s]

I am using the Anaconda environment created with conda_create_environment.sh on Ubuntu 18.04.1 LTS, with youtube-dl 2020.3.24, and dcase-util 0.2.11.

UserWarning: PySoundFile failed. Trying audioread instead.

Downloading the real recordings raises these warnings, and it doesn't even download all the files:

/home/udaylunawat/anaconda3/envs/dcase2019/lib/python3.6/site-packages/librosa/core/audio.py:161: UserWarning: PySoundFile failed. Trying audioread instead.
