johnmartinsson / adaptive-change-point-detection

The official implementation of A-CPD from the paper "From Weak to Strong Sound Event Labels using Adaptive Change-Point Detection and Active Learning".

License: Apache License 2.0

Languages: Python 96.89%, Shell 3.11%

Topics: active-learning, annotation, bioacoustics, machine-learning, sound-event-detection

adaptive-change-point-detection's People

Contributors: johnmartinsson

adaptive-change-point-detection's Issues

Cleanup code: data generation

Simplify the data generation so that a single script generates the soundscapes and embeddings with the default settings.

Idea: improve embedding time resolution

It may be feasible to improve the time resolution of the embeddings by zeroing out the first and last second of each 3 second segment, so that the resulting embedding reflects only the centre second; see the sketch below.

  • Try this with BirdNET-Analyzer and see what happens.
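
A minimal sketch of the idea, assuming 48 kHz audio (BirdNET's input rate) and a 1-D `segment` array holding one 3 second window; the `embed` call in the final comment stands in for a BirdNET-Analyzer embedding call and is not a real function:

    import numpy as np

    def centre_second(segment, sample_rate=48000):
        """Zero out all but the middle second of a 3 second segment."""
        out = np.zeros_like(segment)
        out[sample_rate:2 * sample_rate] = segment[sample_rate:2 * sample_rate]
        return out

    # embedding = embed(centre_second(segment))  # hypothetical embedding call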

Foreground sounds in background sounds?

It seems the ME dataset may not be properly annotated: there appear to be foreground sounds occurring in the background recordings.

This may become an issue for the active learning approach. A faulty oracle means the active learning method may end up increasing the label noise in the annotated dataset, which would be detrimental to model performance.

  • Choose a set of known background noises, e.g., 30 seconds of rain, wind, traffic, city noise, et cetera.

No probas before 1.5 seconds

The BirdNET embeddings start at 1.5 seconds, meaning that we have no probability estimate before this point and can therefore not detect peaks at the beginning of the file.

This is problematic. A temporary solution is to generate soundscapes that never have events before 1.5 seconds.
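
Each embedding covers a 3 second window, so if the embedding is assigned to its window centre, the first timestamp lands at 1.5 seconds. A small sketch of the arithmetic, using a 10 second file purely as an example:

    segment_length = 3.0  # BirdNET segment length in seconds
    overlap = 0.0         # overlap between consecutive segments
    hop = segment_length - overlap
    file_length = 10.0    # example recording length

    # (start, centre) of every full segment in the recording
    centres = [(i * hop, i * hop + segment_length / 2)
               for i in range(int((file_length - segment_length) / hop) + 1)]
    print(centres)  # [(0.0, 1.5), (3.0, 4.5), (6.0, 7.5)]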

Cleanup code: data visualization

Move the visualization code to a separate script that produces the main figures efficiently.

Maybe also move the AL simulation code to a separate script and store the most important results, which can then be loaded by the visualization scripts. This depends on how much time it requires.

Add description of data generation and embedding pre-computing to README

DRAFT.

Run experiments on modified data

Environment

Check requirements.txt for the requirements. In particular we need:

- Scaper, and
- BirdNET-Analyzer.

Scaper is used to generate the soundscapes from the source data, and BirdNET-Analyzer is used to pre-compute the embeddings that the method operates on.
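
As a minimal sketch of the Scaper side, something like the following could generate a single 30 second soundscape; the paths, labels, durations and distributions are placeholders for illustration, not the settings used in the experiments:

    import scaper

    # Placeholder source directories with one sub-folder per label.
    fg_path = './scaper_source_files/foreground'
    bg_path = './scaper_source_files/background'

    sc = scaper.Scaper(duration=30.0, fg_path=fg_path, bg_path=bg_path,
                       random_state=0)
    sc.ref_db = -50

    # One background track spanning the whole soundscape.
    sc.add_background(label=('choose', []),
                      source_file=('choose', []),
                      source_time=('const', 0))

    # One foreground event at a random onset and SNR (all values assumed).
    sc.add_event(label=('choose', []),
                 source_file=('choose', []),
                 source_time=('const', 0),
                 event_time=('uniform', 0, 27),
                 event_duration=('const', 3.0),
                 snr=('uniform', 6, 30),
                 pitch_shift=None,
                 time_stretch=None)

    # Writes the audio plus a JAMS file with the reference annotations.
    sc.generate('soundscape.wav', 'soundscape.jams')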

Pre-compute the embeddings

TODO: describe the last part of

scripts/generate_scaper_data.sh

Run simulations / experiments

TODO: update main script with proper default results folder

If everything is set up properly, you should be able to run everything by simply writing:

    python main.py

This should run:

  • active learning annotation simulation,
  • model training and prediction, and
  • evaluation of models trained with annotations.

Generate the dataset

TODO: describe how to generate the data in more detail.

Produce source files

TODO: add the DOIs and links to the datasets.

- NIGENS dataset,
- TUT Rare Events dataset, and
- DCASE Few-shot bioacoustic dataset.

In produce_source_material.py you need to set the correct data paths:

    tut_base_dir    = '<path>/TUT_rare_sed_2017/TUT-rare-sound-events-2017-development/data/source_data/'
    nigens_base_dir = '<path>/NIGENS/'
    dcase_base_dir  = '<path>/Development_Set/Validation_Set/ME/'

Generate audio recordings

TODO:

Extract the embeddings

TODO: explain how to generate the embeddings in more detail.

Set up BirdNET-Analyzer v2.4 (https://github.com/kahst/BirdNET-Analyzer), then run:

python embeddings.py --i ./scaper_source_files/BV/AMRE/train_soundscapes/ --o ./scaper_source_files/BV/AMRE/train_soundscapes/ --threads 8 --batchsize 16 --overlap 0
python embeddings.py --i ./scaper_source_files/BV/AMRE/test_soundscapes/ --o ./scaper_source_files/BV/AMRE/test_soundscapes/ --threads 8 --batchsize 16 --overlap 0

This will generate the embeddings for the train_soundscapes and the test_soundscapes and store them in the respective directories. Embeddings are computed for 3 second segments with the specified overlap.
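
A hedged loader sketch for the resulting files, assuming BirdNET-Analyzer writes one text file per recording where each line holds a start time, an end time, and comma-separated embedding values in tab-separated fields; verify the exact file naming and layout against your own output before relying on it:

    import numpy as np

    def load_embeddings(path):
        """Load (starts, ends, vectors) from an assumed BirdNET embeddings file."""
        starts, ends, vectors = [], [], []
        with open(path) as f:
            for line in f:
                start, end, values = line.strip().split('\t')
                starts.append(float(start))
                ends.append(float(end))
                vectors.append(np.array(values.split(','), dtype=float))
        return np.array(starts), np.array(ends), np.stack(vectors)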
