dbd-research-group / birdset

A benchmark dataset collection for bird sound classification

Home Page: https://huggingface.co/datasets/DBD-research-group/BirdSet

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 13.05%, Jupyter Notebook 86.90%, Shell 0.05%
Topics: avian, benchmark, bioacoustics, deeplearning

birdset's Introduction

Badges: Python · PyTorch Lightning · Config: Hydra · GitHub: github.com/DBD-research-group/BirdSet · arXiv


Deep learning models have emerged as a powerful tool in avian bioacoustics to assess environmental health. To maximize the potential of cost-effective and minimally invasive passive acoustic monitoring (PAM), models must analyze bird vocalizations across a wide range of species and environmental conditions. However, data fragmentation challenges the evaluation of generalization performance. Therefore, we introduce the $\texttt{BirdSet}$ dataset, comprising approximately 520,000 global bird recordings for training and over 400 hours of PAM recordings for testing.

User Installation

The simplest way to install $\texttt{BirdSet}$ is to clone this repository and install it as an editable package using conda and pip:

conda create -n birdset python=3.10
conda activate birdset
pip install -e .

You can also use the devcontainer configured as a git submodule:

git submodule update --init --recursive

Or use Poetry:

poetry install
poetry shell

Reproduce NeurIPS 2024 Baselines

First, download the background noise files used for the augmentations:

python resources/utils/download_background_noise.py

We provide all experiment YAML files used to generate our results under birdset/configs/experiment/birdset_neurips24. For each dataset, we specify the parameters for all training scenarios: DT, MT, and LT.

Dedicated Training (DT)

The experiments for DT with the dedicated subset can be run with a single command:

python birdset/train.py experiment="birdset_neurips24/DT/$Model"

Medium Training (MT) and Large Training (LT)

Experiments for the training scenarios MT and LT are harder to reproduce since they require more extensive training times. Additionally, the datasets are quite large (90 GB for XCM and 480 GB for XCL). Therefore, we provide the best model checkpoints via Hugging Face in the experiment files to avoid the need for retraining. These checkpoints can be evaluated by running the evaluation script, which automatically downloads the model and performs inference on the test datasets:

python birdset/eval.py experiment="birdset_neurips24/$EXPERIMENT_PATH"

As the model EAT is not implemented in Hugging Face Transformers (yet), its checkpoints are available for download from the tracked experiments on Weights and Biases (LT_XCL_eat).

If you want to run the large-scale trainings yourself and download the big training datasets, you can also launch the XCM and XCL trainings via the experiment YAML files:

python birdset/train.py experiment="birdset_neurips24/$EXPERIMENT_PATH"

After training, the best model checkpoint is saved based on the validation loss and can then be used for inference:

python birdset/eval.py experiment="birdset_neurips24/$EXPERIMENT_PATH" module.model.network.local_checkpoint="$CHECKPOINT_PATH"

Example

Tutorial Notebook

Prepare Data

from birdset.datamodule.base_datamodule import DatasetConfig
from birdset.datamodule.birdset_datamodule import BirdSetDataModule

# instantiate the data module
dm = BirdSetDataModule(
    dataset=DatasetConfig(
        data_dir='data_birdset/HSN', # specify your data directory!
        dataset_name='HSN',
        hf_path='DBD-research-group/BirdSet',
        hf_name='HSN',
        n_classes=21,
        n_workers=3,
        val_split=0.2,
        task="multilabel",
        classlimit=500,
        eventlimit=5,
        sampling_rate=32000,
    ),
)

# prepare the data (download dataset, ...)
dm.prepare_data()

# setup the dataloaders
dm.setup(stage="fit")

# get the dataloaders
train_loader = dm.train_dataloader()
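To sanity-check the pipeline, you can pull a single batch from the loader; the exact keys depend on the configured transforms:

# fetch one batch and inspect what the transforms produce
batch = next(iter(train_loader))
print(batch.keys())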

Prepare Model and Start Training

from lightning import Trainer
min_epochs = 1
max_epochs = 5
trainer = Trainer(min_epochs=min_epochs, max_epochs=max_epochs, accelerator="gpu", devices=1)

from birdset.modules.multilabel_module import MultilabelModule
model = MultilabelModule(
    len_trainset=dm.len_trainset,
    task=dm.task,
    batch_size=dm.train_batch_size,
    num_epochs=max_epochs)

trainer.fit(model, dm)

Logging

Logs will be written to Weights&Biases by default.

Background noise

To enhance model performance, we mix in additional background noise downloaded from DCASE18. To download the files and convert them to the correct format, run the notebook 'download_background_noise.ipynb' in the 'notebooks' folder.
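One way to mix such noise in at training time is with torch-audiomentations (mentioned elsewhere in this repo). A minimal sketch; the directory path and SNR range below are placeholders, not the values from our experiments:

import torch
from torch_audiomentations import AddBackgroundNoise

# mix random background noise into each waveform at a random SNR
augment = AddBackgroundNoise(
    background_paths="data_birdset/background_noise",  # placeholder directory
    min_snr_in_db=3.0,
    max_snr_in_db=30.0,
    p=0.5,
)
# samples have shape (batch, channels, time)
noisy = augment(samples=torch.randn(4, 1, 32000), sample_rate=32000)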

Run experiments

Our experiments are defined in the configs/experiment folder. To run an experiment, use the following command in the directory of the repository:

python birdset/train.py experiment="EXPERIMENT_PATH"

Replace EXPERIMENT_PATH with the path to the desired experiment YAML config, relative to the experiment directory. For example, here's a command for training an EfficientNet on HSN:

python birdset/train.py experiment="local/HSN/efficientnet.yaml"

Data pipeline

Our datasets are shared via Hugging Face Datasets in our BirdSet repository. First, log in to Hugging Face with:

huggingface-cli login

For a detailed guide to using the BirdSet data pipeline and its many configuration options, see our comprehensive BirdSet Data Pipeline Tutorial.
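If you only need the raw data, the subsets can also be loaded directly with Hugging Face datasets. A minimal sketch using the HSN configuration from the example above (depending on your datasets version, trust_remote_code=True may be required):

from datasets import load_dataset

# downloads the HSN subset from the BirdSet repository on Hugging Face
ds = load_dataset("DBD-research-group/BirdSet", "HSN")
print(ds)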

Datamodule

The datamodules are defined in birdset/datamodule and their configurations are stored under configs/datamodule. base_datamodule is the main class that can be inherited for specific datasets. It is responsible for preparing the data in the function prepare_data and loading the data in the function setup. prepare_data downloads the dataset, applies preprocessing, creates validation splits, and saves the data to disk. setup initiates the dataloaders and configures data transformations.

The following steps are performed in prepare_data:

  1. Data is downloaded from Hugging Face Datasets with _load_data
  2. Data gets preprocessed with _preprocess_data
  3. Data is split into train, validation, and test sets with _create_splits
  4. The length of the dataset is saved for later access
  5. Data is saved to disk with _save_dataset_to_disk

The following steps are performed in setup:

  1. Data is loaded from disk with _get_dataset, in which the transforms are applied
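Put together, a specific datamodule follows roughly this shape (a schematic sketch only: the base class name and method signatures here are simplified and do not mirror the actual implementation):

# illustrative only; the real base class lives in birdset/datamodule/base_datamodule
class MyDataModule(BaseDataModule):
    def prepare_data(self):
        dataset = self._load_data()               # 1. download from Hugging Face Datasets
        dataset = self._preprocess_data(dataset)  # 2. preprocessing
        splits = self._create_splits(dataset)     # 3. train/validation/test splits
        self.len_trainset = len(splits["train"])  # 4. store the length for later access
        self._save_dataset_to_disk(splits)        # 5. persist to disk

    def setup(self, stage=None):
        # load from disk; the transforms are applied here
        self.dataset = self._get_dataset(stage)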

Transformations

Data transformations are operations applied to the data on the fly during training; they include, e.g., augmentations. The transformations are attached to the Hugging Face dataset with set_transform.
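A minimal sketch of how this works, assuming a decoded audio column as in typical Hugging Face audio datasets (the transform below is a placeholder; the actual BirdSet transforms also handle augmentations and feature extraction):

import numpy as np

def train_transform(batch):
    # placeholder: turn decoded audio into model inputs; apply augmentations here
    batch["input_values"] = [np.asarray(a["array"], dtype=np.float32) for a in batch["audio"]]
    return batch

# applied lazily on access, so the data saved on disk stays untouched
dataset.set_transform(train_transform)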

birdset's People

Contributors

jonaslange, lurauch, mo01010010itz, moritz-wirth, raphaelschwinger, reheinrich, tom2208


birdset's Issues

Explore Baseline Models

Models for spectrograms:

  1. ConvNeXT: A pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them. (https://huggingface.co/docs/transformers/model_doc/convnext)

  2. Swin Transformer: A hierarchical vision transformer that employs shifted windows to compute representations. (https://huggingface.co/docs/transformers/model_doc/swin)

  3. Audio Spectrogram Transformer (AST): Applies a Vision Transformer to audio by turning it into an image (spectrogram), obtaining state-of-the-art results for audio classification. (https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)

  4. ResNet: A convolutional neural network that employs residual connections to facilitate the training of much deeper networks (up to 1000 layers). (https://huggingface.co/docs/transformers/model_doc/resnet)

  5. EfficientNet: A model designed to uniformly scale network width, depth, and resolution using a compound coefficient, achieving state-of-the-art accuracy in image classification while being smaller and faster than previous models. (https://huggingface.co/docs/transformers/model_doc/efficientnet)

Models for waveforms:

  1. Wav2Vec2: Learns powerful representations from speech audio by solving a contrastive task, which can then be fine-tuned on transcribed speech. (https://huggingface.co/docs/transformers/model_doc/wav2vec2). There is also one Wav2Vec2 model already trained on bird sounds (https://huggingface.co/Saads/bird_classification_model)
    --> Uses its own Wav2Vec2FeatureExtractor

  2. Wav2Vec2-Conformer: Follows the same architecture as Wav2Vec2, but replaces the attention block with a Conformer block. (https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer)
    --> Uses AutoFeatureExtractor

  3. Hubert: An approach for self-supervised speech representation learning which utilizes an offline clustering step to provide aligned target labels for a BERT-like prediction loss. (https://huggingface.co/docs/transformers/model_doc/hubert)
    --> Uses AutoFeatureExtractor

  4. (Whisper): Pre-trained model for automatic speech recognition (ASR) and speech translation, trained on a large dataset and capable of generalizing to many datasets and domains without fine-tuning. (https://huggingface.co/docs/transformers/model_doc/whisper)
    --> WhisperFeatureExtractor extracts mel filter bank features from raw speech

  5. UniSpeech: Learns speech representations from both unlabeled and labeled data, in which supervised phonetic CTC learning and phonetically-aware contrastive self-supervised learning are conducted in a multi-task learning manner. (https://huggingface.co/docs/transformers/model_doc/unispeech)
    --> Uses AutoFeatureExtractor

Further possible models for waveforms: Data2VecAudio, SEW, SEW-D, UniSpeechSat, WavLM (https://huggingface.co/docs/transformers/tasks/audio_classification)

Create subtasks and train/test split

  • add subtask to hf DataGenerator script
  • Create train/test splits for said subtask (move some of the soundscape files into the training data so that the split is 80% train / 20% test and all classes still appear in the test data)

Event Detection for all XC Files

Proposal from Raphael

  • Add a new column (e.g., Detected Events)
  • Xeno-Canto files longer than 5 seconds get a list of detected/proposed events that can be used
  • Other time stamps are high-quality (from annotators)

Improve caching of datamodule

Currently, when a hyperparameter that is irrelevant for rebuilding the cache (e.g. the number of dataloader workers) is changed, the _fingerprint still changes, so the resource-intensive preprocessing is executed again.
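One possible direction (a sketch, not the current implementation): derive the fingerprint only from the parameters that actually affect preprocessing, e.g. by hashing a filtered view of the config. The key names below are taken from DatasetConfig in the example above:

import hashlib
import json

# only these keys influence the preprocessed data; changing e.g. n_workers
# would then no longer invalidate the cache
RELEVANT_KEYS = ["dataset_name", "sampling_rate", "classlimit", "eventlimit", "val_split"]

def cache_fingerprint(config: dict) -> str:
    relevant = {k: config[k] for k in RELEVANT_KEYS if k in config}
    return hashlib.md5(json.dumps(relevant, sort_keys=True).encode()).hexdigest()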

Improve code-structure

Improve the code structure so that:

  • all source code is in /src (coding convention)
  • it aligns with lightning-hydra-template and yet-another-lightning-template
  • the package can be (in the future) added to pip
  • it can already be loaded easily in other repos through tools (e.g. Poetry)

Todos

  • improve folder structure (see Readme.md)
  • make experiments runnable
    • add train.py from lightning-hydra-template. We can still use a custom train function, if needed, but we should use their config structure to be able to share code easily
    • remove feature_extractor from datamodule as it should be connected to the model
    • train resnet18 on esc50
    • log on wandb
    • finetune ast on esc50
    • finetune w2v2 on esc50
  • check out W&B
  • Write docs
    • How to run experiments
    • What to do for new experiments

Sort metadata by license

License counts:
//creativecommons.org/licenses/by-nc-nd/2.5/ - 67554
//creativecommons.org/licenses/by-nc-nd/3.0/ - 7379
//creativecommons.org/licenses/by-nc-nd/4.0/ - 118984
//creativecommons.org/licenses/by-nc-sa/3.0/ - 68896
//creativecommons.org/licenses/by-nc-sa/3.0/us/ - 1
//creativecommons.org/licenses/by-nc-sa/4.0/ - 415453
//creativecommons.org/licenses/by-nc/4.0/ - 128
//creativecommons.org/licenses/by-sa/3.0/ - 679
//creativecommons.org/licenses/by-sa/4.0/ - 6614
//creativecommons.org/licenses/by/4.0/ - 199
//creativecommons.org/publicdomain/zero/1.0/ - 706

From CC, definitions:

Adapted Material means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image.

Problems with publishing Datasets:

License compatibility problem: many of the CC licenses are incompatible (see table). This can only be circumvented by using multiple licenses for different parts of the dataset. Another question is how to license the derivatives (i.e., the classification models we produce) when they contain multiple conflicting licenses.

Non-Derivative (nd): As stated under "Adapted Material", we are not allowed to translate, alter, arrange, or transform the original data. I think it is impossible to comply with the Non-Derivative license when uploading our data to Hugging Face.

Share-Alike (sa):

Possible Solutions

Instead of pushing to HF, just publish a script that pulls the original data, converts it, and adds it to a local HF repository.
Fair use under copyright law: when publishing models that were trained on the datasets, there should not be any problems, because we made transformative changes (from a dataset to a classification model).

We could also ask the authors for permission.

Train EAT

Train EAT on HSN dataset

Todos

  • Integrate code
  • Run without augmentations on ESC50
  • train on M without augmentations, compare to ResNet
  • Integrate their augmentations
    • amp
    • neg
    • tshift
    • tmask: Not present in torch-audiomentations; adding it in our repo is difficult since we cannot reuse the torch-audiomentations test code, so we forked the repo. Currently learning how to apply the augmentation in parallel over the batch and on the GPU, which is likely important for speed (see the sketch after this list)
    • ampsegment
    • cycshift
  • augs_noise
  • augs_mix
  • Eval on HSN test
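For reference, a batched time mask can run fully on the GPU roughly like this (a sketch independent of torch-audiomentations; the default mask length is illustrative):

import torch

def time_mask(batch: torch.Tensor, max_mask: int = 8000) -> torch.Tensor:
    # batch: (batch, channels, time); zero out one random span per example, on-device
    b, _, t = batch.shape
    lengths = torch.randint(0, max_mask, (b, 1), device=batch.device)
    starts = (torch.rand(b, 1, device=batch.device) * (t - lengths)).long()
    idx = torch.arange(t, device=batch.device).expand(b, t)
    mask = (idx >= starts) & (idx < starts + lengths)
    return batch * ~mask.unsqueeze(1)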

Xeno Canto files

  • Download all files

  • add ebird codes to all

  • Curate files and create train/test

  • Upload to Huggingface

  • download rest of xenocanto

  • add ebird codes

  • upload all/subset to huggingface

Naming convention for metadata file

  • Please provide an overview of metadata.csv files
  • There are a lot of files:
    • metadata_all
    • metadata
    • metadata_downloaded
    • metadata_more
    • ...
  • You could create this in a notion table so that it becomes clear what each metadata file represents
  • You may also delete some metadata files that we do not need anymore.

Add simple docs

TODO

  • README.md
  • base_datamodule
    • refactor to use @dataclass instead of DictConfig
    • test Hydra config by calling from main.py
    • test overriding from python without hydra (important for using as package)
    • comment base_datamodule
    • comment transforms
    • comment augmentations
    • comment resize
    • comment feature_extraction
    • notebook to test it
    • sync README and Notion page
  • base_module

Todos Meeting 25.08.2023

  • Define "benchmark"
  • Define "task" (meta-information and rules)
  • What makes a good benchmark dataset?
  • Formulate task ideas
  • Set the scope: whom do we want to address?
  • Draft a time schedule and rough plan
  • Clean up and file the meeting notes

Implement Event Detection (Bambird)

As discussed: try to add event detection for the Xeno-Canto files. It should be incorporated into:

  1. the feature extractor class so that it works with .map
  2. independent of the feature extractor so that it can be incorporated into the metadata file
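A rough sketch of variant 2, written as a standalone function that also works with datasets.map (detect_events is a placeholder for the actual Bambird detector, and the column names are illustrative):

def add_detected_events(example):
    audio = example["audio"]
    # placeholder detector returning [(start_s, end_s), ...] for one recording
    example["detected_events"] = detect_events(audio["array"], audio["sampling_rate"])
    return example

# adds a new column that can then be written into the metadata file
dataset = dataset.map(add_detected_events)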

Evaluation pipeline

Create an evaluation pipeline to:

  • load trained models from sharable source (git LFS ?) including their hyperparameters
  • test multiple models on a selected dataset
  • implement metrics
  • visualisation
