Blind Speech Separation and Dereverberation

This repository contains Python/TensorFlow code to reproduce the experiments presented in our paper Blind Speech Separation and Dereverberation using Neural Beamforming.

Requirements

The data loader uses the 'pyroomacoustics' package to generate artificial room impulse responses (RIRs). Install it with:

pip install pyroomacoustics

It also uses the 'soundfile' package to read and write WAV files. Install it with:

pip install soundfile

To add your speech database, edit the 'path' keys in the configuration file:

nano -w experiments/shoebox_c2.json

In the same file, you can also set the number of speakers and the FFT length/shift for the frequency domain variant (BSSD-FD).
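Before editing, it can help to list which 'path' entries the configuration actually contains. The following is a minimal sketch using only the Python standard library. The helper names `find_path_keys` and `list_config_paths` and the example layout are illustrative assumptions, since the actual structure of experiments/shoebox_c2.json is not shown here.

```python
import json


def find_path_keys(node, prefix=""):
    """Return (location, value) pairs for every string entry whose key contains 'path'."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            location = f"{prefix}/{key}"
            if "path" in key.lower() and isinstance(value, str):
                found.append((location, value))
            # Descend into nested objects and arrays.
            found.extend(find_path_keys(value, location))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_path_keys(item, f"{prefix}/{index}"))
    return found


def list_config_paths(config_file):
    """Load a JSON configuration file and list its path-like entries."""
    with open(config_file) as handle:
        return find_path_keys(json.load(handle))


# Hypothetical layout, for illustration only (not the real config).
example = {"speech": {"path": "/data/speech/"}, "speakers": 2}
for location, value in find_path_keys(example):
    print(location, "->", value)
```

To inspect the real file, call `list_config_paths("experiments/shoebox_c2.json")` from the repository root.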

Prerequisites

Prior to training, a set of RIRs and DOA bases is required. Both sets will be stored as .mat containers in the 'data/' folder.

To generate a set of 720 artificial RIRs, use:

cd loaders
python rir_loader.py

To generate a set of 100 DOA bases, use:

cd loaders
python doa_bases.py

Training

To train the frequency domain model (BSSD-FD), use:

cd experiments
python bssd_combined_fd.py

To train the time domain model (BSSD-TD), use:

cd experiments
python bssd_combined_td.py

Due to the custom complex-valued layers, training the FD model takes roughly 8 times as long as training the TD model.

Validation

To validate the models, use:

cd experiments
python bssd_combined_td.py valid

This will generate a single prediction and plot the SI-SDR and EER scores. A MATLAB container with the enhanced wavs will also be written to the 'predictions/' folder.
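For readers unfamiliar with the metric, SI-SDR is commonly defined as 10 log10(||a s||^2 / ||a s - e||^2), where s is the clean reference, e is the estimate, and a = <e, s> / ||s||^2 rescales the reference to best match the estimate. The sketch below implements that common definition in plain Python. It is not taken from this repository, and the validation script may differ in details such as mean removal or averaging over speakers.

```python
import math


def si_sdr(reference, estimate):
    """Scale-invariant signal-to-distortion ratio in dB for two equal-length sequences."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0:
        raise ValueError("reference must not be all zeros")
    # Optimal scaling of the reference towards the estimate.
    alpha = sum(r * e for r, e in zip(reference, estimate)) / ref_energy
    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    if residual_energy == 0.0:
        return math.inf  # estimate is a scaled copy of the reference
    if target_energy == 0.0:
        return -math.inf  # estimate is orthogonal to the reference
    return 10.0 * math.log10(target_energy / residual_energy)
```

Because of the rescaling step, multiplying the estimate by any non-zero gain leaves the score unchanged, which is why the metric suits models whose output level is arbitrary.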

To generate a spectrogram plot showing the mixture, and the separated and dereverberated estimates, use:

cd experiments
python bssd_combined_td.py plot

Performance

Mixture with 2 speakers, separated and dereverberated using the 'bssd_combined_td' model.

SI-SDR and EER of the 'bssd_combined_td' model after 10^5 training epochs.

False acceptance and rejection rates for the 101 WSJ0 speakers, using the 'bssd_combined_td' model.
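The equal error rate (EER) is the operating point at which the false acceptance rate and the false rejection rate coincide. The sketch below shows one simple way to estimate it from two lists of scores. It assumes that higher scores indicate the target speaker and that a trial is accepted when its score reaches the threshold. How this repository scores trials is not described here, so the function names and the scoring convention are assumptions.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false acceptance rate, false rejection rate) at a threshold.

    A trial is accepted when its score is >= threshold.
    """
    far = sum(score >= threshold for score in impostor) / len(impostor)
    frr = sum(score < threshold for score in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by trying every observed score as a threshold."""
    best_gap, best_eer = None, None
    for threshold in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, threshold)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer
```

With only a finite set of trials the two rates rarely match exactly, so the sketch reports their average at the closest threshold.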

Citation

Please cite our work as

@misc{pfeifenberger2021blind,
      title={Blind Speech Separation and Dereverberation using Neural Beamforming}, 
      author={Lukas Pfeifenberger and Franz Pernkopf},
      year={2021},
      eprint={2103.13443},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
