SpeakerBeam for neural target speech extraction

This repository contains an implementation of the SpeakerBeam method for target speech extraction, made public during the Interspeech 2021 tutorial.

The code is based on the Asteroid toolkit for audio speech separation.

Requirements

To install requirements:

pip install -r requirements.txt

The code was tested with Python 3.8.6.

Running the experiments

The directory egs contains a recipe for the Libri2Mix dataset. The recipe assumes that you already have the LibriMix dataset available. If not, please follow the instructions at the LibriMix repository to obtain it.

Before running our recipe, modify the path.sh file so that it contains the path to the repository root:

PATH_TO_REPOSITORY="<path-to-repo>"

Then follow the steps below in the recipe directory:

cd egs/libri2mix

Preparing data

To prepare the data lists, run

local/prepare_data.sh <path-to-libri2mix-data>

The <path-to-libri2mix-data> directory should contain the wav8k/min subdirectories. The command creates a data directory containing CSV lists that describe the data. The data preparation follows the one from Asteroid and additionally creates a list mapping mixtures to enrollment utterances.
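Before running the preparation script, it can help to confirm that the dataset root has the expected layout. The following is a minimal sketch, not part of the repository: the function name `missing_librimix_subdirs` is hypothetical, and the only layout assumption is the wav8k/min subdirectory mentioned above.

```python
from pathlib import Path


def missing_librimix_subdirs(root, required=("wav8k/min",)):
    """Return the required subdirectories that do not exist under `root`.

    `required` defaults to the single layout requirement stated in this
    README (wav8k/min); extend it if your setup needs more.
    """
    root = Path(root)
    return [sub for sub in required if not (root / sub).is_dir()]


# Example usage (hypothetical path):
# missing = missing_librimix_subdirs("/data/Libri2Mix")
# if missing:
#     raise SystemExit(f"Missing subdirectories: {missing}")
```

An empty list means the check passed; otherwise the returned entries name what is missing.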

Training SpeakerBeam

To train the SpeakerBeam model, run

. ../../path.sh
python train.py --exp_dir exp/speakerbeam

By default, the training script uses the parameters in local/conf.yml. To run with different parameters, either edit local/conf.yml or pass them directly as command-line arguments, e.g.

python train.py --exp_dir exp/speakerbeam_adaptlay15 --i_adapt_layer 15

to change the position of the adaptation layer in the network.
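The general pattern of file-based defaults overridden by command-line flags can be sketched as follows. This is an illustration only and is not the code used by train.py: it assumes a flat dictionary of defaults (the real local/conf.yml may be nested), the helper `build_parser` is hypothetical, and the default values shown are placeholders rather than the repository's actual settings.

```python
import argparse


def build_parser(defaults):
    """Create one --<name> option per entry of a flat dict of defaults.

    The option type is inferred from the default value, which is only
    adequate for simple int/float/str settings.
    """
    parser = argparse.ArgumentParser()
    for name, value in defaults.items():
        parser.add_argument(f"--{name}", type=type(value), default=value)
    return parser


# Placeholder defaults standing in for values read from a config file.
defaults = {"i_adapt_layer": 7, "exp_dir": "exp/tmp"}

# Flags given on the command line replace the corresponding defaults.
args = build_parser(defaults).parse_args(
    ["--exp_dir", "exp/speakerbeam_adaptlay15", "--i_adapt_layer", "15"]
)
print(vars(args))
```

Any option that is not passed keeps its default, so a run only needs to name the parameters it changes.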

Training creates the directory exp/speakerbeam. Once training has finished, the final model is stored in exp/speakerbeam/best_model.pth. Training progress can be monitored in TensorBoard using the logs in exp/speakerbeam/lightning_logs.

Decoding and evaluating the performance

To extract target speech signals on the test set with the trained model and evaluate performance, run

python eval.py --test_dir data/wav8k/min/test --task sep_noisy --model_path exp/speakerbeam/best_model.pth --out_dir exp/speakerbeam/out_best --exp_dir exp/speakerbeam --use_gpu=1

It is also possible to evaluate with an intermediate checkpoint, e.g.

python eval.py --test_dir data/wav8k/min/test --task sep_noisy --model_path exp/speakerbeam/checkpoints/epoch\=24-step\=115824.ckpt --from_checkpoint 1 --out_dir exp/speakerbeam/out_e24_s115824 --exp_dir exp/speakerbeam --use_gpu=1
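When evaluating several intermediate checkpoints, the output directory name can be derived from the checkpoint file name. The sketch below assumes checkpoint names follow the epoch=<E>-step=<S>.ckpt pattern seen in the command above and reproduces the out_e<E>_s<S> naming used there; the helper names are hypothetical and not part of the repository.

```python
import re

# Matches names such as "epoch=24-step=115824.ckpt" at the end of a path.
CKPT_PATTERN = re.compile(r"epoch=(\d+)-step=(\d+)\.ckpt$")


def parse_checkpoint_name(path):
    """Return (epoch, step) parsed from a checkpoint path."""
    match = CKPT_PATTERN.search(path)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {path}")
    return int(match.group(1)), int(match.group(2))


def out_dir_for(exp_dir, ckpt_path):
    """Build an output directory name like <exp_dir>/out_e24_s115824."""
    epoch, step = parse_checkpoint_name(ckpt_path)
    return f"{exp_dir}/out_e{epoch}_s{step}"
```

The result can then be passed as the --out_dir argument of eval.py for the matching checkpoint.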

The output directory (<out_dir>, here exp/speakerbeam/out_best) contains the averaged results in final_metrics.json and the extracted audio files in <out_dir>/out.
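The averaged results can be inspected programmatically. The sketch below assumes that final_metrics.json is a flat JSON object mapping metric names to numbers; the exact keys are not documented here, so the code simply lists whatever numeric entries it finds. The function names are hypothetical.

```python
import json
from pathlib import Path


def load_metrics(out_dir):
    """Load the averaged results stored in <out_dir>/final_metrics.json."""
    with open(Path(out_dir) / "final_metrics.json") as f:
        return json.load(f)


def format_metrics(metrics):
    """Render numeric entries as 'name: value' lines, sorted by name."""
    lines = []
    for name in sorted(metrics):
        value = metrics[name]
        # Skip non-numeric entries, since the file layout is an assumption.
        if isinstance(value, (int, float)):
            lines.append(f"{name}: {value:.2f}")
    return "\n".join(lines)


# Example usage:
# print(format_metrics(load_metrics("exp/speakerbeam/out_best")))
```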

Reference

Please cite our works when using this code:

@ARTICLE{Zmolikova_Spkbeam_STSP19,
  author={Žmolíková, Kateřina and Delcroix, Marc and Kinoshita, Keisuke and Ochiai, Tsubasa and Nakatani, Tomohiro and Burget, Lukáš and Černocký, Jan},
  journal={IEEE Journal of Selected Topics in Signal Processing}, 
  title={SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures}, 
  year={2019},
  volume={13},
  number={4},
  pages={800-814},
  doi={10.1109/JSTSP.2019.2922820}}

@INPROCEEDINGS{delcroix_tdSpkBeam_ICASSP20,
  author={Delcroix, Marc and Ochiai, Tsubasa and Zmolikova, Katerina and Kinoshita, Keisuke and Tawara, Naohiro and Nakatani, Tomohiro and Araki, Shoko},
  booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
  title={Improving Speaker Discrimination of Target Speech Extraction With Time-Domain Speakerbeam}, 
  year={2020},
  pages={691-695},
  doi={10.1109/ICASSP40776.2020.9054683}}
