

pyBK - Speaker diarization python system based on binary key speaker modelling

The system performs speaker diarization (speech segmentation and clustering into homogeneous speaker clusters) on a given list of audio files. It is based on the binary key speaker modelling technique. Because the binary key background model (KBM) is trained in-session, the system does not require any external training data, which makes it an easy option to run and tune for speaker diarization tasks.
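To make the binary key idea more concrete, here is a minimal NumPy sketch of the general technique. It is not pyBK's code: it assumes the frame-level log-likelihoods of every KBM component have already been computed, and the parameter names `top_gaussians` and `bits_ratio` are illustrative. For each frame the best-scoring components are counted, and the most frequently selected components are set to 1 in a fixed-length binary vector. Two keys are then compared with a Jaccard-style AND/OR similarity.

```python
import numpy as np


def binary_key(loglik, top_gaussians=5, bits_ratio=0.2):
    """Binary key from a (frames x KBM components) log-likelihood matrix."""
    n_frames, n_components = loglik.shape
    # Indices of the top_gaussians best-scoring components for every frame
    top = np.argsort(-loglik, axis=1)[:, :top_gaussians]
    # Cumulative vector: how often each component was among the frame-wise best
    cumulative = np.bincount(top.ravel(), minlength=n_components)
    # Set the most frequently selected components to 1
    n_bits = max(1, int(round(bits_ratio * n_components)))
    key = np.zeros(n_components, dtype=bool)
    key[np.argsort(-cumulative, kind="stable")[:n_bits]] = True
    return key


def key_similarity(a, b):
    """Ratio of shared active bits to all active bits (1.0 = identical keys)."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0
```

In a full system, keys like these would be computed per segment or per cluster and the similarity used to drive clustering; the sketch only covers the key extraction and comparison step.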

Description

This implementation is based on Delgado's binary key implementation, which is also available for MATLAB. Besides the binary key related code, useful functions for a complete speaker diarization pipeline are included. Extra details and functionalities were added following our participation, at EURECOM, in the Albayzin 2016 Speaker Diarization Evaluation described here, the first DIHARD challenge, detailed in the Interspeech 2018 paper, and the IberSPEECH-RTVE Speaker Diarization Evaluation, explained here.

Installation

This code is written and tested in Python 3.6 using conda. It relies on a few common packages: numpy, librosa and webrtcvad.

If you are using conda:

$ conda create -n pyBK python=3.6
$ source activate pyBK
$ conda install numpy
$ conda install -c conda-forge librosa
$ pip install webrtcvad
$ git clone https://github.com/josepatino/pyBK.git
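After installation, a quick way to confirm that the three packages are importable from the active environment is a short check like the following. This is a convenience sketch, not a script shipped with pyBK; `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# Packages installed by the commands above (import names)
REQUIRED = ("numpy", "librosa", "webrtcvad")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("All dependencies found" if not missing else "Missing: " + ", ".join(missing))
```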

Example

Five files from the SAIVT-BNEWS database are included to test the system (all rights reserved to their respective owners). They comprise audio files in WAV format, together with speech activity detection (SAD) and unpartitioned evaluation map (UEM) files obtained from the reference annotations. For a quick run:

$ cd pyBK
$ python main.py

If no UEM file is found, the complete audio content is considered. If no VAD file is found, automatic VAD based on py-webrtcvad is applied. Automatic VAD can also be enforced in the config file.
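The fallback behaviour can be sketched as follows. This is an illustration under stated assumptions, not pyBK's code: UEM lines are assumed to follow the NIST layout `<file> <channel> <start> <end>` with times in seconds, the webrtcvad aggressiveness mode of 3 is an arbitrary choice, and all helper names are hypothetical. py-webrtcvad only accepts 16-bit mono PCM in 10, 20 or 30 ms frames at 8, 16, 32 or 48 kHz, hence the framing helper.

```python
def read_uem(lines, file_id):
    """(start, end) regions for file_id from UEM lines: <file> <channel> <start> <end>."""
    regions = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 4 and fields[0] == file_id:
            regions.append((float(fields[2]), float(fields[3])))
    return sorted(regions)


def restrict_to_uem(speech, uem, duration):
    """Intersect speech segments with UEM regions; without a UEM, keep the whole file."""
    if not uem:
        uem = [(0.0, duration)]
    kept = []
    for s_start, s_end in speech:
        for u_start, u_end in uem:
            start, end = max(s_start, u_start), min(s_end, u_end)
            if start < end:
                kept.append((start, end))
    return sorted(kept)


def pcm_frames(pcm, sample_rate, frame_ms=30):
    """Split 16-bit mono PCM bytes into the frame sizes webrtcvad accepts."""
    if sample_rate not in (8000, 16000, 32000, 48000) or frame_ms not in (10, 20, 30):
        raise ValueError("unsupported sample rate or frame length for webrtcvad")
    n_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per 16-bit sample
    return [pcm[i:i + n_bytes] for i in range(0, len(pcm) - n_bytes + 1, n_bytes)]


def flags_to_segments(flags, frame_ms=30):
    """Merge runs of speech-flagged frames into (start, end) segments in seconds."""
    segments, start = [], None
    for i, is_speech in enumerate(list(flags) + [False]):  # sentinel closes the last run
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start * frame_ms / 1000, i * frame_ms / 1000))
            start = None
    return segments


def webrtc_speech_segments(pcm, sample_rate, mode=3, frame_ms=30):
    """Automatic speech segments with py-webrtcvad (mode 0-3, 3 = most aggressive)."""
    import webrtcvad  # imported lazily so the helpers above work without it
    vad = webrtcvad.Vad(mode)
    flags = [vad.is_speech(frame, sample_rate) for frame in pcm_frames(pcm, sample_rate, frame_ms)]
    return flags_to_segments(flags, frame_ms)
```

A real system would typically also smooth the frame decisions (for example, by dropping very short speech or silence runs) before clustering; that step is omitted here.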

System configuration is provided as an INI file, and the example config.ini file includes explanatory comments. To use this system on your own data, create your own config file and run:

$ python main.py yourconfig.ini
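One way to derive your own config from the example config.ini is with Python's standard configparser module. This is a minimal sketch, not a pyBK utility: the section and option names you pass in must be the ones actually present in config.ini (the `("SECTION", "option")` pair below is a placeholder), and unknown options are rejected so that typos do not pass silently.

```python
import configparser


def apply_overrides(config, overrides):
    """Set {(section, option): value} pairs, refusing options missing from the base file."""
    for (section, option), value in overrides.items():
        if not config.has_option(section, option):
            raise KeyError("unknown option [%s] %s" % (section, option))
        config.set(section, option, str(value))
    return config


def derive_config(base_path, out_path, overrides):
    """Read base_path (e.g. config.ini), apply overrides and write the result to out_path."""
    config = configparser.ConfigParser(interpolation=None)  # keep '%' in values verbatim
    config.optionxform = str  # preserve the original case of option names
    if not config.read(base_path):
        raise FileNotFoundError(base_path)
    apply_overrides(config, overrides)
    with open(out_path, "w") as handle:
        config.write(handle)  # note: comments from the base file are not preserved
    return out_path
```

For example, `derive_config("config.ini", "yourconfig.ini", {("SECTION", "option"): "value"})` followed by `python main.py yourconfig.ini`. Since configparser drops comments when writing, keeping the commented config.ini as the reference copy is advisable.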

Finally, a config file following our DIHARD submission is also included. Note that this configuration is meant to be used with IIR-CQT Mel-frequency cepstral coefficients (ICMC) which can be replicated using MATLAB code available here.

Evaluation

The system generates an RTTM file, which you can evaluate using the provided NIST md-eval script:

$ eval-tools/md-eval-v21.pl -c 0.25 -s out/[experiment_name].rttm -r eval-tools/reference.rttm

which should return a 5.32% diarization error rate (DER) with a standard 0.25 s collar. Using the automatic VAD instead, you should get a 10.04% DER. With the DIHARD config file and ICMCs as features, this system returns a DER of 30.69% on the DIHARD evaluation set with a 0 s collar.
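md-eval remains the reference scorer. To illustrate what the DER figure measures, here is a simplified frame-level sketch: DER = (missed speech + false alarm speech + speaker confusion) / total reference speech time, with hypothesis speakers mapped one-to-one onto reference speakers so as to maximise agreement. Unlike md-eval, it applies no collar and assumes no overlapping speech, so its numbers will not match md-eval output exactly. The helper names are hypothetical; RTTM SPEAKER lines are parsed using their standard fields (onset, duration and speaker name in the 4th, 5th and 8th columns).

```python
from collections import Counter
from itertools import permutations


def read_rttm(lines):
    """Parse RTTM SPEAKER lines into (start, end, speaker) tuples."""
    segments = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 8 and fields[0] == "SPEAKER":
            start, duration = float(fields[3]), float(fields[4])
            segments.append((start, start + duration, fields[7]))
    return segments


def frame_labels(segments, step):
    """Map frame index -> speaker label; assumes no overlapping speech."""
    labels = {}
    for start, end, speaker in segments:
        for i in range(int(round(start / step)), int(round(end / step))):
            labels[i] = speaker
    return labels


def frame_der(ref_segments, hyp_segments, step=0.01):
    """Simplified DER: no collar, no overlap, optimal one-to-one speaker mapping."""
    ref, hyp = frame_labels(ref_segments, step), frame_labels(hyp_segments, step)
    if not ref:
        raise ValueError("reference contains no speech")
    missed = sum(1 for i in ref if i not in hyp)
    false_alarm = sum(1 for i in hyp if i not in ref)
    joint = Counter((ref[i], hyp[i]) for i in ref if i in hyp)
    ref_spk, hyp_spk = sorted(set(ref.values())), sorted(set(hyp.values()))
    # Brute-force search over mappings: fine for a handful of speakers only
    if len(ref_spk) <= len(hyp_spk):
        candidates = (zip(ref_spk, p) for p in permutations(hyp_spk, len(ref_spk)))
    else:
        candidates = (zip(p, hyp_spk) for p in permutations(ref_spk, len(hyp_spk)))
    matched = max(sum(joint[pair] for pair in c) for c in candidates)
    confusion = sum(joint.values()) - matched
    return (missed + false_alarm + confusion) / len(ref)
```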

Contact

Please feel free to contact me for any questions related to this code:

  • Jose Patino: patino[at]eurecom[dot]fr

Citation

If you use pyBK in your research, please use the following citation:

@inproceedings{patino2018,
  author = {Patino, Jose and Delgado, H{\'e}ctor and Evans, Nicholas},
  title = {{The EURECOM submission to the first DIHARD Challenge}},
  booktitle = {{Interspeech 2018, 19th Annual Conference of the International Speech Communication Association}},
  year = {2018},
  month = {September},
  address = {Hyderabad, India},
}
