Synthetic Diarization Corpus

Introduction

A synthetic corpus of dialogs was constructed from the LibriSpeech corpus and is made freely available for diarization research. It includes over 90 hours of training data, and over 9 hours each of development and test data. Both 2-person and 3-person dialogs, with and without overlap, are included. Timing information is provided in several formats and covers not only speaker segmentations but also phoneme segmentations. This makes the corpus a useful starting point for diarization system development, particularly in its early stages.

How to use

The corpus contains 4 top-level directories:
librispeech2: 2-person dialogs
librispeech2o: 2-person dialogs with overlap
librispeech3: 3-person dialogs
librispeech3o: 3-person dialogs with overlap


All sub-directories are "Kaldi table" data directories. Audio files are mono, 16 kHz, 16-bit little-endian PCM.
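
To confirm that a file matches the stated encoding, a check along the following lines may help. This is a minimal sketch that assumes the audio is stored in WAV containers, which the text above does not state explicitly; the function name `matches_corpus_format` is hypothetical and not part of the corpus.

```python
import wave


def matches_corpus_format(path):
    """Return True if the WAV file at `path` is 16 kHz, 16-bit, mono PCM.

    Assumption: the audio is stored in a WAV container that Python's
    standard `wave` module can open (uncompressed PCM only).
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == 16000   # 16 kHz sampling rate
            and wav.getsampwidth() == 2   # 2 bytes per sample = 16 bit
            and wav.getnchannels() == 1   # mono
        )
```

If the files turn out to be headerless raw PCM or another container, this check does not apply and the parameters would have to be supplied by hand when loading.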

Formats

ctm - each line is F C BT DUR word
Where:
F The waveform filename. NOTE: no pathnames or extensions are expected.
C The speaker.
BT The begin time (seconds) of the segment, measured from the start of the file.
DUR The duration (seconds) of the segment.
word The word label of the segment.
labs - each line is a speaker id, or 0 for pauses. Each line corresponds to 0.01 seconds of audio.
rttm0 - Rich Transcription Time Marked (RTTM) file format. The full specification can be found in Appendix A of NIST's "The 2009 (RT-09) Rich Transcription Meeting Recognition Evaluation Plan".
rttm - merged rttm0, without pauses
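
The ctm and labs descriptions above can be read programmatically along these lines. This is a minimal sketch, not the corpus's own tooling: the names `CtmEntry`, `parse_ctm_line`, and `labs_to_segments` are hypothetical, fields are assumed to be whitespace-separated, and speaker ids are kept as strings because their exact form is not specified above. How labs encodes overlapping speech is also not stated, so the sketch treats every line as a single label.

```python
from itertools import groupby
from typing import List, NamedTuple, Tuple

FRAME_SECONDS = 0.01  # one labs line covers 0.01 s of audio


class CtmEntry(NamedTuple):
    file: str        # F: waveform filename, no path or extension
    speaker: str     # C: speaker
    begin: float     # BT: begin time in seconds from the start of the file
    duration: float  # DUR: duration in seconds
    token: str       # last field of the line


def parse_ctm_line(line: str) -> CtmEntry:
    """Parse one 'F C BT DUR word' line (assumes exactly five fields)."""
    f, c, bt, dur, token = line.split()
    return CtmEntry(f, c, float(bt), float(dur), token)


def labs_to_segments(labels: List[str]) -> List[Tuple[str, float, float]]:
    """Collapse per-frame labels into (speaker, begin, duration) runs.

    Frames labelled "0" are pauses and produce no segment.
    """
    segments = []
    frame = 0  # index of the first frame in the current run
    for label, run in groupby(labels):
        n = len(list(run))
        if label != "0":
            begin = round(frame * FRAME_SECONDS, 2)
            duration = round(n * FRAME_SECONDS, 2)
            segments.append((label, begin, duration))
        frame += n
    return segments
```

A usage example with made-up values (not taken from the corpus):

```python
entry = parse_ctm_line("dialog_0001 spk1 0.25 0.31 hello")
segments = labs_to_segments(["0", "0", "A", "A", "A", "0", "B", "B"])
```

Grouping consecutive identical labels keeps the conversion to a single pass over the file, and counting frames rather than accumulating seconds avoids drift from repeated floating-point addition.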

This corpus is licensed under CC BY 4.0; use of it requires citing the following reference:

Edwards, E., Brenndoerfer, M., Robinson, A., Sadoughi, N., Finley, G. P., Korenevsky, M., Axtmann, N., Miller, M. & Suendermann-Oeft, D. (2018, September). A Free Synthetic Corpus for Speaker Diarization Research. In International Conference on Speech and Computer (pp. 113-122). Springer, Cham.

BibTeX

@inproceedings{edwards2018free,
  title={A Free Synthetic Corpus for Speaker Diarization Research},
  author={Edwards, Erik and Brenndoerfer, Michael and Robinson, Amanda and Sadoughi, Najmeh and Finley, Greg P and Korenevsky, Maxim and Axtmann, Nico and Miller, Mark and Suendermann-Oeft, David},
  booktitle={International Conference on Speech and Computer},
  pages={113--122},
  year={2018},
  organization={Springer}
}

Based on the LibriSpeech ASR corpus
