This project forked from wenet-e2e/wespeaker

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit

License: Apache License 2.0

WeSpeaker

Roadmap | Docs | Paper | Runtime | Pretrained Models | Huggingface Demo | Modelscope Demo

WeSpeaker mainly focuses on speaker embedding learning, with application to the speaker verification task. It supports both online feature extraction and loading pre-extracted features in Kaldi format.

Installation

Install the Python package

pip install git+https://github.com/wenet-e2e/wespeaker.git

Command-line usage (use -h for parameters):

$ wespeaker --task embedding --audio_file audio.wav --output_file embedding.txt
$ wespeaker --task embedding_kaldi --wav_scp wav.scp --output_file /path/to/embedding
$ wespeaker --task similarity --audio_file audio.wav --audio_file2 audio2.wav
$ wespeaker --task diarization --audio_file audio.wav
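The embedding_kaldi task above reads a wav.scp file. In Kaldi convention this is a plain-text file with one "<utt_id> <wav_path>" pair per line. As a rough sketch (the helper name write_wav_scp and the choice of the file stem as the utterance ID are assumptions for illustration, not part of WeSpeaker), such a file could be generated from a directory of recordings like this:

```python
import os


def write_wav_scp(wav_dir, scp_path):
    """Write a Kaldi-style wav.scp: one '<utt_id> <wav_path>' pair per line.

    Hypothetical helper, not part of WeSpeaker. Using the file stem as the
    utterance ID is an assumption; any unique ID without spaces would do.
    """
    entries = []
    for name in sorted(os.listdir(wav_dir)):
        if name.lower().endswith(".wav"):
            utt_id = os.path.splitext(name)[0]
            wav_path = os.path.abspath(os.path.join(wav_dir, name))
            entries.append((utt_id, wav_path))
    with open(scp_path, "w", encoding="utf-8") as f:
        for utt_id, wav_path in entries:
            f.write(f"{utt_id} {wav_path}\n")
    return entries
```

Sorting the entries by utterance ID keeps the file deterministic, which is also what Kaldi tooling generally expects.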

Python programming usage:

import wespeaker

model = wespeaker.load_model('chinese')
embedding = model.extract_embedding('audio.wav')
utt_names, embeddings = model.extract_embedding_list('wav.scp')
similarity = model.compute_similarity('audio1.wav', 'audio2.wav')
diar_result = model.diarize('audio.wav')

See the Python usage documentation for more command-line and Python programming examples.
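Speaker embeddings are commonly compared with cosine similarity. The sketch below shows that computation on two plain Python lists; it is an illustration only, and the function name and inputs are assumptions. It does not claim to reproduce what compute_similarity does internally (for example, whether the score is rescaled).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors, in [-1, 1].

    Illustrative only; not WeSpeaker's implementation.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for two speaker embeddings.
score = cosine_similarity([0.2, 0.5, -0.1], [0.1, 0.4, 0.0])
```

A higher score suggests the two recordings are more likely from the same speaker; the decision threshold depends on the model and data.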

Install for development & deployment

  • Clone this repo
git clone https://github.com/wenet-e2e/wespeaker.git
  • Create a conda environment (PyTorch >= 1.12.1 is recommended)
conda create -n wespeaker python=3.9
conda activate wespeaker
conda install pytorch=1.12.1 torchaudio=0.12.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt
pre-commit install  # for clean and tidy code

🔥 News

Recipes

  • VoxCeleb: Speaker verification recipe on the VoxCeleb dataset
    • 🔥 UPDATE 2023.07.10: We support a self-supervised learning recipe on VoxCeleb, achieving 2.627% EER (ECAPA_TDNN_GLOB_c1024) on the vox1-O-clean test set without any labels.
    • 🔥 UPDATE 2022.10.31: We support deep r-vector up to the 293-layer version, achieving 0.447%/0.043 EER/minDCF on the vox1-O-clean test set.
    • 🔥 UPDATE 2022.07.19: We apply the same setup as the CNCeleb recipe and obtain state-of-the-art performance among open-source systems.
      • EER/minDCF on the vox1-O-clean test set are 0.723%/0.069 (ResNet34) and 0.728%/0.099 (ECAPA_TDNN_GLOB_c1024) after large-margin (LM) fine-tuning and AS-Norm.
  • CNCeleb: Speaker verification recipe on the CNCeleb dataset
    • 🔥 UPDATE 2022.10.31: The 221-layer ResNet achieves 5.655%/0.330 EER/minDCF.
    • 🔥 UPDATE 2022.07.12: We migrate the winning system of CNSRC 2022 (report, slides).
      • EER/minDCF drops from 8.426%/0.487 to 6.492%/0.354 after large-margin fine-tuning and AS-Norm.
  • NIST SRE16: Speaker verification recipe for the 2016 NIST Speaker Recognition Evaluation Plan. A similar recipe can be found in Kaldi.
    • 🔥 UPDATE 2023.07.14: We support a NIST SRE16 recipe. After PLDA adaptation, we achieve 6.608%, 10.01%, and 2.974% EER on the Pooled, Tagalog, and Cantonese trials, respectively.
  • VoxConverse: Diarization recipe on the VoxConverse dataset
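The recipes above report equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a minimal sketch of how EER can be estimated from trial scores (the function name compute_eer and the toy scores are assumptions; this is not the scoring code used by the recipes, which may interpolate differently):

```python
def compute_eer(target_scores, nontarget_scores):
    """Estimate the equal error rate from same-speaker and different-speaker scores.

    Sweeps every observed score as a threshold and returns the mean of the
    false acceptance and false rejection rates where they are closest.
    Simple O(n^2) sketch for illustration; not WeSpeaker's scoring code.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for thr in thresholds:
        # A trial is accepted when its score is >= the threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores: one target trial scores below one non-target trial.
eer = compute_eer([0.9, 0.4], [0.1, 0.6])  # 0.5
```

On real trial lists with many scores, a sorted single-pass sweep or an ROC-based routine would be much faster than this quadratic version.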

Discussion

Chinese users can scan the QR code on the left to follow the official account of the WeNet Community. We also created a WeChat group for discussion and quicker responses; scan the QR code on the right to join the group.

Citations

If you find wespeaker useful, please cite it as

@inproceedings{wang2023wespeaker,
  title={Wespeaker: A research and production oriented speaker embedding learning toolkit},
  author={Wang, Hongji and Liang, Chengdong and Wang, Shuai and Chen, Zhengyang and Zhang, Binbin and Xiang, Xu and Deng, Yanlei and Qian, Yanmin},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}

Looking for contributors

If you are interested in contributing, feel free to contact @wsstriving or @robin1001.
