
asr_deepspeech2

ASR project

Installation guide

pip install -r ./requirements.txt

The kenlm module may not install correctly if g++ is not installed beforehand. If you run into problems with kenlm, run the following:

sudo apt-get update
sudo apt-get install g++ -y
pip install https://github.com/kpu/kenlm/archive/master.zip
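Whichever install route is used, the simplest way to confirm that kenlm actually loads (including its compiled extension) is to try importing it. A minimal Python sketch; the helper name kenlm_available is hypothetical and not part of this project:

```python
def kenlm_available() -> bool:
    """Return True if the kenlm module (and its compiled extension) imports cleanly."""
    try:
        import kenlm  # noqa: F401  (imported only to test that it loads)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    if kenlm_available():
        print("kenlm is installed")
    else:
        print("kenlm is missing: install g++ and reinstall kenlm from the GitHub archive")
```
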

How to run test.py on pretrained models

The script load_model_checkpoints.py downloads the English and Russian model checkpoints from Google Drive. The language models for both languages are loaded in ctc_char_text_encoder.py (lines 67-68 and 77-78). test.py has an additional option: beam search is enabled with the -bs command-line flag and skipped otherwise (for the Russian model, beam search can take up to an hour).

Evaluating the Russian model may fail for two reasons: a version conflict between the requirements of torch_audiomentations and those of this project, and the fact that the Russian evaluation runs on mp3 files. To make the Russian evaluation work, comment out the first line of the __init__.py file of the wave augmentations.

python hw-asr/load_model_checkpoints.py
python hw-asr/test.py -r hw-asr/model_checkpoints/english/deep_speech_english_575.pth -c hw-asr/hw_asr/configs/test_config.json -bs
python hw-asr/test.py -r hw-asr/model_checkpoints/english/deep_speech_english_575.pth -c hw-asr/hw_asr/configs/test_other_config.json -bs
python hw-asr/test.py -r hw-asr/model_checkpoints/russian/deep_speech_russian_805.pth -c hw-asr/hw_asr/configs/russian_test_config.json -bs
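The -bs flag in the commands above switches decoding from plain per-frame argmax (greedy) decoding to beam search. To illustrate the difference, here is a minimal Python sketch of greedy CTC decoding and CTC prefix beam search. It is not the project's implementation: it assumes per-frame probabilities given as plain lists, assumes index 0 is the blank token, and omits the language-model fusion that the project presumably performs with the models loaded in ctc_char_text_encoder.py. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

BLANK = 0  # assumption: index 0 of every frame is the CTC blank token


def ctc_greedy_decode(probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Argmax per frame, collapse repeated indices, then drop blanks."""
    chars = []
    prev = None
    for frame in probs:
        idx = max(range(len(frame)), key=frame.__getitem__)
        if idx != prev and idx != BLANK:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)


def ctc_prefix_beam_search(
    probs: Sequence[Sequence[float]], alphabet: str, beam_size: int = 3
) -> List[Tuple[str, float]]:
    """Return (prefix, probability) pairs, most probable first (no language model)."""
    # Each prefix tracks two masses: paths ending in blank and paths ending in a character.
    beams: Dict[str, Tuple[float, float]] = {"": (1.0, 0.0)}
    for frame in probs:
        nxt: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for idx, p in enumerate(frame):
                if idx == BLANK:
                    # A blank keeps the prefix unchanged; it now ends in blank.
                    nxt[prefix][0] += (p_b + p_nb) * p
                    continue
                char = alphabet[idx]
                if prefix and prefix[-1] == char:
                    # Same char with no blank in between collapses into the same prefix...
                    nxt[prefix][1] += p_nb * p
                    # ...while a blank in between produces a doubled character.
                    nxt[prefix + char][1] += p_b * p
                else:
                    nxt[prefix + char][1] += (p_b + p_nb) * p
        # Prune to the beam_size prefixes with the highest total probability.
        ranked = sorted(nxt.items(), key=lambda kv: kv[1][0] + kv[1][1], reverse=True)
        beams = {prefix: (pb, pnb) for prefix, (pb, pnb) in ranked[:beam_size]}
    return [(prefix, pb + pnb) for prefix, (pb, pnb) in beams.items()]


if __name__ == "__main__":
    # Two frames over the toy alphabet "_a" ("_" stands in for the blank at index 0).
    frames = [[0.6, 0.4], [0.6, 0.4]]
    print(repr(ctc_greedy_decode(frames, "_a")))  # ''
    best, prob = ctc_prefix_beam_search(frames, "_a")[0]
    print(best, round(prob, 2))  # a 0.64
```

In the toy example, greedy decoding picks blank in both frames and returns an empty string, while beam search sums the three paths that collapse to "a" (0.16 + 0.24 + 0.24 = 0.64) and prefers it over "" (0.36). Each frame costs roughly beam_size times vocabulary-size updates plus a sort, which is one reason beam search is far slower than greedy decoding over a full test set.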

The evaluation results are printed to stdout and should be roughly as follows (WER and CER in %; the (bs) columns are with beam search enabled via -bs):

Dataset                     WER    CER    WER (bs)    CER (bs)
libri-test-clean            25.3   8.9    17.7        7.3
libri-test-other            55     25.3   44.3        23.2
common-voice-russian-test   91     44.2   82.1        49.4
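WER and CER are edit-distance rates: the number of word (or character) substitutions, deletions, and insertions needed to turn the prediction into the reference, divided by the reference length. A minimal Python sketch of both metrics follows; the function names are hypothetical, and the project's own metric code may differ in text normalization or in how it aggregates over a dataset (for example, averaging per-utterance rates versus pooling all errors).

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance where substitutions, insertions and deletions all cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match at no cost)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one utterance (max(..., 1) guards an empty reference)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for one utterance."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


if __name__ == "__main__":
    # Multiply by 100 to get percentages as in the table above.
    print(round(100 * wer("the cat sat", "the cat sit"), 1))  # 33.3
    print(round(100 * cer("cat", "cut"), 1))  # 33.3
```
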

Training logs

Training of the English model consisted of two stages: first, the model was trained on the LJ Speech dataset (the logs can be found here); second, it was trained on the train-clean-360 part of LibriSpeech (the logs can be found here). Training logs for the Russian model can be found here.


Contributors

whiteteadragon
