spitchlingware / icefall

This project is forked from k2-fsa/icefall.

Icefall fork to track minor changes related to Generic datasets

Home Page: https://k2-fsa.github.io/icefall/

License: Apache License 2.0

Shell 1.18% Python 98.79% Dockerfile 0.03%

Introduction

icefall contains ASR recipes for various datasets using https://github.com/k2-fsa/k2.

You can use https://github.com/k2-fsa/sherpa to deploy models trained with icefall.

You can try pre-trained models in your browser, without downloading or installing anything, by visiting https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition. See https://k2-fsa.github.io/icefall/huggingface/spaces.html for more details.

Installation

Please refer to https://icefall.readthedocs.io/en/latest/installation/index.html for installation.

Recipes

Please refer to https://icefall.readthedocs.io/en/latest/recipes/index.html for more information.

We provide the following recipes:

yesno

This is the simplest ASR recipe in icefall and can be run on CPU. Training takes less than 30 seconds and gives you the following WER:

[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
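The percentage in the summary line follows from the error counts inside the brackets: the sum of insertions, deletions, and substitutions divided by the number of reference words. A minimal sketch of that arithmetic is shown below; the function name `wer_percent` is hypothetical and is not part of icefall.

```python
def wer_percent(num_ref_words: int, ins: int, dels: int, subs: int) -> float:
    """Return the word error rate in percent.

    WER = (insertions + deletions + substitutions) / reference words * 100
    """
    if num_ref_words <= 0:
        raise ValueError("num_ref_words must be positive")
    return 100.0 * (ins + dels + subs) / num_ref_words


# Counts taken from the summary line above: [1 / 240, 0 ins, 1 del, 0 sub]
print(f"{wer_percent(240, ins=0, dels=1, subs=0):.2f}%")  # prints "0.42%"
```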

We provide a Colab notebook for this recipe.

LibriSpeech

Please see https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md for the latest results.

We provide the following models for this recipe:

Conformer CTC Model

The best WER we currently have is:

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.42       | 5.73       |

We provide a Colab notebook to run a pre-trained conformer CTC model.

TDNN LSTM CTC Model

The WER for this model is:

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 6.59       | 17.69      |

We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model.

Transducer: Conformer encoder + LSTM decoder

This model uses a Conformer as the encoder and an LSTM as the decoder.

The best WER with greedy search is:

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 3.07       | 7.51       |

We provide a Colab notebook to run a pre-trained RNN-T conformer model.

Transducer: Conformer encoder + Embedding decoder

This model uses a Conformer as the encoder. The decoder consists of one embedding layer and one convolutional layer.

The best WER using modified beam search with beam size 4 is:

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.56       | 6.27       |

Note: No auxiliary losses are used in the training and no LMs are used in the decoding.

We provide a Colab notebook to run a pre-trained transducer conformer + stateless decoder model.

k2 pruned RNN-T

| Encoder         | Params | test-clean | test-other |
|-----------------|--------|------------|------------|
| zipformer       | 65.5M  | 2.21       | 4.91       |
| zipformer-small | 23.2M  | 2.46       | 5.83       |
| zipformer-large | 148.4M | 2.11       | 4.77       |

Note: No auxiliary losses are used in the training and no LMs are used in the decoding.

k2 pruned RNN-T + GigaSpeech

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 1.78       | 4.08       |

Note: No auxiliary losses are used in the training and no LMs are used in the decoding.

k2 pruned RNN-T + GigaSpeech + CommonVoice

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 1.90       | 3.98       |

Note: No auxiliary losses are used in the training and no LMs are used in the decoding.

GigaSpeech

We provide two models for this recipe: Conformer CTC model and Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Conformer CTC

|     | Dev   | Test  |
|-----|-------|-------|
| WER | 10.47 | 10.58 |

Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss

|                      | Dev   | Test  |
|----------------------|-------|-------|
| greedy search        | 10.51 | 10.73 |
| fast beam search     | 10.50 | 10.69 |
| modified beam search | 10.40 | 10.51 |

Aishell

We provide three models for this recipe: Conformer CTC model, TDNN LSTM CTC model, and Transducer Stateless model.

Conformer CTC Model

The best CER we currently have is:

|     | test |
|-----|------|
| CER | 4.26 |

TDNN LSTM CTC Model

The CER for this model is:

|     | test  |
|-----|-------|
| CER | 10.16 |

We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model.

Transducer Stateless Model

The best CER we currently have is:

|     | test |
|-----|------|
| CER | 4.38 |

We provide a Colab notebook to run a pre-trained TransducerStateless model.

Aishell2

We provide one model for this recipe: Transducer Stateless Model.

Transducer Stateless Model

The best WER we currently have is:

|     | dev-ios | test-ios |
|-----|---------|----------|
| WER | 5.32    | 5.56     |

Aishell4

We provide one model for this recipe: Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with all subsets)

The best CER we currently have is:

|     | test  |
|-----|-------|
| CER | 29.08 |

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model.

TIMIT

We provide two models for this recipe: TDNN LSTM CTC model and TDNN LiGRU CTC model.

TDNN LSTM CTC Model

The best PER we currently have is:

|     | TEST   |
|-----|--------|
| PER | 19.71% |

We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model.

TDNN LiGRU CTC Model

The PER for this model is:

|     | TEST   |
|-----|--------|
| PER | 17.66% |

We provide a Colab notebook to run a pre-trained TDNN LiGRU CTC model.

TED-LIUM3

We provide two models for this recipe: Transducer Stateless: Conformer encoder + Embedding decoder and Pruned Transducer Stateless: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Transducer Stateless: Conformer encoder + Embedding decoder

The best WER using modified beam search with beam size 4 is:

|     | dev  | test |
|-----|------|------|
| WER | 6.91 | 6.33 |

Note: No auxiliary losses are used in the training and no LMs are used in the decoding.

We provide a Colab notebook to run a pre-trained Transducer Stateless model.

Pruned Transducer Stateless: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss

The best WER using modified beam search with beam size 4 is:

|     | dev  | test |
|-----|------|------|
| WER | 6.77 | 6.14 |

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model.

Aidatatang_200zh

We provide one model for this recipe: Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss

|                      | Dev  | Test |
|----------------------|------|------|
| greedy search        | 5.53 | 6.59 |
| fast beam search     | 5.30 | 6.34 |
| modified beam search | 5.27 | 6.33 |

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model.

WenetSpeech

We provide two models for this recipe: Pruned stateless RNN-T_2: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss and Pruned stateless RNN-T_5: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Pruned stateless RNN-T_2: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with L subset, offline ASR)

|                      | Dev  | Test-Net | Test-Meeting |
|----------------------|------|----------|--------------|
| greedy search        | 7.80 | 8.75     | 13.49        |
| modified beam search | 7.76 | 8.71     | 13.41        |
| fast beam search     | 7.94 | 8.74     | 13.80        |

Pruned stateless RNN-T_5: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with L subset)

Streaming:

|                      | Dev  | Test-Net | Test-Meeting |
|----------------------|------|----------|--------------|
| greedy_search        | 8.78 | 10.12    | 16.16        |
| modified_beam_search | 8.53 | 9.95     | 15.81        |
| fast_beam_search     | 9.01 | 10.47    | 16.28        |

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless2 model.

Alimeeting

We provide one model for this recipe: Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with far subset)

|                      | Eval  | Test-Net |
|----------------------|-------|----------|
| greedy search        | 31.77 | 34.66    |
| fast beam search     | 31.39 | 33.02    |
| modified beam search | 30.38 | 34.25    |

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model.

TAL_CSASR

We provide one model for this recipe: Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss

The best results for Chinese CER (%) and English WER (%), respectively (zh: Chinese, en: English):

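| decoding-method      | dev  | dev_zh | dev_en | test | test_zh | test_en |
|----------------------|------|--------|--------|------|---------|---------|
| greedy_search        | 7.30 | 6.48   | 19.19  | 7.39 | 6.66    | 19.13   |
| modified_beam_search | 7.15 | 6.35   | 18.95  | 7.22 | 6.50    | 18.70   |
| fast_beam_search     | 7.18 | 6.39   | 18.90  | 7.27 | 6.55    | 18.77   |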
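For code-switched data such as this, Chinese is typically scored per character and English per word. The sketch below illustrates one way such a mixed error rate could be computed. It is an assumption for illustration only, not icefall's scoring code: the helper names `tokenize`, `edit_distance`, and `error_rate`, the token pattern, and the upper-casing step are all hypothetical choices, and the recipe's own text normalization may differ.

```python
import re
from typing import List, Sequence

# Assumed token pattern: each CJK Unified Ideograph is its own token, while a
# run of ASCII letters (optionally with apostrophes) is one English word.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z']+")


def tokenize(text: str) -> List[str]:
    """Split mixed text into Chinese characters and upper-cased English words."""
    return _TOKEN.findall(text.upper())


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference token
                cur[j - 1] + 1,          # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1]


def error_rate(ref_text: str, hyp_text: str) -> float:
    """Mixed character/word error rate in percent."""
    ref, hyp = tokenize(ref_text), tokenize(hyp_text)
    if not ref:
        raise ValueError("reference contains no scorable tokens")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# One deleted character plus one substituted word over 5 reference tokens.
print(error_rate("我想听 hello world", "我想 hello word"))  # prints 40.0
```

Restricting the reference and hypothesis to only Chinese or only English tokens before calling `edit_distance` would give per-language figures analogous to the `_zh` and `_en` columns.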

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model.

Deployment with C++

Once you have trained a model in icefall, you may want to deploy it with C++, without Python dependencies.

Please refer to the documentation https://icefall.readthedocs.io/en/latest/recipes/librispeech/conformer_ctc.html#deployment-with-c for how to do this.

We also provide a Colab notebook showing how to run a torch scripted model in k2 with C++.

Contributors

csukuangfj, danpovey, desh2608, emreozkose, ezerhouni, glynpu, huangruizhe, jinzr, joespitch, kobenaxie, luomingshuang, marcoyang1998, pehonnet, pingfengluo, pkufool, pzelasko, rickychanhoyin, rouseabout, shanguanma, shcxlee, teapoly, teowenshen, videodanchik, wangtiance, waynewiser, wgb14, yaozengwei, yfyeung, yuekaizhang, zhuangweiji
