License: Apache License 2.0


OpenASR

A PyTorch-based end-to-end speech recognition system. The main architecture is Speech-Transformer.

中文说明 (README in Chinese)

Features

  1. Minimal Dependency. The system does not depend on external software for feature extraction or decoding; users only need to install the PyTorch deep learning framework.
  2. Good Performance. The system includes advanced algorithms such as label smoothing, SpecAugment, and LST, and achieves good performance on AISHELL-1. The baseline CER on the AISHELL-1 test set is 6.6, which is better than ESPnet.
  3. Modular Design. The system is divided into modules such as trainer, metric, schedule, and models, which makes it easy to extend and add features.
  4. End-to-End. Feature extraction and tokenization are performed online, so the system processes wave files directly and the pipeline is much simplified.
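
The online front end described above can be sketched as follows. This is an illustrative example, not the project's actual code: per-frame log energy stands in for the real filterbank features, and the frame/hop sizes are typical values for 16 kHz speech, not values confirmed from this repository.

```python
import numpy as np

def frame_log_energy(wave, frame_len=400, hop=160):
    """Slice a waveform into overlapping frames and compute per-frame
    log energy -- a simplified stand-in for a real filterbank front end."""
    n_frames = 1 + max(0, (len(wave) - frame_len) // hop)
    frames = np.stack([wave[i * hop : i * hop + frame_len] for i in range(n_frames)])
    # Small epsilon avoids log(0) on silent frames.
    return np.log(np.sum(frames.astype(np.float64) ** 2, axis=1) + 1e-10)

# 1600 samples at 16 kHz = 100 ms of audio -> 8 frames of 25 ms with 10 ms hop.
feats = frame_log_energy(np.ones(1600))
```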

Dependency

  • python >= 3.6
  • pytorch >= 1.1
  • pyyaml >= 5.1
  • tensorflow and tensorboardX for visualization (if you do not need to visualize the results, set TENSORBOARD_LOGGING to 0 in src/utils.py)

Usage

We use KALDI-style example organization. Each example directory includes top-level shell scripts, a data directory, and an exp directory. We provide an AISHELL-1 example at ROOT/egs/aishell1/s5.

Data Preparation

The data preparation script is prep_data.sh. It automatically downloads the AISHELL-1 dataset and formats it into a KALDI-style data directory. It then generates JSON files and a grapheme vocabulary. You can set corpusdir to choose where the dataset is stored.

bash prep_data.sh

This generates the data directory and the exp directory.
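
The grapheme vocabulary step can be sketched like this. This is an illustrative example only: the special symbols and file formats are assumptions, not the actual output of prep_data.sh.

```python
def build_grapheme_vocab(transcripts):
    """Map each character seen in the transcripts to an integer id,
    reserving a few hypothetical special symbols at the start."""
    specials = ["<pad>", "<unk>", "<sos>", "<eos>"]  # assumed, not from the repo
    graphemes = sorted({ch for text in transcripts for ch in text if not ch.isspace()})
    return {tok: idx for idx, tok in enumerate(specials + graphemes)}

# Toy transcripts; the real script reads them from the KALDI-style text file.
vocab = build_grapheme_vocab(["今天 天气", "天气 好"])
```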

Train Models

We use YAML files for parameter configuration. We provide three examples:

config_base.yaml  # baseline ASR system
config_lm_lstm.yaml  # LSTM language model
config_lst.yaml  # training ASR with LST

Run the train.sh script to train the baseline system:

bash train.sh

Model Averaging

Average checkpoints to improve performance:

bash avg.sh
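
Conceptually, checkpoint averaging is an elementwise mean of the parameters of the last N checkpoints. A minimal sketch, using NumPy arrays to stand in for torch tensors (the repository's own script is avg_last_ckpts.py; this is not its actual code):

```python
import numpy as np

def average_checkpoints(state_dicts):
    """Elementwise average of parameters across several checkpoints."""
    avg = {k: v.astype(np.float64).copy() for k, v in state_dicts[0].items()}
    for sd in state_dicts[1:]:
        for k in avg:
            avg[k] += sd[k]
    return {k: v / len(state_dicts) for k, v in avg.items()}

# Two toy "checkpoints" with a single parameter tensor "w".
ckpts = [{"w": np.array([1.0, 2.0])}, {"w": np.array([3.0, 4.0])}]
averaged = average_checkpoints(ckpts)  # {"w": array([2., 3.])}
```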

Decoding and Scoring

Run the decode_test.sh script to decode the test set, then score the output:

bash decode_test.sh
bash score.sh data/test/text exp/exp1/decode_test_avg-last10
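
The score reported for Chinese ASR is the character error rate (CER): the Levenshtein distance between reference and hypothesis characters, divided by the reference length. A self-contained sketch of the metric (the repository itself scores with SCTK, not this code):

```python
def cer(ref: str, hyp: str) -> float:
    """Character error rate via Levenshtein edit distance."""
    r, h = list(ref), list(hyp)
    # d[i][j] = edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[len(r)][len(h)] / max(1, len(r))

err = cer("今天天气好", "今天气好")  # one deletion over five chars -> 0.2
```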

Visualization

We provide TensorboardX-based visualization. The event files are stored in $expdir/log. You can use TensorBoard to visualize the training procedure:

tensorboard --logdir=$expdir --bind_all

You can then view the training progress in a browser at http://localhost:6006.

Examples of visualizations:

  • per-token loss in a batch
  • encoder attention
  • encoder-decoder attention

Acknowledgement

This system is implemented with PyTorch. We use the wave-reading code from SciPy and the SCTK software for scoring. Thanks to Dan Povey's team for KALDI, from which I learned ASR concepts and the example organization. Thanks also to the Google Lingvo team, from which I learned the modular design.

Bib

@article{bai2019learn,
  title={Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition},
  author={Bai, Ye and Yi, Jiangyan and Tao, Jianhua and Tian, Zhengkun and Wen, Zhengqi},
  year={2019}
}

References

Dong, Linhao, Shuang Xu, and Bo Xu. "Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition." 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018.

Zhou, Shiyu, et al. "Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese." arXiv preprint arXiv:1804.10752 (2018).

openasr's People

Contributors

by2101


openasr's Issues

RuntimeError: The size of tensor a (512) must match the size of tensor b (2) at non-singleton dimension 1

Traceback (most recent call last):
File "/home/lenovo/E2E-model/OpenASR/OpenASR-master/egs/aishell2/s5/../../../src/train.py", line 134, in <module>
trainer.train()
File "/home/lenovo/E2E-model/OpenASR/OpenASR-master/src/trainer.py", line 152, in train
tr_loss = self.iter_one_epoch()
File "/home/lenovo/E2E-model/OpenASR/OpenASR-master/src/trainer.py", line 208, in iter_one_epoch
data = next(loader_iter)
File "/home/lenovo/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/lenovo/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
return self._process_data(data)
File "/home/lenovo/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
data.reraise()
File "/home/lenovo/anaconda3/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 8.
Original Traceback (most recent call last):
File "/home/lenovo/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/lenovo/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
return self.collate_fn(data)
File "/home/lenovo/E2E-model/OpenASR/OpenASR-master/src/data.py", line 242, in __call__
padded_waveforms, wave_lengths = load_wave_batch(paths)
File "/home/lenovo/E2E-model/OpenASR/OpenASR-master/src/data.py", line 189, in load_wave_batch
padded_waveforms[i, :lengths[i]] += waveforms[i]
RuntimeError: The size of tensor a (512) must match the size of tensor b (2) at non-singleton dimension 1
Hello, I had no problems with the AISHELL-1 data, but with AISHELL-2 the error above occurs. I have searched a lot of material but could not solve it. How should I fix this? Thanks!
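
One plausible cause of the mismatch above (size 2 at dimension 1), not confirmed in this thread: multi-channel audio. scipy.io.wavfile.read returns an array of shape (n_samples, n_channels) for stereo files, while the loader pads 1-D mono signals. A minimal sketch of a downmix workaround, assuming stereo input really is the cause:

```python
import numpy as np

def to_mono(wave):
    """Downmix a (n_samples, n_channels) array to 1-D mono;
    mono input passes through unchanged."""
    if wave.ndim == 2:
        return wave.mean(axis=1)
    return wave

# A stereo-shaped array becomes 1-D after downmixing.
mono = to_mono(np.ones((100, 2)))
```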

the score seems incorrect

[Screenshot of the scoring output, 2020-03-25]
I suspect the aishell.txt file in exp is the problem. When I ran decode_test.sh, the exp/aishell.txt argument refers to a file I do not have. Since I assumed it corresponds to the vocab_file argument, I substituted aishel1_train_chars.txt from the same directory, and then the error above happened. Where did I go wrong, and how should I fix it?
(P.S.: Thanks for providing avg_last_ckpts.py!)

Did you manage to reproduce the results?

I ran it again, and the decoding results are still wrong: no matter which audio clip I test, the decoded output is the same sentence. How should I fix this?

cuFFT error during training

Hello, I ran into a cuFFT problem during training. The error message is as follows:

Epoch 10 | Step 40686 | Iter 30400:
per_token_loss: 1.3417494 | avg_token_loss: 1.3317900 | learning_rate: 0.0002191
sequence_per_sec: 26.4520501
terminate called after throwing an instance of 'c10::Error'
  what():  cuFFT error: CUFFT_INVALID_PLAN (CUFFT_CHECK at /pytorch/aten/src/ATen/native/cuda/CuFFTUtils.h:70)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x46 (0x7f585fed4536 in /root/.local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x299e1e0 (0x7f5862cce1e0 in /root/.local/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x29a3c0d (0x7f5862cd3c0d in /root/.local/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #3: at::native::_fft_cufft(at::Tensor const&, long, bool, bool, bool, c10::ArrayRef<long>, bool, bool, c10::ArrayRef<long>) + 0x752 (0x7f5862cd1af2 in /root/.local/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xfa2408 (0x7f58612d2408 in /root/.local/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #5: <unknown function> + 0xfe27a4 (0x7f58613127a4 in /root/.local/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0x2c2221c (0x7f589227a21c in /root/.local/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x10c3f44 (0x7f589071bf44 in /root/.local/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0xde1d5f (0x7f5890439d5f in /root/.local/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #9: at::native::rfft(at::Tensor const&, long, bool, bool) + 0x22 (0x7f589043b0f2 in /root/.local/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x1152ef4 (0x7f58907aaef4 in /root/.local/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #11: <unknown function> + 0x2cf1434 (0x7f5892349434 in /root/.local/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #12: <unknown function> + 0x1189308 (0x7f58907e1308 in /root/.local/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #13: <unknown function> + 0x298603 (0x7f589e8a0603 in /root/.local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #59: <unknown function> + 0x76ba (0x7f58a1dcb6ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #60: clone + 0x6d (0x7f58a1b0141d in /lib/x86_64-linux-gnu/libc.so.6)

I am currently using python==3.7 and torch==1.5+cu9.2. Is this a version problem?
Also, I fixed the LM decoding, but recent tests show the performance dropped, which seems abnormal.

Error during train.sh

Hi, I am using torch==1.4.0 with CUDA. When I train the model with bash train.sh, I get the following error:

Traceback (most recent call last):
  File "/root/OpenASR/egs/aishell1/s5/../../../src/train.py", line 131, in <module>
    trainer.train()
  File "/root/OpenASR/src/trainer.py", line 152, in train
    tr_loss = self.iter_one_epoch()
  File "/root/OpenASR/src/trainer.py", line 229, in iter_one_epoch
    lst_t=self.lst_t)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/root/.local/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/root/.local/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
TypeError: Caught TypeError in replica 2 on device 2.
Original Traceback (most recent call last):
  File "/root/.local/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() missing 3 required positional arguments: 'batch_wave', 'lengths', and 'target_ids'

Do you know why? Could this be a mismatch between torch versions?

How to use the LM during decoding

Hello, I have already trained an LM and an AM. How do I use the LM during decoding? I could not find this in the examples. Could you share your approach?
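
One common recipe for combining an AM and an LM at decode time is shallow fusion: at each beam-search step, each candidate token's acoustic-model log-probability is interpolated with a weighted language-model log-probability. This is a standard technique, not necessarily what this repository implements; the token scores below are hypothetical.

```python
def rescore(am_logps, lm_logps, lm_weight=0.3):
    """Shallow fusion at one beam-search step: add a weighted LM
    log-probability to each candidate token's AM log-probability."""
    return {tok: am_logps[tok] + lm_weight * lm_logps[tok] for tok in am_logps}

# Hypothetical per-token scores for two candidate graphemes.
scores = rescore({"天": -0.5, "气": -1.2}, {"天": -0.8, "气": -0.4})
best = max(scores, key=scores.get)  # "天": -0.5 + 0.3 * -0.8 = -0.74
```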
