wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

Home Page: https://wenet-e2e.github.io/wenet/

License: Apache License 2.0

Python 47.47% Shell 3.67% C++ 42.31% CMake 2.85% Perl 1.85% Java 0.40% CSS 0.21% JavaScript 0.16% HTML 0.24% Dockerfile 0.04% C 0.19% Swift 0.26% Objective-C 0.06% Objective-C++ 0.13% Ruby 0.01% Cuda 0.14% Makefile 0.03%

e2e-models pytorch asr transformer conformer production-ready automatic-speech-recognition speech-recognition whisper

wenet's Introduction

WeNet

We share Net together.

Highlights

Production first and production ready: This is WeNet's core design principle; WeNet provides full-stack production solutions for speech recognition.
Accurate: WeNet achieves SOTA results on many public speech datasets.
Lightweight: WeNet is easy to install, easy to use, well designed, and well documented.

Install

Install the Python package

pip install git+https://github.com/wenet-e2e/wenet.git

Command-line usage (use -h for parameters):

wenet --language chinese audio.wav

Python programming usage:

import wenet

model = wenet.load_model('chinese')
result = model.transcribe('audio.wav')
print(result['text'])

Please refer to the Python usage documentation for more command-line and Python programming usage.

Install for training & deployment

Clone the repo

git clone https://github.com/wenet-e2e/wenet.git

Install Conda: please see https://docs.conda.io/en/latest/miniconda.html
Create Conda env:

conda create -n wenet python=3.10
conda activate wenet
conda install conda-forge::sox
pip install -r requirements.txt
pre-commit install  # for clean and tidy code

# If you encounter sox compatibility issues
RuntimeError: set_buffer_size requires sox extension which is not available.
# ubuntu
sudo apt-get install sox libsox-dev
# centos
sudo yum install sox sox-devel
# conda env
conda install conda-forge::sox

Build for deployment

Optionally, if you want to use the x86 runtime or a language model (LM), you have to build the runtime as follows. Otherwise, you can skip this step.

# runtime build requires cmake 3.14 or above
cd runtime/libtorch
mkdir build && cd build && cmake -DGRAPH_TOOLS=ON .. && cmake --build .

Please see the documentation for building the runtime on more platforms and operating systems.

Discussion & Communication

You can discuss directly on GitHub Issues.

For Chinese users, you can also scan the QR code on the left to follow WeNet's official account. We created a WeChat group for better discussion and quicker responses. Please scan the personal QR code on the right; the person behind it is responsible for inviting you to the chat group.

Acknowledgements

We borrowed a lot of code from ESPnet for transformer based modeling.
We borrowed a lot of code from Kaldi for WFST based decoding for LM integration.
We referred to EESEN for building TLG based graph for LM integration.
We referred to OpenTransformer for python batch inference of e2e models.

Citations

@inproceedings{yao2021wenet,
  title={WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit},
  author={Yao, Zhuoyuan and Wu, Di and Wang, Xiong and Zhang, Binbin and Yu, Fan and Yang, Chao and Peng, Zhendong and Chen, Xiaoyu and Xie, Lei and Lei, Xin},
  booktitle={Proc. Interspeech},
  year={2021},
  address={Brno, Czech Republic},
  organization={IEEE}
}

@article{zhang2022wenet,
  title={WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit},
  author={Zhang, Binbin and Wu, Di and Peng, Zhendong and Song, Xingchen and Yao, Zhuoyuan and Lv, Hang and Xie, Lei and Yang, Chao and Pan, Fuping and Niu, Jianwei},
  journal={arXiv preprint arXiv:2203.15455},
  year={2022}
}

wenet's People

Contributors

Stargazers

Watchers

Forkers

entn-at placebokkk gqwert123 xqq2018rebuild liyinchao anshuiyin xiexukang gandolfxu lvhang glynpu l2009312042 rxhmdia shiyang1983 t13m hitxujian 121898 zycv qoboty fchest xbsdsongnan honghe avatarworld yh646492956 studyself tory0820 yeshunping swlim5427 zhengyu111 fanlu jerrywei1985 jjoving adolfvonkleist saber5433 lyjzsyzlt qmpzzpmq jingyonghou lvchigo faranaziz zw76859420 macroustc makinglong maxmax2016 ouc-lan wangfn robinatp bhaskarbharat marsgwh mudmoh preventions zhangsanfeng86 fireae zyz0577 ai-alive yueliangniao liroda sudeep4893 srzarin karatemir zhaoyun630 mikelei askmetoo 0xf4vul jyp0716 vsvinoth voxlogic haiewu hbsoftfengzhixing sundy1219 786440445 lijianhackthon ernie-mlg forestlee han-xie donstang googly0 chwma0 liangtianxin y00281951 bpshu arkadyark xiaguangmin nitin4525 appalachianwine llmhao sciai-ai kli017 zh794390558 hualuluu whitefu xyh523078979 day9011 tanghaitao-ape zoumt1633 hajime9652 wxy1988 hnn123 mapleleafss shiyuzh2007 hannes1 whispercosat

wenet's Issues

make error during server build

cmake version 3.19.4

gcc version 9.3.0 (GCC)

CMake succeeded, but make failed while building the runtime server:

wenet-main/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so: undefined reference to `lgammaf@GLIBC_2.23'
wenet-main/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so: undefined reference to `lgamma@GLIBC_2.23'

Please take a look, thanks!
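A versioned undefined reference such as lgamma@GLIBC_2.23 usually means the prebuilt libtorch was linked against a newer glibc than the one the linker finds on this machine. That is a plausible diagnosis, not one confirmed in this thread. As a quick check, the sketch below reads the running glibc version with Python's platform.libc_ver() and compares it with the version tag from the error; the helper names are our own.

```python
import platform
import re


def parse_version(text):
    """Extract (major, minor) from strings such as 'GLIBC_2.23' or '2.17'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return int(match.group(1)), int(match.group(2))


def glibc_satisfies(required_tag, system_version):
    """True if system_version, a (major, minor) tuple, is at least the tagged version."""
    return system_version >= parse_version(required_tag)


if __name__ == "__main__":
    lib, version = platform.libc_ver()  # e.g. ('glibc', '2.17'); may be ('', '') off glibc
    if lib != "glibc" or not version:
        print("could not determine the glibc version on this system")
    else:
        have = parse_version(version)
        tag = "GLIBC_2.23"  # version tag taken from the linker error above
        status = "OK" if glibc_satisfies(tag, have) else "too old"
        print(f"system glibc {version}: {tag} -> {status}")
```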

How to train on a K40c GPU

The K40c is not supported by torch 1.6.0, so I installed torch 1.3.0, but the following error occurred:

site-packages/torch/utils/data/_utils/signal_handling.py", line 737, in _try_get_data raise RuntimeError
RuntimeError: DataLoader worker (pid 133734) is killed by signal: Aborted

Error in `python': corrupted size vs. prev_size: 0x0000560a10045d70

When I trained the Conformer network on my own data, the following error occurred at the beginning of training. How should I solve it? The batch size is already set very small (4), and my GPU is a 32 GB NVIDIA V100.

...
2021-03-15 15:34:05,287 INFO Checkpoint: save to checkpoint exp/conformer/init.pt
2021-03-15 15:34:06,656 INFO Epoch 0 TRAIN info lr 4e-08
2021-03-15 15:34:06,657 INFO using accumulate grad, new batch size is 1 timeslarger than before
2021-03-15 15:34:12,573 DEBUG TRAIN Batch 0/19074 loss 525.971741 loss_att 291.681671 loss_ctc 1072.648438 lr 0.00000004 rank 0
2021-03-15 15:34:36,776 DEBUG TRAIN Batch 100/19074 loss 610.763184 loss_att 431.314453 loss_ctc 1029.476929 lr 0.00000404 rank 0
2021-03-15 15:35:00,639 DEBUG TRAIN Batch 200/19074 loss 29.456654 loss_att 28.791754 loss_ctc 31.008080 lr 0.00000804 rank 0
*** Error in `python': corrupted size vs. prev_size: 0x0000560a10045d70 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777f5)[0x7f9ca30ea7f5]
/lib/x86_64-linux-gnu/libc.so.6(+0x80e0b)[0x7f9ca30f3e0b]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f9ca30f758c]
/home/sine/anaconda3/envs/wenet/lib/python3.8/site-packages/torchaudio/_torchaudio.so(_ZN5torch5audio18build_flow_effectsERKSsN2at6TensorEbP16sox_signalinfo_tP18sox_encodinginfo_tPKcSt6vectorINS0_9SoxEffectESaISC_EEi+0xfec)[0x7f9c3f46317c]
/home/sine/anaconda3/envs/wenet/lib/python3.8/site-packages/torchaudio/_torchaudio.so(+0x86dc3)[0x7f9c3f481dc3]
/home/sine/anaconda3/envs/wenet/lib/python3.8/site-packages/torchaudio/_torchaudio.so(+0x7bbfa)[0x7f9c3f476bfa]
python(PyCFunction_Call+0x58)[0x560a027b62d8]
python(_PyObject_MakeTpCall+0x23c)[0x560a027a5edc]
python(_PyEval_EvalFrameDefault+0x45a9)[0x560a02831879]
python(_PyEval_EvalCodeWithName+0x300)[0x560a027fb760]
python(_PyFunction_Vectorcall+0x1e3)[0x560a027fc593]
python(+0x10399c)[0x560a0276599c]
python(_PyFunction_Vectorcall+0x10b)[0x560a027fc4bb]
python(+0x10425f)[0x560a0276625f]
python(_PyEval_EvalCodeWithName+0x8b1)[0x560a027fbd11]
python(_PyFunction_Vectorcall+0x1e3)[0x560a027fc593]
python(+0x10425f)[0x560a0276625f]
python(PyEval_EvalCodeWithName+0x8b1)[0x560a027fbd11]
...
...
...
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Traceback (most recent call last):
  File "wenet/bin/train.py", line 211, in <module>
    executor.train(model, optimizer, scheduler, train_data_loader, device,
  File "/home/sine/wenet/wenet-main/examples/accent_reg/s0/wenet/utils/executor.py", line 63, in train
    optimizer.zero_grad()
  File "/home/sine/anaconda3/envs/wenet/lib/python3.8/site-packages/torch/optim/optimizer.py", line 171, in zero_grad
    p.grad.detach()
  File "/home/sine/anaconda3/envs/wenet/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 30256) is killed by signal: Aborted.

What are the possible reasons for this?

Pre-trained model: http://mobvoi-speech-public.ufile.ucloud.cn/public/wenet/aishell/20210204_unified_transformer_exp.tar.gz
Test data: AISHELL-1

x86 RuntimeError

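Many thanks for WeNet's open-source work (๑•̀ㅂ•́)و✧!
Here is the full error:

The model I'm running on x86 here is one I trained myself on 8 kHz audio. I then changed every 16k in the C++ source to 8k, and the error above appeared when I tested after compiling. If anyone knows how to fix this, I'd appreciate the advice.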
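When a model moves from 16 kHz to 8 kHz, any feature setting expressed in samples has to change along with the sample rate, not just the literal 16000. For example, the runtime log in another issue on this page reports frame_length 400 and frame_shift 160, which at 16 kHz correspond to a 25 ms window and a 10 ms shift. The hypothetical helper below converts those millisecond values for any sample rate, assuming they are what training used.

```python
def fbank_frame_params(sample_rate, frame_length_ms=25.0, frame_shift_ms=10.0):
    """Convert an fbank window length and shift from milliseconds to samples."""
    frame_length = int(round(sample_rate * frame_length_ms / 1000.0))
    frame_shift = int(round(sample_rate * frame_shift_ms / 1000.0))
    return frame_length, frame_shift


for rate in (16000, 8000):
    length, shift = fbank_frame_params(rate)
    print(f"{rate} Hz: frame_length={length} samples, frame_shift={shift} samples")
```

At 8 kHz this gives 200 and 80 samples, so a runtime that still uses 400/160 would compute features over twice the intended duration.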

Stage 3 shows no such file or directory

I followed the instructions line by line, but when I ran bash run.sh --stage 3 --stop-stage 3, it showed "No such file or directory". What can I do? Thanks.

Describe the bug
tools/format_data.sh --nj 32 --feat-type wav --feat raw_wav/dev/wav.scp raw_wav/dev data/dict/lang_char.txt
split: illegal option -- -
usage: split [-a sufflen] [-b byte_count] [-l line_count] [-p pattern]
[file [prefix]]
ls: raw_wav/dev/log/wav_.slice: No such file or directory
cat: raw_wav/dev/log/wav_.shape: No such file or directory
tools/format_data.sh --nj 32 --feat-type wav --feat raw_wav/test/wav.scp raw_wav/test data/dict/lang_char.txt
split: illegal option -- -
usage: split [-a sufflen] [-b byte_count] [-l line_count] [-p pattern]
[file [prefix]]
ls: raw_wav/test/log/wav_.slice: No such file or directory
cat: raw_wav/test/log/wav_.shape: No such file or directory
tools/format_data.sh --nj 32 --feat-type wav --feat raw_wav/train/wav.scp raw_wav/train data/dict/lang_char.txt
split: illegal option -- -
usage: split [-a sufflen] [-b byte_count] [-l line_count] [-p pattern]
[file [prefix]]
ls: raw_wav/train/log/wav_.slice: No such file or directory
cat: raw_wav/train/log/wav_.shape: No such file or directory

Desktop (please complete the following information):

OS: macOS
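The log shows macOS's BSD split rejecting a GNU-style long option ("illegal option -- -"), so the wav_*.slice files that format_data.sh expects are never created. Installing GNU coreutils is one common workaround. Alternatively, the Python sketch below does a line-based split into nj parts. The function names and the output file pattern are hypothetical and would need to match what format_data.sh actually reads.

```python
from pathlib import Path


def split_lines(lines, nj):
    """Split a list of lines into nj contiguous parts whose sizes differ by at most one."""
    if nj <= 0:
        raise ValueError("nj must be positive")
    base, extra = divmod(len(lines), nj)
    parts, start = [], 0
    for i in range(nj):
        end = start + base + (1 if i < extra else 0)
        parts.append(lines[start:end])
        start = end
    return parts


def split_scp(scp_path, out_dir, nj, prefix="wav_", suffix=".slice"):
    """Write each part of scp_path to out_dir/<prefix><index><suffix> (hypothetical naming)."""
    lines = Path(scp_path).read_text(encoding="utf-8").splitlines(keepends=True)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, part in enumerate(split_lines(lines, nj)):
        path = out / f"{prefix}{i:02d}{suffix}"
        path.write_text("".join(part), encoding="utf-8")
        written.append(path)
    return written
```

Unlike GNU split -n l/N, which balances chunks by bytes, this balances by line count; either keeps every scp line intact.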

Move shared cpp code out of runtime/server/x86/ to runtime/core/

In attention_rescoring, why is the embedding vector of the decoder's input token multiplied by self.xscale?

At x = x * self.xscale, self.xscale is 16? Is there any paper that explains this? Thanks.
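For context, the original Transformer paper (Vaswani et al., 2017) multiplies the embedding weights by √d_model, and ESPnet-style positional-encoding modules (which WeNet's likely derives from, given the ESPnet acknowledgement above) store this factor as xscale. With an attention dimension of 256, √256 = 16, which would explain the observed value, assuming the model uses d_model = 256. A common motivation is to keep the embedding on a scale comparable to the sinusoidal positional encoding, whose entries lie in [-1, 1], before the two are added. The pure-Python sketch below is illustrative only, not WeNet's code.

```python
import math


def sinusoidal_position_encoding(pos, d_model):
    """Sinusoidal encoding for one position: sin on even indices, cos on odd ones."""
    pe = [0.0] * d_model
    for i in range(0, d_model, 2):
        freq = math.exp(-math.log(10000.0) * i / d_model)
        pe[i] = math.sin(pos * freq)
        if i + 1 < d_model:
            pe[i + 1] = math.cos(pos * freq)
    return pe


def embed_with_position(token_embedding, pos):
    """Scale the token embedding by sqrt(d_model), then add the positional encoding."""
    d_model = len(token_embedding)
    xscale = math.sqrt(d_model)  # 16.0 when d_model == 256
    pe = sinusoidal_position_encoding(pos, d_model)
    return [x * xscale + p for x, p in zip(token_embedding, pe)]
```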

Update results on the LibriSpeech dataset.

@whiteshirt0429

Indiscriminate sed replacement triggers `use_static_chunk` issue

The sed command in the LibriSpeech s0 recipe (the run.sh line linked below) replaces every instance of "dynamic" with "static". If the train_unified_conformer.yml config is used, this turns use_dynamic_chunk: true into use_static_chunk: true at evaluation time, which fails with:

TypeError: __init__() got an unexpected keyword argument 'use_static_chunk'

https://github.com/mobvoi/wenet/blob/a1250553371097826da13fb1a9438a9e8a9dc110/examples/librispeech/s0/run.sh#L183

This is easily fixed by modifying the run.sh line above. But I wonder whether I missed something else there that could still affect the model's accuracy at training or decoding time. It seems to work OK, but I don't find this line in the aishell run.sh for either recipe.
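The root problem is that an unanchored substitution also matches "dynamic" inside use_dynamic_chunk. The Python sketch below contrasts a plain replacement with a whole-word one. Because "_" counts as a word character, \b does not match inside use_dynamic_chunk, so only standalone values change. The line "mode: dynamic" is a hypothetical example of what a targeted edit might be meant to change; GNU sed also supports \b with the same meaning.

```python
import re


def naive_sub(text, old, new):
    """Replace every occurrence of old, even inside longer identifiers."""
    return text.replace(old, new)


def whole_word_sub(text, old, new):
    """Replace old only where it stands as a whole word ('_' counts as a word character)."""
    return re.sub(rf"\b{re.escape(old)}\b", new, text)


config = "use_dynamic_chunk: true\nmode: dynamic\n"  # second line is hypothetical
print(naive_sub(config, "dynamic", "static"))
print(whole_word_sub(config, "dynamic", "static"))
```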

make failure

I created the conda env as follows:

conda create -n wenet python=3.8
conda activate wenet
pip install -r requirements.txt
conda install pytorch==1.6.0 cudatoolkit=10.1 torchaudio -c pytorch

CMake succeeds, but make fails with the following error:

/data/home/yezj/github/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so: undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::replace(unsigned long, unsigned long, char const*, unsigned long)@GLIBCXX_3.4.21'
/data/home/yezj/github/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so: undefined reference to `std::_Sp_locker::_Sp_locker(void const*)@GLIBCXX_3.4.21'
/data/home/yezj/github/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so: undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare(unsigned long, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const@GLIBCXX_3.4.21'
/data/home/yezj/github/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so: undefined reference to `std::basic_ofstream<char, std::char_traits<char> >::basic_ofstream(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::_Ios_Openmode)@GLIBCXX_3.4.21'
.....

My OS is CentOS 7.5, with gcc 7.3.1 and cmake 3.19.4.
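The undefined GLIBCXX_3.4.21 references suggest that the libstdc++ found at link time is older than the one libtorch was built against. GLIBCXX_3.4.21 first shipped with GCC 5, while CentOS 7's system libstdc++ comes from GCC 4.8. One way to check is to list the GLIBCXX version tags inside a libstdc++.so.6, as strings ... | grep GLIBCXX would. The sketch below does this in Python; the default library path is an assumption about a typical CentOS 7 layout.

```python
import re
import sys

GLIBCXX_TAG = re.compile(rb"GLIBCXX_(\d+)\.(\d+)(?:\.(\d+))?")


def glibcxx_versions(blob):
    """Return the sorted GLIBCXX version tags (as int tuples) found in a binary blob."""
    found = set()
    for match in GLIBCXX_TAG.finditer(blob):
        found.add(tuple(int(g) for g in match.groups() if g is not None))
    return sorted(found)


def provides(blob, required=(3, 4, 21)):
    """True if the blob contains the exact GLIBCXX version tag `required`."""
    return tuple(required) in glibcxx_versions(blob)


if __name__ == "__main__":
    # Typical CentOS 7 location (assumption); pass another path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/usr/lib64/libstdc++.so.6"
    try:
        with open(path, "rb") as f:
            blob = f.read()
    except OSError as err:
        print(f"cannot read {path}: {err}")
    else:
        print("GLIBCXX_3.4.21 present:", provides(blob))
```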

Is there much difference between the two learning-rate schedules NoamLR and WarmupLR during training?

Have you tested the difference between these two learning-rate schedules (NoamLR and WarmupLR)? When I use NoamLR to train a joint CTC/AED model, it seems quite hard to train.
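As a reference point, ESPnet2 defines WarmupLR as lr = base_lr · warmup^0.5 · min(step^-0.5, step · warmup^-1.5), which peaks at base_lr when step equals warmup. Its NoamLR has the same shape but is scaled by model_size^-0.5 instead of warmup^0.5, so its peak is base_lr / √(model_size · warmup). If WeNet's schedulers follow these formulas (an assumption worth verifying in the source), reusing a learning rate tuned for WarmupLR under NoamLR gives a much smaller peak, which could make training look hard. With an assumed base_lr of 0.001 and 25000 warm-up steps, warmup_lr gives 4e-08 at step 1 and about 4.04e-06 at step 101. Those values are consistent with the lr values in the training log shown earlier on this page.

```python
def warmup_lr(step, base_lr, warmup_steps):
    """WarmupLR-style schedule: linear warm-up, then inverse-sqrt decay; peak == base_lr."""
    step = max(step, 1)
    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


def noam_lr(step, base_lr, warmup_steps, model_size):
    """NoamLR-style schedule: same shape, peak == base_lr / sqrt(model_size * warmup_steps)."""
    step = max(step, 1)
    return base_lr * model_size ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


for step in (1, 101, 25000, 100000):
    print(step, warmup_lr(step, 0.001, 25000), noam_lr(step, 0.001, 25000, 256))
```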

Remove Kaldi dependency.

Use TorchAudio for feature extraction

Are the pre-trained models available somewhere?

Many thanks!

torch can't be loaded on a Mac Pro

OSError: dlopen(/opt/anaconda3/envs/wenet/lib/python3.8/site-packages/torch/lib/libtorch_global_deps.dylib, 10): Library not loaded: @rpath/libmkl_intel_lp64.dylib
Referenced from: /opt/anaconda3/envs/wenet/lib/python3.8/site-packages/torch/lib/libtorch_global_deps.dylib
Reason: image not found

An error occurred while running on Android

I downloaded final.zip and words.txt, put them into the assets folder, and compiled the project into an APK to run on Android. When I open the app and tap the button, it reports an error and crashes.

The error message is as follows:

26715-27453/com.mobvoi.wenet E/libc++abi: terminating with uncaught exception of type c10::Error: Expected at most 5 argument(s) for operator 'forward_encoder_chunk', but received 7 argument(s). Declaration: forward_encoder_chunk(torch.wenet.transformer.asr_model.___torch_mangle_21.ASRModel self, Tensor xs, Tensor? subsampling_cache=None, Tensor[]? elayers_output_cache=None, Tensor[]? conformer_cnn_cache=None) -> ((Tensor, Tensor, Tensor[], Tensor[]))
Exception raised from checkAndNormalizeInputs at ../aten/src/ATen/core/function_schema_inl.h:245 (most recent call first):
(no backtrace available)

26715-27453/com.mobvoi.wenet A/libc: Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 27453 (om.mobvoi.wenet), pid 26715 (om.mobvoi.wenet)

There is no wenet/bin/recognize.py file in the main branch now.

There is no longer a wenet/bin/recognize.py file in the main branch. If I run any decoding script, it reports the following:

python: can't open file 'wenet/bin/recognize.py': [Errno 2] No such file or directory

Issue about encoder-decoder attention

The entire encoder output is fed into the decoder, so the encoder-decoder attention attends to the whole sequence, right? Why not use monotonic attention?
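In a standard Transformer decoder, cross-attention computes a softmax over every non-padded encoder frame, so each output token can indeed attend to the whole utterance; monotonic attention variants restrict this to enforce a left-to-right alignment. Below is a minimal pure-Python illustration of full-sequence scaled dot-product attention for one decoder query. It is not WeNet's implementation.

```python
import math


def cross_attention(query, keys, values):
    """Scaled dot-product attention of one decoder query over all encoder frames."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # one weight per encoder frame, summing to 1
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
    return weights, context
```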

tools/format_data.sh bug

tools/format_data.sh, line 89:
Should "${trans_type}" == "ch_char_en_bpe" be written as "${trans_type}" == "cn_char_en_bpe"?

Complementary Language Models

Do you support complementary language models, or do you plan to? I didn't notice any examples or related code in the repo.

How long a wav file can we decode?

Describe the bug
I got an error related to input length (a 600 s file) with the offline demo on the server runtime, but no error with a 60 s file.
With the streaming demo, the server hung on the same 600 s wav file.

To Reproduce
Steps to reproduce the behavior:
Go to...

cd wenet/runtime/server/x86

Run this command...

export GLOG_logtostderr=1
export GLOG_v=2
#wav_scp=raw_wav/test.scp
wav_path=
model_dir=

./build/decoder_main \
    --chunk_size -1 \
    --wav_path $wav_path \
    --model_path $model_dir/final.zip \
    --dict_path $model_dir/words.txt 2>&1 | tee log.txt

I get this error (some of the log lines were added by me):

$ bash offline_recog.sh 
I0322 07:28:19.092447  5399 torch_asr_model.cc:36] torch model info subsampling_rate 4 right context 6 sos 11175 eos 11175
I0322 07:28:19.111845  5399 feature_pipeline.h:43] feature pipeline config num_bins 80 frame_length 400frame_shift160
I0322 07:28:19.112640  5399 decoder_main.cc:74] wav raw_wav/wani.wav
I0322 07:28:19.113868  5399 wav.h:73] wav header info: data size 36
I0322 07:28:19.114097  5399 decoder_main.cc:79] read 18 samples, 1 channels, 16 bits, So we got the length of data is 18
I0322 07:28:19.114109  5399 fbank.h:133] Get the 18 samples
I0322 07:28:19.114113  5399 feature_pipeline.cc:39] add 0 frames
I0322 07:28:19.114121  5399 decoder_main.cc:83] num frames 0
I0322 07:28:19.114135  5399 torch_asr_decoder.cc:60] AdvanceDecoding
I0322 07:28:19.114140  5399 torch_asr_decoder.cc:78] Required 2147483647 get 0
terminate called after throwing an instance of 'c10::Error'
  what():  There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat.  This usually means that this function requires a non-empty list of Tensors.  Available functions are [CPU, QuantizedCPU, Autograd, Profiler, Tracer, Autocast]
Exception raised from reportError at ../aten/src/ATen/core/dispatch/Dispatcher.cpp:306 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x68 (0x7f9ffa3eaeb8 in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libc10.so)
frame #1: c10::Dispatcher::reportError(c10::DispatchTable const&, c10::DispatchKey) + 0x18f (0x7f9ffb12780f in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #2: at::_cat(c10::ArrayRef<at::Tensor>, long) + 0x203 (0x7f9ffb8bf373 in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #3: at::native::cat(c10::ArrayRef<at::Tensor>, long) + 0xbd (0x7f9ffb53f4ad in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x135fec6 (0x7f9ffb977ec6 in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0xac4c3c (0x7f9ffb0dcc3c in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #6: at::cat(c10::ArrayRef<at::Tensor>, long) + 0x117 (0x7f9ffb8bf067 in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x2ef7d5d (0x7f9ffd50fd5d in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0xac4c3c (0x7f9ffb0dcc3c in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #9: at::cat(c10::ArrayRef<at::Tensor>, long) + 0x117 (0x7f9ffb8bf067 in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x40f79 (0x5650037f1f79 in ./build/decoder_main)
frame #11: <unknown function> + 0x3f57d (0x5650037f057d in ./build/decoder_main)
frame #12: <unknown function> + 0xe979 (0x5650037bf979 in ./build/decoder_main)
frame #13: __libc_start_main + 0xe6 (0x7f9ff926dbf6 in /lib/x86_64-linux-gnu/libc.so.6)
frame #14: <unknown function> + 0xde59 (0x5650037bee59 in ./build/decoder_main)

I'd be grateful if someone could share a working Dockerfile.

Transformer models

Is your feature request related to a problem? Please describe.
Low-resource languages and deep-domain use cases need more efficient models.

Describe the solution you'd like
Hugging Face is integrating ASR models such as wav2vec 2.0 and Speech Transformer into its transformers library.

Describe alternatives you've considered
The fairseq implementation of wav2vec has more dependencies, is more complex to use, and is less readable.

Additional context
Integrating Hugging Face models would give access to pretrained models from the model hub and support for multiple models with less code.

wrong cmvn_opt path in multi_cn run.sh

When I run stage 4 of the multi_cn run.sh, it gives an error: 'raw_wav/train/global_cmvn': No such file or directory

In run.sh line 225, I think the path should be $cmvn && cp ${feat_dir}_${en_modeling_unit}/${train_set}/global_cmvn $dir?

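The Android demo has very low recognition accuracy

早上好你叫什么名字去机场要怎么走 ("Good morning, what's your name, how do I get to the airport")
was recognized as:
宝上方你这什欢名次却具残要车
The accuracy is about 25%. Is this because no LM was added?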
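An informal accuracy figure is easier to compare with published results when it is expressed as character error rate (CER): the character-level edit distance between reference and hypothesis divided by the reference length. A minimal sketch, not WeNet's own scoring script:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance where substitutions, insertions and deletions all cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by the reference length."""
    return edit_distance(ref, hyp) / len(ref)


print(cer("早上好你叫什么名字去机场要怎么走", "宝上方你这什欢名次却具残要车"))
```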

Add Android demo app.

How many milliseconds of latency does each chunk size correspond to?

I noticed that with chunk size 1, CTC greedy search degrades from 5.51 to 7.83 (roughly a 42% relative increase) in the "Unified Dynamic chunk" section.

Please let me know if my understanding is incorrect:
Chunk 1 latency = 12 layers * 1 chunk * 40 ms (conv2d) = 480 ms.
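One way to reason about this, assuming the chunk masks let each layer see only the current and previous chunks (no look-ahead), is that the receptive field does not grow with depth, so the number of layers should not multiply the latency. Under that assumption, with a 10 ms frame shift and 4x subsampling (40 ms per encoder frame), one chunk covers chunk_size × 40 ms of audio. The subsampling layer's small right context and compute time come on top of that. The sketch also checks the size of the degradation quoted above.

```python
def chunk_latency_ms(chunk_size, subsampling=4, frame_shift_ms=10):
    """Audio (ms) in one decoding chunk: chunk_size encoder frames of subsampling * shift each."""
    return chunk_size * subsampling * frame_shift_ms


def relative_change(before, after):
    """Relative change from before to after (0.42 means +42%)."""
    return (after - before) / before


print(chunk_latency_ms(1), chunk_latency_ms(16))  # per-chunk audio, in ms
print(round(relative_change(5.51, 7.83), 3))
```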

Forward chunk by chunk in decoding mode

I can't understand the forward_chunk_by_chunk function in encoder.py.

In particular, I have some questions about how the right-context parameter is calculated and how it is used in forward_chunk_by_chunk:

How is a context parameter such as self.right_context in the Conv2dSubsampling4 class in subsampling.py calculated?
What is frame_rate_of_this_layer in Conv2dSubsampling4?
Why are overlapping inputs fed forward step by step in forward_chunk_by_chunk?
What are stride and decoding_window, and how are they calculated in forward_chunk_by_chunk?

Can anyone help with these questions?
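Here is a sketch of the arithmetic, based on our reading of forward_chunk_by_chunk and Conv2dSubsampling4; verify it against encoder.py and subsampling.py in your checkout. Conv2dSubsampling4 stacks two unpadded convolutions with kernel 3 and stride 2 along time. Each layer needs (kernel − 1) extra input frames, multiplied by the product of the strides before it, so the right context is 2·1 + 2·2 = 6. That matches the "subsampling_rate 4 right context 6" line in the runtime logs on this page. To produce decoding_chunk_size output frames, the encoder needs decoding_window = (decoding_chunk_size − 1)·subsampling + right_context + 1 input frames. It advances by only stride = subsampling·decoding_chunk_size, which is why consecutive input windows overlap by right_context + 1 − subsampling frames.

```python
def subsampling_geometry(layers=((3, 2), (3, 2))):
    """Right context and total subsampling rate for stacked (kernel, stride) convs."""
    right_context, total_stride = 0, 1
    for kernel, stride in layers:
        right_context += (kernel - 1) * total_stride
        total_stride *= stride
    return right_context, total_stride


def conv_output_frames(num_frames, layers=((3, 2), (3, 2))):
    """Frames left after unpadded convolutions along time."""
    for kernel, stride in layers:
        num_frames = (num_frames - kernel) // stride + 1
    return num_frames


def chunk_windows(num_frames, decoding_chunk_size, layers=((3, 2), (3, 2))):
    """Input windows (start, end) visited by a chunk-by-chunk loop."""
    right_context, subsampling = subsampling_geometry(layers)
    context = right_context + 1  # current frame plus its right context
    stride = subsampling * decoding_chunk_size
    decoding_window = (decoding_chunk_size - 1) * subsampling + context
    windows = [(cur, min(cur + decoding_window, num_frames))
               for cur in range(0, num_frames - context + 1, stride)]
    return stride, decoding_window, windows


stride, window, windows = chunk_windows(num_frames=200, decoding_chunk_size=16)
print(stride, window, conv_output_frames(window))  # a full window yields 16 frames
print(windows)
```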

Can I use only a single GPU (RTX 2080 Super)?

Can I train the model using a single GPU?

Two differences between WeNet and ESPnet

Thank you for your nice work!
I found two differences between WeNet and ESPnet:

Layernorm/Batchnorm in conv module
Positional Encoding reverse True/False
Are these modifications necessary?

Is the model available on Google Drive?

Thank you

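Cannot run with multiple GPUs

I cloned the latest version of WeNet and wanted to run the aishell1 demo on two GPUs, but it reported the following error after starting:
A single GPU works fine. It seems a flock function is missing?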

Traceback (most recent call last):
Traceback (most recent call last):
  File "wenet/bin/train.py", line 185, in <module>
  File "wenet/bin/train.py", line 185, in <module>
    model = torch.nn.parallel.DistributedDataParallel(
  File "/dev/conda_py38/envs/wenet/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 331, in __init__
    model = torch.nn.parallel.DistributedDataParallel(
  File "/dev/conda_py38/envs/wenet/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 331, in __init__
    self._distributed_broadcast_coalesced(
  File "/dev/conda_py38/envs/wenet/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 549, in _distributed_broadcast_coalesced
    self._distributed_broadcast_coalesced(
  File "/dev/conda_py38/envs/wenet/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 549, in _distributed_broadcast_coalesced
    dist._broadcast_coalesced(self.process_group, tensors, buffer_size)
RuntimeError: flock: Function not implemented
    dist._broadcast_coalesced(self.process_group, tensors, buffer_size)
RuntimeError: flock: Function not implemented
terminate called after throwing an instance of 'std::system_error'
  what():  flock: Function not implemented
terminate called after throwing an instance of 'std::system_error'
  what():  flock: Function not implemented
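"Function not implemented" is the text of the ENOSYS error, raised here while DDP sets up communication. One plausible cause, not confirmed in this thread, is that the recipe's file:// init method (another log on this page shows it pointing at a ddp_init file under exp/) places its rendezvous file on a filesystem that does not support flock(). Some network and container filesystems lack it. The sketch below tests a given directory directly. If it returns False, moving the init file to a local disk or switching to a tcp:// init method may be worth trying.

```python
import errno
import fcntl
import os
import sys
import tempfile


def flock_supported(directory):
    """Return True if an exclusive flock() works on a temp file created in directory."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    except OSError as err:
        # ENOSYS is "Function not implemented"; the others also indicate missing lock support
        if err.errno in (errno.ENOSYS, errno.EOPNOTSUPP, errno.ENOLCK):
            return False
        raise
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    # Pass the directory that will hold the ddp_init file; defaults to the temp dir.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    print(target, "supports flock:", flock_supported(target))
```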

How to prepare English data?

Hi, WeNet is so amazing!
But I wonder whether this model supports English, because tokens in a sentence have to be separated by " ".
Could I use "<SPACE!>" to represent " " in a sentence, like "H e l l o <SPACE!> w o r l d", or something like that?
Looking forward to your reply.
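Using a dedicated symbol for the space is workable for character-level modeling, as long as that symbol is also added to the dictionary; BPE or other subword units are another common choice for English. Below is a minimal sketch of the round trip, using the <SPACE!> symbol from the question (any otherwise unused symbol would do).

```python
SPACE = "<SPACE!>"  # symbol taken from the question; any unused token works


def to_char_tokens(sentence, space=SPACE):
    """'Hello world' -> 'H e l l o <SPACE!> w o r l d'."""
    return " ".join(space if ch == " " else ch for ch in sentence)


def from_char_tokens(tokens, space=SPACE):
    """Invert to_char_tokens: join characters and turn the space symbol back into ' '."""
    return "".join(" " if tok == space else tok for tok in tokens.split(" "))
```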

Issue about multiprocessing decoding

I tried multiprocess decoding (with run.pl, like ESPnet), but it's extremely slow. What's a possible reason?

Compile Android Project on PC: UnsatisfiedLinkError: couldn't find "libwenet.so"

Hi, here is the problem: when I compiled the latest WeNet Android code you provided on my machine and added final.zip and words.txt to the required directory, the app crashed (both in the Android Studio emulator and on my personal phone), always right after I granted the recording permission. I searched online for a solution [1], but sadly nothing worked. Here is the error reported:

My environment is: Windows 10, SDK 6.0 - 9.0, CMake 3.18.1.

E/AndroidRuntime: FATAL EXCEPTION: main
    Process: com.mobvoi.wenet, PID: 11753
    java.lang.UnsatisfiedLinkError: dalvik.system.PathClassLoader[DexPathList[[zip file "/data/app/com.mobvoi.wenet-VaKqTfUA4p7TCtqlZS2Gtg==/base.apk"],nativeLibraryDirectories=[/data/app/com.mobvoi.wenet-VaKqTfUA4p7TCtqlZS2Gtg==/lib/arm, /data/app/com.mobvoi.wenet-VaKqTfUA4p7TCtqlZS2Gtg==/base.apk!/lib/armeabi-v7a, /system/lib]]] couldn't find "libwenet.so"
        at java.lang.Runtime.loadLibrary0(Runtime.java:1012)
        at java.lang.System.loadLibrary(System.java:1669)
        at com.mobvoi.wenet.Recognize.<clinit>(Recognize.java:6)
        at com.mobvoi.wenet.Recognize.init(Native Method)
        at com.mobvoi.wenet.MainActivity.onCreate(MainActivity.java:88)
        at android.app.Activity.performCreate(Activity.java:7136)
        at android.app.Activity.performCreate(Activity.java:7127)
        at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1271)
        at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2893)
        at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:3048)
        at android.app.servertransaction.LaunchActivityItem.execute(LaunchActivityItem.java:78)
        at android.app.servertransaction.TransactionExecutor.executeCallbacks(TransactionExecutor.java:108)
        at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:68)
        at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1808)
        at android.os.Handler.dispatchMessage(Handler.java:106)
        at android.os.Looper.loop(Looper.java:193)
        at android.app.ActivityThread.main(ActivityThread.java:6669)
        at java.lang.reflect.Method.invoke(Native Method)
        at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
        at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:858)

Can we output the time duration of each word?

Symlinks in Android examples cpp CMake

Maybe you could change CMakeLists.txt to copy from the "core" directory instead for Windows users?

CMake error during server build

HEAD is now at 96a2f23 Merge pull request #419 from shinh/release-0-4-0
[ 33%] No patch step for 'glog-populate'
[ 44%] Performing update step for 'glog-populate'
[ 55%] No configure step for 'glog-populate'
[ 66%] No build step for 'glog-populate'
[ 77%] No install step for 'glog-populate'
[ 88%] No test step for 'glog-populate'
[100%] Completed 'glog-populate'
[100%] Built target glog-populate
CMake Error at /usr/lib/x86_64-linux-gnu/cmake/gflags/gflags-targets.cmake:37 (message):
Some (but not all) targets in this export set were already defined.

Will you release a pretrained multi-cn Conformer model?

Thanks for sharing such great work.
I'd like to ask whether you will share a pretrained Conformer model for multi-cn.

Need a parallel inference program

It would be great if WeNet had a parallel inference program to speed up decoding.

Stage 4 run error: the error log is as follows. Which files should be modified to correct it? Thanks.

root@e62b3865c7cc:~/data/project/wenet/examples/aishell/s0# ./run.sh
./run.sh: init method is file:///root/data/project/wenet/examples/aishell/s0/exp/sp_spec_aug/ddp_init
wenet/bin/train.py:76: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  configs = yaml.load(fin)
Traceback (most recent call last):
  File "wenet/bin/train.py", line 82, in <module>
    **configs['spec_aug_conf'],
KeyError: 'spec_aug_conf'
wenet/bin/train.py:76: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  configs = yaml.load(fin)
Traceback (most recent call last):
  File "wenet/bin/train.py", line 82, in <module>
    **configs['spec_aug_conf'],
KeyError: 'spec_aug_conf'
do model average and final checkpoint is exp/sp_spec_aug/avg_10.pt
Namespace(dst_model='exp/sp_spec_aug/avg_10.pt', max_epoch=65536, min_epoch=0, num=10, src_path='exp/sp_spec_aug', val_best=True)
Traceback (most recent call last):
  File "wenet/bin/average_model.py", line 47, in <module>
    sort_idx = np.argsort(val_scores[:, -1])
IndexError: too many indices for array
Traceback (most recent call last):
  File "wenet/bin/recognize.py", line 81, in <module>
    with open(args.config, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/train.yaml'
Traceback (most recent call last):
  File "wenet/bin/recognize.py", line 81, in <module>
    with open(args.config, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/train.yaml'
Traceback (most recent call last):
  File "wenet/bin/recognize.py", line 81, in <module>
    with open(args.config, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/train.yaml'
Traceback (most recent call last):
  File "wenet/bin/recognize.py", line 81, in <module>
    with open(args.config, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/train.yaml'
./run.sh: line 165: python2: command not found
./run.sh: line 165: python2: command not found
./run.sh: line 165: python2: command not found
./run.sh: line 165: python2: command not found
Traceback (most recent call last):
  File "wenet/bin/export_jit.py", line 29, in <module>
    with open(args.config, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/train.yaml'

wav file parsing error when using runtime/server/x86/build/decoder_main

When I pass some wav files to decoder_main, I sometimes get the following exception:

I0317 14:59:13.375226  1310 torch_asr_model.cc:36] torch model info subsampling_rate 4 right context 6 sos 4232 eos 4232
I0317 14:59:13.379293  1310 decoder_main.cc:80] num frames 0
I0317 14:59:13.379323  1310 torch_asr_decoder.cc:77] Required 2147483647 get 0
terminate called after throwing an instance of 'c10::Error'
  what():  There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat.  This usually means that this function requires a non-empty list of Tensors.  Available functions are [CPU, QuantizedCPU, Autograd, Profiler, Tracer, Autocast]

I guess the error is caused by frontend/wav.h. In wav.h:

// wav.h
// ...
// line 31
struct WavHeader {
  char riff[4];  // "RIFF"
  unsigned int size;
  char wav[4];  // "WAVE"
  char fmt[4];  // "fmt "
  unsigned int fmt_size;
  uint16_t format;
  uint16_t channels;
  unsigned int sample_rate;
  unsigned int bytes_per_second;
  uint16_t block_size;
  uint16_t bit;
  char data[4];  // "data"
  unsigned int data_size;
};
// ...
// line 56
fread(&header, 1, sizeof(header), fp);
// ...
// line 72
    int num_data = header.data_size / (bits_per_sample_ / 8);
    data_ = new float[num_data];
    num_sample_ = num_data / num_channel_;

The struct assumes the header consists of exactly the RIFF, fmt, and data chunks, but a valid wav file may contain other chunks as well, such as a fact chunk or a LIST chunk. When audio files are processed with ffmpeg (or pydub, which is based on ffmpeg), there's a high chance that a LIST chunk is written into the generated wav file; you can take this link as a reference.

A better approach, I guess, is to read the 4-byte chunk ID to detect which chunk comes next and process (or skip) it accordingly. Once the data chunk is found, we can continue with the next steps.

I'm not familiar with C/C++, and I'm not sure my analysis is correct, but if it is, I hope you can fix it or add a note to the README. It would be a great help, thanks :-)
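The approach described above, walking the chunk list instead of assuming a fixed 44-byte header, can be sketched in Python with the standard struct module. This illustrates the idea and is not a patch for wav.h. It reads each 8-byte chunk header (a 4-byte ID plus a little-endian 32-bit size) and records the fmt chunk. Anything else (LIST, fact, ...) is skipped, including the pad byte after odd-sized chunks, and the walk stops at the data chunk.

```python
import struct


def read_wav_chunks(blob):
    """Return (fmt, pcm_bytes) from a RIFF/WAVE byte string, skipping unknown chunks."""
    if len(blob) < 12 or blob[:4] != b"RIFF" or blob[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos, fmt = 12, None
    while pos + 8 <= len(blob):
        chunk_id = blob[pos:pos + 4]
        (chunk_size,) = struct.unpack("<I", blob[pos + 4:pos + 8])
        body = blob[pos + 8:pos + 8 + chunk_size]
        if chunk_id == b"fmt ":
            (audio_format, channels, sample_rate,
             _byte_rate, _block_align, bits) = struct.unpack("<HHIIHH", body[:16])
            fmt = {"format": audio_format, "channels": channels,
                   "sample_rate": sample_rate, "bits": bits}
        elif chunk_id == b"data":
            if fmt is None:
                raise ValueError("data chunk found before fmt chunk")
            return fmt, body
        # LIST, fact, etc. are skipped; RIFF pads odd-sized chunks to an even length
        pos += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no data chunk found")
```

For quick checks in Python, the standard-library wave module also skips chunks it does not recognize.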

Running the training code produces errors

Thank you for making your code public. Is it ready to run now? There are many core dumps when training with DDP. When training on a single GPU, TorchScript also reports errors, such as Unknown type name 'torch.device'. If I skip torch.jit.script, I still get errors. My PyTorch version is 1.7.0.

Alphabet based training

Is it possible to train a model based on the alphabet (characters) rather than tokens?

When I run bash run.sh --stage 4 --stop-stage 4, I get this error:

Traceback (most recent call last):
  File "wenet/bin/train.py", line 209, in <module>
    executor.train(model, optimizer, scheduler, train_data_loader, device,
  File "/media/dayu/D/nlp/wenet/examples/aishell/s0/wenet/utils/executor.py", line 35, in train
    loss, loss_att, loss_ctc = model(feats, feats_lengths, target,
  File "/home/dayu/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/dayu/D/nlp/wenet/examples/aishell/s0/wenet/transformer/asr_model.py", line 89, in forward
    encoder_out, encoder_mask = self.encoder(speech, speech_lengths)
  File "/home/dayu/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/dayu/D/nlp/wenet/examples/aishell/s0/wenet/transformer/encoder.py", line 133, in forward
    masks = ~make_pad_mask(xs_lens).unsqueeze(1)  # (B, 1, L)
  File "/media/dayu/D/nlp/wenet/examples/aishell/s0/wenet/utils/mask.py", line 140, in make_pad_mask
    max_len = int(lengths.max().item())
RuntimeError: CUDA error: no kernel image is available for execution on the device

My environment: Ubuntu 20.04, CUDA 11.1, torch 1.7.1.
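This error generally means the installed PyTorch binary contains no GPU kernels compiled for the device's compute capability. The following is a minimal diagnostic sketch, assuming PyTorch is importable and provides torch.cuda.get_device_capability and torch.cuda.get_arch_list (the latter only exists in newer releases, so the sketch tolerates its absence). arch_supported is a hypothetical helper; its rule follows CUDA's compatibility model: an sm_XY binary runs on devices with the same major version X and a minor version of at least Y, while compute_XY PTX can be JIT-compiled for any device of capability X.Y or higher.

```python
def arch_supported(capability, arch_list):
    """Return True if any entry in arch_list (e.g. 'sm_75', 'compute_80')
    can run on a GPU with compute capability (major, minor)."""
    major, minor = capability
    for arch in arch_list:
        kind, _, num = arch.partition("_")
        if not num.isdigit() or len(num) < 2:
            continue  # skip entries such as 'sm_90a' that this sketch ignores
        a_major, a_minor = int(num[:-1]), int(num[-1])
        # cubin: same major version, device minor >= compiled minor
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX: can be JIT-compiled for any equal or newer capability
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


if __name__ == "__main__":
    try:
        import torch

        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            archs = torch.cuda.get_arch_list()
            print("device capability:", cap)
            print("compiled archs:", archs)
            print("supported:", arch_supported(cap, archs))
    except (ImportError, AttributeError) as exc:
        print("cannot query PyTorch:", exc)
```

If this reports False, installing a PyTorch build compiled for a CUDA version and architecture list that cover the GPU is the usual direction to look in.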

runtime error

Thank you for this great work.
I trained on AISHELL following aishell/s0 and got final.zip.
I want to try the x86 runtime, but I get an error.
The command is:
./build/decoder_main --chunk_size -1 --wav_path /root/A2_0.wav --model_path ./final.zip --dict_path ./words.txt

The error is:
terminate called after throwing an instance of 'std::runtime_error'
what(): The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/torch/nn/functional.py", line 38, in forward_encoder_chunk
ret = ret2
else:
output = torch.matmul(input, torch.t(weight))
~~~~~~~~~~~~ <--- HERE
if torch.isnot(bias, None):
bias1 = unchecked_cast(Tensor, bias)

Traceback of TorchScript, original code (most recent call last):
File "/data/Softwares/miniconda3/envs/wenet/lib/python3.8/site-packages/torch/nn/functional.py", line 1676, in forward_encoder_chunk
ret = torch.addmm(bias, input, weight.t())
else:
output = input.matmul(weight.t())
~~~~~~~~~~~~ <--- HERE
if bias is not None:
output += bias
RuntimeError: size mismatch, m1: [244 x 4864], m2: [5120 x 256] at ../aten/src/TH/generic/THTensorMath.cpp:41

Did I make a mistake, or do I need to change the configuration?
Thank you
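The shapes in the error can be decoded with a little arithmetic. Here m1 is the encoder input (244 frames x 4864) and m2 is the transposed Linear weight (5120 x 256); note that 4864 = 256 x 19 and 5120 = 256 x 20. The sketch below assumes, without confirmation from this thread, that the encoder front end is an ESPnet-style Conv2dSubsampling4 (two Conv2d layers with kernel 3, stride 2, no padding, and 256 output channels) whose flattened output feeds this Linear layer; conv_out and linear_in_features are hypothetical helpers.

```python
def conv_out(n, kernel_size=3, stride=2):
    """Output size of an unpadded convolution along one axis."""
    return (n - kernel_size) // stride + 1


def linear_in_features(feat_dim, odim=256):
    """Flattened size fed to the Linear layer after two kernel-3,
    stride-2 convolutions over the feature axis with odim channels."""
    return odim * conv_out(conv_out(feat_dim))


if __name__ == "__main__":
    print(linear_in_features(80))  # size produced by 80-dim features
    # feature dimensions that would match the trained 5120-wide weight
    print([d for d in range(40, 129) if linear_in_features(d) == 5120])
```

Under that assumption, 80-dimensional features produce 4864, while the trained weight expects a feature dimension between 83 and 86, so one plausible cause is that the runtime computes features with a different dimension than the training configuration.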

How to run inference on a single wav file?

The README here does not explain how:
https://github.com/mobvoi/wenet/blob/main/examples/aishell/s0/README.md

ESPnet's method has the same problem

The downsampled mask obtained with mask[:, :, :-2:2][:, :, :-2:2] does not match the lengths given by PyTorch's convolution output-length formula. A concrete example:
lens = torch.LongTensor([[24], [40], [60], [100]]).cpu()
print(compute_conv_length(compute_conv_length(lens, kernel_size=3, stride=2), kernel_size=3, stride=2))
Result: tensor([[ 5],
[ 9],
[14],
[24]])
mask = make_mask_by_length(a, lens).unsqueeze(-2)
new_mask = mask[:, :, :-2:2][:, :, :-2:2]
Converting the boolean mask to int, summing, and printing gives:
print(torch.sum(new_mask.int(), -1))
tensor([[ 6],
[10],
[15],
[24]])
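The mismatch can be reproduced without PyTorch by modelling a right-padded boolean mask as a Python list. This minimal sketch assumes each subsampling convolution has kernel_size=3, stride=2, no padding and dilation 1, so its output length is floor((L - 3) / 2) + 1; conv_out_len and valid_frames_after_slicing are hypothetical helpers. Slicing with [:-2:2] keeps indices 0, 2, 4, ..., so in the first stage an utterance shorter than the padded length keeps ceil(L / 2) valid frames, one more than the floor((L - 1) / 2) the convolution produces. Only the longest utterance (100) agrees, because its trailing frames are cut by the :-2. Starting the slice at index 2 instead ([2::2]) reproduces the convolution lengths for these inputs.

```python
def conv_out_len(length, kernel_size=3, stride=2):
    """Output length of an unpadded 1-D convolution: floor((L - k) / s) + 1."""
    return (length - kernel_size) // stride + 1


def valid_frames_after_slicing(lengths, sl):
    """Build a right-padded mask (True = valid frame) for each length,
    apply the slice twice (two subsampling stages), count True entries."""
    max_len = max(lengths)
    counts = []
    for n in lengths:
        row = [i < n for i in range(max_len)]
        row = row[sl][sl]
        counts.append(sum(row))
    return counts


if __name__ == "__main__":
    lens = [24, 40, 60, 100]
    print("conv formula:", [conv_out_len(conv_out_len(n)) for n in lens])
    print("[:-2:2] slice:", valid_frames_after_slicing(lens, slice(None, -2, 2)))
    print("[2::2] slice:", valid_frames_after_slicing(lens, slice(2, None, 2)))
```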

Some errors occurred in training

I followed the aishell tutorial, and errors occurred in the bash run.sh --stage 4 --stop-stage 5 step:

run.sh: init method is file:///home/dapeng/PycharmProjects/wenet/examples/aishell/s0/exp/sp_spec_aug/ddp_init
  File "wenet/bin/train.py", line 81
    collate_func = CollateFunc(**configs['collate_conf'],
                                                        ^
SyntaxError: invalid syntax
do model average and final checkpoint is exp/sp_spec_aug/avg_10.pt
Traceback (most recent call last):
  File "wenet/bin/average_model.py", line 7, in <module>
    import yaml
ImportError: No module named yaml
  File "wenet/bin/recognize.py", line 87
    test_collate_func = CollateFunc(**test_collate_conf, cmvn=args.cmvn)
                                                       ^
SyntaxError: invalid syntax
  File "wenet/bin/recognize.py", line 87
    test_collate_func = CollateFunc(**test_collate_conf, cmvn=args.cmvn)
                                                       ^
SyntaxError: invalid syntax
  File "wenet/bin/recognize.py", line 87
    test_collate_func = CollateFunc(**test_collate_conf, cmvn=args.cmvn)
                                                       ^
SyntaxError: invalid syntax
  File "wenet/bin/recognize.py", line 87
    test_collate_func = CollateFunc(**test_collate_conf, cmvn=args.cmvn)
                                                       ^
SyntaxError: invalid syntax

Traceback (most recent call last):
  File "tools/compute-wer.py", line 365, in <module>
    with codecs.open(hyp_file, 'r', 'utf-8') as fh:
  File "/usr/lib/python2.7/codecs.py", line 898, in open
    file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/test_ctc_prefix_beam_search/text'
Traceback (most recent call last):
  File "tools/compute-wer.py", line 365, in <module>
    with codecs.open(hyp_file, 'r', 'utf-8') as fh:
  File "/usr/lib/python2.7/codecs.py", line 898, in open
    file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/test_attention_rescoring/text'
Traceback (most recent call last):
  File "tools/compute-wer.py", line 365, in <module>
    with codecs.open(hyp_file, 'r', 'utf-8') as fh:
  File "/usr/lib/python2.7/codecs.py", line 898, in open
    file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/test_ctc_greedy_search/text'
Traceback (most recent call last):
  File "tools/compute-wer.py", line 365, in <module>
    with codecs.open(hyp_file, 'r', 'utf-8') as fh:
  File "/usr/lib/python2.7/codecs.py", line 898, in open
    file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/test_attention/text'
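The tracebacks point to /usr/lib/python2.7, so the recipe scripts apparently ran under Python 2. A call that passes a keyword argument after **kwargs, as in CollateFunc(**test_collate_conf, cmvn=args.cmvn), only became valid syntax in Python 3.5 (PEP 448), which matches the SyntaxError; the missing yaml module is a separate problem, and the compute-wer.py errors follow because no hypothesis files were written. The sketch below is a hypothetical helper, not part of WeNet, written so that it still runs under Python 2 and can be executed with the same interpreter run.sh uses.

```python
import sys


def interpreter_problems(version_info=None):
    """Return a list of problems that would make the recipe scripts fail
    under the given (or current) interpreter."""
    if version_info is None:
        version_info = sys.version_info
    problems = []
    major, minor = version_info[0], version_info[1]
    if (major, minor) < (3, 5):
        problems.append("Python %d.%d cannot parse calls like f(**kw, key=val); "
                        "Python 3.5+ is required" % (major, minor))
    try:
        import yaml  # noqa: F401  (PyYAML, imported by wenet/bin/average_model.py)
    except ImportError:
        problems.append("PyYAML is not installed (No module named yaml)")
    return problems


if __name__ == "__main__":
    print(sys.executable)
    for problem in interpreter_problems():
        print("problem:", problem)
```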

windows runtime

Hi, I compiled the runtime/x86 code on Windows with VS2017, but when I run the decoder_main demo, it returns an empty recognition result. Why?