
wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

Home Page: https://wenet-e2e.github.io/wenet/

License: Apache License 2.0

Python 47.47% Shell 3.67% C++ 42.31% CMake 2.85% Perl 1.85% Java 0.40% CSS 0.21% JavaScript 0.16% HTML 0.24% Dockerfile 0.04% C 0.19% Swift 0.26% Objective-C 0.06% Objective-C++ 0.13% Ruby 0.01% Cuda 0.14% Makefile 0.03%
e2e-models pytorch asr transformer conformer production-ready automatic-speech-recognition speech-recognition whisper

wenet's Introduction

WeNet


Roadmap | Docs | Papers | Runtime | Pretrained Models | HuggingFace

We share Net together.

Highlights

  • Production first and production ready: This is WeNet's core design principle. WeNet provides full-stack production solutions for speech recognition.
  • Accurate: WeNet achieves SOTA results on many public speech datasets.
  • Lightweight: WeNet is easy to install, easy to use, well designed, and well documented.

Install

Install python package

pip install git+https://github.com/wenet-e2e/wenet.git

Command-line usage (use -h for parameters):

wenet --language chinese audio.wav

Python programming usage:

import wenet

model = wenet.load_model('chinese')
result = model.transcribe('audio.wav')
print(result['text'])

Please refer to the python usage documentation for more command-line and Python programming usage.

Install for training & deployment

  • Clone the repo
git clone https://github.com/wenet-e2e/wenet.git
conda create -n wenet python=3.10
conda activate wenet
conda install conda-forge::sox
pip install -r requirements.txt
pre-commit install  # for clean and tidy code

# If you encounter sox compatibility issues such as:
#   RuntimeError: set_buffer_size requires sox extension which is not available.
# ubuntu
sudo apt-get install sox libsox-dev
# centos
sudo yum install sox sox-devel
# conda env
conda install conda-forge::sox

Build for deployment

Optionally, if you want to use the x86 runtime or a language model (LM), you have to build the runtime as follows. Otherwise, you can skip this step.

# runtime build requires cmake 3.14 or above
cd runtime/libtorch
mkdir build && cd build && cmake -DGRAPH_TOOLS=ON .. && cmake --build .

Please see the doc for building the runtime on more platforms and operating systems.

Discussion & Communication

You can discuss directly on GitHub Issues.

For Chinese users, you can also scan the QR code on the left to follow the official WeNet account. We created a WeChat group for better discussion and quicker responses. Please scan the personal QR code on the right; its owner is responsible for inviting you to the chat group.

Acknowledge

  1. We borrowed a lot of code from ESPnet for transformer based modeling.
  2. We borrowed a lot of code from Kaldi for WFST based decoding for LM integration.
  3. We referred to EESEN for building TLG based graph for LM integration.
  4. We referred to OpenTransformer for python batch inference of e2e models.

Citations

@inproceedings{yao2021wenet,
  title={WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit},
  author={Yao, Zhuoyuan and Wu, Di and Wang, Xiong and Zhang, Binbin and Yu, Fan and Yang, Chao and Peng, Zhendong and Chen, Xiaoyu and Xie, Lei and Lei, Xin},
  booktitle={Proc. Interspeech},
  year={2021},
  address={Brno, Czech Republic},
  organization={IEEE}
}

@article{zhang2022wenet,
  title={WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit},
  author={Zhang, Binbin and Wu, Di and Peng, Zhendong and Song, Xingchen and Yao, Zhuoyuan and Lv, Hang and Xie, Lei and Yang, Chao and Pan, Fuping and Niu, Jianwei},
  journal={arXiv preprint arXiv:2203.15455},
  year={2022}
}

wenet's People

Contributors

aluminumbox, cdliang11, chwma0, cnrpman, day9011, double22a, emiyassstar, fanlu, hicliff, jschenxiaoyu, lizhichaounicorn, luchuanze, mackong, mddct, mikelei, pengzhendong, placebokkk, qmpzzpmq, robin1001, slyne, teapoly, wenjingxia, whiteshirt0429, xingchensong, ygyuan, yuekaizhang, yygle, zailiwang, zhyyao, zwglory

wenet's Issues

make error during server build

cmake version 3.19.4

gcc version 9.3.0 (GCC)

cmake succeeds, but make fails when building the runtime server:

wenet-main/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so: undefined reference to `lgammaf@GLIBC_2.23'
wenet-main/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so: undefined reference to `lgamma@GLIBC_2.23'

please look at it, thanks!

How to train on a K40c GPU

The K40c is not supported by torch 1.6.0.
I installed torch 1.3.0, but the following error occurred:

site-packages/torch/utils/data/_utils/signal_handling.py", line 737, in _try_get_data raise RuntimeError
RuntimeError: DataLoader worker (pid 133734) is killed by signal: Aborted

Error in `python': corrupted size vs. prev_size: 0x0000560a10045d70

When I trained the Conformer network on my own data, the following error occurred at the beginning of training. How should I solve it? The batch size is already very small (4), and my GPU is a 32 GB NVIDIA V100.

...
2021-03-15 15:34:05,287 INFO Checkpoint: save to checkpoint exp/conformer/init.pt
2021-03-15 15:34:06,656 INFO Epoch 0 TRAIN info lr 4e-08
2021-03-15 15:34:06,657 INFO using accumulate grad, new batch size is 1 timeslarger than before
2021-03-15 15:34:12,573 DEBUG TRAIN Batch 0/19074 loss 525.971741 loss_att 291.681671 loss_ctc 1072.648438 lr 0.00000004 rank 0
2021-03-15 15:34:36,776 DEBUG TRAIN Batch 100/19074 loss 610.763184 loss_att 431.314453 loss_ctc 1029.476929 lr 0.00000404 rank 0
2021-03-15 15:35:00,639 DEBUG TRAIN Batch 200/19074 loss 29.456654 loss_att 28.791754 loss_ctc 31.008080 lr 0.00000804 rank 0
*** Error in `python': corrupted size vs. prev_size: 0x0000560a10045d70 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777f5)[0x7f9ca30ea7f5]
/lib/x86_64-linux-gnu/libc.so.6(+0x80e0b)[0x7f9ca30f3e0b]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f9ca30f758c]
/home/sine/anaconda3/envs/wenet/lib/python3.8/site-packages/torchaudio/_torchaudio.so(_ZN5torch5audio18build_flow_effectsERKSsN2at6TensorEbP16sox_signalinfo_tP18sox_encodinginfo_tPKcSt6vectorINS0_9SoxEffectESaISC_EEi+0xfec)[0x7f9c3f46317c]
/home/sine/anaconda3/envs/wenet/lib/python3.8/site-packages/torchaudio/_torchaudio.so(+0x86dc3)[0x7f9c3f481dc3]
/home/sine/anaconda3/envs/wenet/lib/python3.8/site-packages/torchaudio/_torchaudio.so(+0x7bbfa)[0x7f9c3f476bfa]
python(PyCFunction_Call+0x58)[0x560a027b62d8]
python(_PyObject_MakeTpCall+0x23c)[0x560a027a5edc]
python(_PyEval_EvalFrameDefault+0x45a9)[0x560a02831879]
python(_PyEval_EvalCodeWithName+0x300)[0x560a027fb760]
python(_PyFunction_Vectorcall+0x1e3)[0x560a027fc593]
python(+0x10399c)[0x560a0276599c]
python(_PyFunction_Vectorcall+0x10b)[0x560a027fc4bb]
python(+0x10425f)[0x560a0276625f]
python(_PyEval_EvalCodeWithName+0x8b1)[0x560a027fbd11]
python(_PyFunction_Vectorcall+0x1e3)[0x560a027fc593]
python(+0x10425f)[0x560a0276625f]
python(PyEval_EvalCodeWithName+0x8b1)[0x560a027fbd11]
...
...
...
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Traceback (most recent call last):
File "wenet/bin/train.py", line 211, in
executor.train(model, optimizer, scheduler, train_data_loader, device,
File "/home/sine/wenet/wenet-main/examples/accent_reg/s0/wenet/utils/executor.py", line 63, in train
optimizer.zero_grad()
File "/home/sine/anaconda3/envs/wenet/lib/python3.8/site-packages/torch/optim/optimizer.py", line 171, in zero_grad
p.grad.detach_()
File "/home/sine/anaconda3/envs/wenet/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 30256) is killed by signal: Aborted.

x86 RuntimeError

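Many thanks for open-sourcing WeNet (๑•̀ㅂ•́)و✧!
Here is the full error:
[screenshot: impicture_20210323_142254]
[screenshot: impicture_20210323_142301]

The model used with the x86 runtime was trained on my own 8 kHz audio, and I changed every 16k in the C++ source code to 8k. After compiling, the test run reported the error above. If anyone knows how to fix this, I would appreciate the help.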

Stage 3 shows no such file or directory

I followed the instructions line by line, and when I ran bash run.sh --stage 3 --stop-stage 3, it reported no such file or directory. What can I do? Thanks.

Describe the bug
tools/format_data.sh --nj 32 --feat-type wav --feat raw_wav/dev/wav.scp raw_wav/dev data/dict/lang_char.txt
split: illegal option -- -
usage: split [-a sufflen] [-b byte_count] [-l line_count] [-p pattern]
             [file [prefix]]
ls: raw_wav/dev/log/wav_*.slice: No such file or directory
cat: raw_wav/dev/log/wav_*.shape: No such file or directory
tools/format_data.sh --nj 32 --feat-type wav --feat raw_wav/test/wav.scp raw_wav/test data/dict/lang_char.txt
split: illegal option -- -
usage: split [-a sufflen] [-b byte_count] [-l line_count] [-p pattern]
             [file [prefix]]
ls: raw_wav/test/log/wav_*.slice: No such file or directory
cat: raw_wav/test/log/wav_*.shape: No such file or directory
tools/format_data.sh --nj 32 --feat-type wav --feat raw_wav/train/wav.scp raw_wav/train data/dict/lang_char.txt
split: illegal option -- -
usage: split [-a sufflen] [-b byte_count] [-l line_count] [-p pattern]
             [file [prefix]]
ls: raw_wav/train/log/wav_*.slice: No such file or directory
cat: raw_wav/train/log/wav_*.shape: No such file or directory

Desktop (please complete the following information):

  • OS: macOS
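
The `split: illegal option -- -` message suggests that the script passes GNU-style long options to `split`, which the BSD `split` shipped with macOS does not accept. As a rough illustration of a portable workaround, here is a minimal Python sketch that divides a wav.scp-style list into `nj` contiguous parts. The function name `split_scp` and the output naming are hypothetical; the prefix and suffix only mirror the file names in the log above, and this is not the logic of tools/format_data.sh.

```python
from pathlib import Path


def split_scp(scp_path, out_dir, nj, prefix="wav_", suffix=".slice"):
    """Split the lines of scp_path into nj nearly equal contiguous parts.

    Hypothetical helper: file naming is an assumption modelled on the log,
    not taken from tools/format_data.sh.
    """
    lines = Path(scp_path).read_text(encoding="utf-8").splitlines(keepends=True)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # The first `extra` parts receive one additional line each.
    base, extra = divmod(len(lines), nj)
    parts, start = [], 0
    for i in range(nj):
        size = base + (1 if i < extra else 0)
        part = out / f"{prefix}{i:02d}{suffix}"
        part.write_text("".join(lines[start:start + size]), encoding="utf-8")
        parts.append(part)
        start += size
    return parts
```

Calling `split_scp("raw_wav/dev/wav.scp", "raw_wav/dev/log", nj=32)` would write 32 slice files. Installing GNU coreutils on macOS is another common route.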

Indiscriminate sed replacement triggers `use_static_chunk` issue

A sed command in the librispeech s0 recipe switches all instances of dynamic to static. If a train_unified_conformer.yml recipe is used, this causes use_dynamic_chunk: true to be replaced with use_static_chunk: true at eval time, which fails with:

TypeError: __init__() got an unexpected keyword argument 'use_static_chunk'

This is easily fixed by modifying that run.sh line. But I wonder whether there is something else going on that I missed and that could still affect the accuracy of the model at training or decode time. It seems to work OK, but I don't find this sed command in the aishell run.sh for either recipe.
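
To make the failure mode concrete, here is a small Python sketch that contrasts a blanket substitution with an anchored one. The config text is a made-up two-line sample, and the anchored rule (flip the value, keep the key) is only an assumption about what a safer edit could look like; the recipe's actual intent is not shown in this post.

```python
import re

# Made-up sample config; only the key name comes from the issue text.
config_text = """\
encoder_conf:
    use_dynamic_chunk: true
"""

# Blanket replacement: every "dynamic" becomes "static", including the one
# inside the key name, which yields an option the model does not accept.
blind = config_text.replace("dynamic", "static")

# Anchored alternative (assumed intent): keep the key name, change its value.
anchored = re.sub(
    r"^([ \t]*use_dynamic_chunk:[ \t]*)true[ \t]*$",
    r"\g<1>false",
    config_text,
    flags=re.MULTILINE,
)

print(blind)
print(anchored)
```

The same idea applies to sed: matching the whole `key: value` line rather than a bare word avoids renaming keys by accident.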

make failure

I created the conda env as follows:

conda create -n wenet python=3.8
conda activate wenet
pip install -r requirements.txt
conda install pytorch==1.6.0 cudatoolkit=10.1 torchaudio -c pytorch

cmake is OK, but an error occurs during make:

/data/home/yezj/github/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so: undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::replace(unsigned long, unsigned long, char const*, unsigned long)@GLIBCXX_3.4.21'
/data/home/yezj/github/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so: undefined reference to `std::_Sp_locker::_Sp_locker(void const*)@GLIBCXX_3.4.21'
/data/home/yezj/github/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so: undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare(unsigned long, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const@GLIBCXX_3.4.21'
/data/home/yezj/github/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so: undefined reference to `std::basic_ofstream<char, std::char_traits<char> >::basic_ofstream(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::_Ios_Openmode)@GLIBCXX_3.4.21'
.....

My OS is CentOS 7.5, with gcc 7.3.1 and cmake 3.19.4.

torch can't be loaded in Mac Pro

OSError: dlopen(/opt/anaconda3/envs/wenet/lib/python3.8/site-packages/torch/lib/libtorch_global_deps.dylib, 10): Library not loaded: @rpath/libmkl_intel_lp64.dylib
Referenced from: /opt/anaconda3/envs/wenet/lib/python3.8/site-packages/torch/lib/libtorch_global_deps.dylib
Reason: image not found

An error occurred while running on Android

I downloaded final.zip and words.txt, put them into the Assets folder, and compiled the project into an APK to run on Android. When I open the app and tap the button, it reports an error and crashes.

The error message is as follows:

26715-27453/com.mobvoi.wenet E/libc++abi: terminating with uncaught exception of type c10::Error: Expected at most 5 argument(s) for operator 'forward_encoder_chunk', but received 7 argument(s). Declaration: forward_encoder_chunk(torch.wenet.transformer.asr_model.___torch_mangle_21.ASRModel self, Tensor xs, Tensor? subsampling_cache=None, Tensor[]? elayers_output_cache=None, Tensor[]? conformer_cnn_cache=None) -> ((Tensor, Tensor, Tensor[], Tensor[]))
Exception raised from checkAndNormalizeInputs at ../aten/src/ATen/core/function_schema_inl.h:245 (most recent call first):
(no backtrace available)

26715-27453/com.mobvoi.wenet A/libc: Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 27453 (om.mobvoi.wenet), pid 26715 (om.mobvoi.wenet)

Issue about encoder-decoder attention

The encoder outputs are fed into decoder entirely, so the encoder-decoder attention attends to the whole sequence. Right? Why not use monotonic attention?

tools/format_data.sh bug

Line 89 of tools/format_data.sh:
Should "${trans_type}" == "ch_char_en_bpe" be written as "${trans_type}" == "cn_char_en_bpe"?

Complementary Language Models

Do you support complementary language models, do you plan to? I didn’t notice any examples or related code in the repo.

How long can we decode wav file?

Describe the bug
I get an error related to the input file length when decoding a 600-second file with the offline demo on the server runtime.
A 60-second file decodes without error.
With the streaming demo and the same 600-second wav file, the server hangs.

To Reproduce
Steps to reproduce the behavior:
Go to...

cd wenet/runtime/server/x86

Run this command...

export GLOG_logtostderr=1
export GLOG_v=2
#wav_scp=raw_wav/test.scp
wav_path=
model_dir=

./build/decoder_main \
    --chunk_size -1 \
    --wav_path $wav_path \
    --model_path $model_dir/final.zip \
    --dict_path $model_dir/words.txt 2>&1 | tee log.txt

Get this error (some of the log lines were added by me):

$ bash offline_recog.sh 
I0322 07:28:19.092447  5399 torch_asr_model.cc:36] torch model info subsampling_rate 4 right context 6 sos 11175 eos 11175
I0322 07:28:19.111845  5399 feature_pipeline.h:43] feature pipeline config num_bins 80 frame_length 400frame_shift160
I0322 07:28:19.112640  5399 decoder_main.cc:74] wav raw_wav/wani.wav
I0322 07:28:19.113868  5399 wav.h:73] wav header info: data size 36
I0322 07:28:19.114097  5399 decoder_main.cc:79] read 18 samples, 1 channels, 16 bits, So we got the length of data is 18
I0322 07:28:19.114109  5399 fbank.h:133] Get the 18 samples
I0322 07:28:19.114113  5399 feature_pipeline.cc:39] add 0 frames
I0322 07:28:19.114121  5399 decoder_main.cc:83] num frames 0
I0322 07:28:19.114135  5399 torch_asr_decoder.cc:60] AdvanceDecoding
I0322 07:28:19.114140  5399 torch_asr_decoder.cc:78] Required 2147483647 get 0
terminate called after throwing an instance of 'c10::Error'
  what():  There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat.  This usually means that this function requires a non-empty list of Tensors.  Available functions are [CPU, QuantizedCPU, Autograd, Profiler, Tracer, Autocast]
Exception raised from reportError at ../aten/src/ATen/core/dispatch/Dispatcher.cpp:306 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x68 (0x7f9ffa3eaeb8 in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libc10.so)
frame #1: c10::Dispatcher::reportError(c10::DispatchTable const&, c10::DispatchKey) + 0x18f (0x7f9ffb12780f in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #2: at::_cat(c10::ArrayRef<at::Tensor>, long) + 0x203 (0x7f9ffb8bf373 in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #3: at::native::cat(c10::ArrayRef<at::Tensor>, long) + 0xbd (0x7f9ffb53f4ad in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x135fec6 (0x7f9ffb977ec6 in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0xac4c3c (0x7f9ffb0dcc3c in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #6: at::cat(c10::ArrayRef<at::Tensor>, long) + 0x117 (0x7f9ffb8bf067 in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x2ef7d5d (0x7f9ffd50fd5d in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0xac4c3c (0x7f9ffb0dcc3c in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #9: at::cat(c10::ArrayRef<at::Tensor>, long) + 0x117 (0x7f9ffb8bf067 in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x40f79 (0x5650037f1f79 in ./build/decoder_main)
frame #11: <unknown function> + 0x3f57d (0x5650037f057d in ./build/decoder_main)
frame #12: <unknown function> + 0xe979 (0x5650037bf979 in ./build/decoder_main)
frame #13: __libc_start_main + 0xe6 (0x7f9ff926dbf6 in /lib/x86_64-linux-gnu/libc.so.6)
frame #14: <unknown function> + 0xde59 (0x5650037bee59 in ./build/decoder_main)

Transformer models

Is your feature request related to a problem? Please describe.
Low-resource languages and deep-domain use cases need more efficient models.

Describe the solution you'd like
Hugging Face is integrating ASR models such as wav2vec 2.0 and speech transformer into its transformers library.

Describe alternatives you've considered
The fairseq implementation of wav2vec has more dependencies, is more complex to use, and is less readable.

Additional context
Integrating Hugging Face models would give you pretrained models from the model hub and support for multiple models with less code.

wrong cmvn_opt path in multi_cn run.sh

When I run stage 4 of the multi_cn run.sh, it gives an error: raw_wav/train/global_cmvn': No such file or directory

In run.sh line 225, I think the path should be $cmvn && cp ${feat_dir}}_${en_modeling_unit}/${train_set}/global_cmvn $dir ?

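Android demo has very low recognition accuracy

The utterance
早上好你叫什么名字去机场要怎么走 ("Good morning, what is your name, how do I get to the airport")
is recognized as:
宝上方你这什欢名次却具残要车
Accuracy is roughly 25%. Is this because no LM was added?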

Forward chunk by chunk in decoding mode

I can't understand the forward_chunk_by_chunk function in encoder.py.

In particular, I have some questions about how the right context parameter is calculated and how it is used in forward_chunk_by_chunk:

  1. How is a context parameter such as self.right_context in the Conv2dSubsampling4 class in subsampling.py calculated?
  2. What is frame_rate_of_this_layer in Conv2dSubsampling4?
  3. Why does forward_chunk_by_chunk feed overlapping input step by step?
  4. What are stride and decoding_window, and how are they calculated in forward_chunk_by_chunk?

Can anyone help me with these questions?
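
The arithmetic behind these questions can be sketched in a few lines of Python. This is one reading, not a quote from encoder.py or subsampling.py: it assumes Conv2dSubsampling4 stacks two convolutions with kernel size 3 and stride 2, that each layer contributes (kernel_size - 1) * frame_rate_of_this_layer frames of right context, and that the chunk window and stride follow the formulas below. The function names are hypothetical. The result agrees with the "subsampling_rate 4 right context 6" line printed by the runtime logs quoted in other issues on this page.

```python
def right_context(kernel_sizes, strides):
    """Right context (in input frames) and total subsampling of a conv stack.

    The frame rate of a layer is the product of the strides of the layers
    before it; each layer adds (kernel_size - 1) * frame_rate frames.
    """
    context, frame_rate = 0, 1
    for kernel_size, stride in zip(kernel_sizes, strides):
        context += (kernel_size - 1) * frame_rate
        frame_rate *= stride
    return context, frame_rate


def chunk_window(decoding_chunk_size, subsampling, right_ctx):
    """Hop between chunks (stride) and input frames read per chunk (window)."""
    context = right_ctx + 1
    stride = subsampling * decoding_chunk_size
    decoding_window = (decoding_chunk_size - 1) * subsampling + context
    return stride, decoding_window


ctx, rate = right_context(kernel_sizes=[3, 3], strides=[2, 2])
stride, window = chunk_window(decoding_chunk_size=16, subsampling=rate, right_ctx=ctx)
print(ctx, rate, stride, window)  # 6 4 64 67
```

Under these assumptions the window is longer than the stride (67 versus 64 frames for a chunk size of 16), so consecutive windows overlap by a few input frames. That overlap is what gives the convolutions their right context at each chunk boundary.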

Two differences between wenet and espnet

Thank you for your nice work!
I found two differences between wenet and espnet:

  1. Layernorm/Batchnorm in conv module
  2. Positional Encoding reverse True/False

Are these modifications necessary?

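Cannot run on multiple GPUs

I cloned the latest version of WeNet and tried to run the aishell1 demo on two GPUs, but it fails with the error below.
A single GPU works fine. It seems that a flock function is missing?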

Traceback (most recent call last):
Traceback (most recent call last):
File "wenet/bin/train.py", line 185, in
File "wenet/bin/train.py", line 185, in
model = torch.nn.parallel.DistributedDataParallel(
File "/dev/conda_py38/envs/wenet/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 331, in init
model = torch.nn.parallel.DistributedDataParallel(
File "/dev/conda_py38/envs/wenet/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 331, in init
self._distributed_broadcast_coalesced(
File "/dev/conda_py38/envs/wenet/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 549, in _distributed_broadcast_coalesced
self._distributed_broadcast_coalesced(
File "/dev/conda_py38/envs/wenet/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 549, in _distributed_broadcast_coalesced
dist._broadcast_coalesced(self.process_group, tensors, buffer_size)
RuntimeError: flock: Function not implemented
dist._broadcast_coalesced(self.process_group, tensors, buffer_size)
RuntimeError: flock: Function not implemented
terminate called after throwing an instance of 'std::system_error'
what(): flock: Function not implemented
terminate called after throwing an instance of 'std::system_error'
what(): flock: Function not implemented

How to prepare english data?

Hi, WeNet is amazing!
But I am not sure whether this model supports English, because we have to tokenize the sentence with " ".
Could I use "<SPACE!>" to represent " " in a sentence, like "H e l l o <SPACE!> w o r l d" or something like that?
Looking forward to your reply.
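
The idea in the question can be sketched as a pair of small Python helpers. Whether WeNet's data pipeline accepts such a token is not confirmed in this post; the symbol `<space>` and both function names are hypothetical, and the symbol would also have to appear in the dictionary used for training.

```python
def to_char_tokens(text, space_symbol="<space>"):
    """Split text into characters, replacing each space with space_symbol."""
    return [space_symbol if ch == " " else ch for ch in text]


def from_char_tokens(tokens, space_symbol="<space>"):
    """Invert to_char_tokens."""
    return "".join(" " if tok == space_symbol else tok for tok in tokens)


tokens = to_char_tokens("Hello world")
print(" ".join(tokens))  # H e l l o <space> w o r l d
```

Character-level units like this tend to produce long target sequences for English; subword units such as BPE are a common alternative.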

Compile Android Project on PC: UnsatisfiedLinkError: couldn't find "libwenet.so"

Hi, here is the problem. I tried to compile the latest WeNet Android code on my machine and added final.zip and words.txt to the required directory. However, the app crashes (both in the Android Studio emulator and on my personal phone), always right after I grant the recording permission. I searched online for a solution [1], but sadly nothing worked. Here are the errors I get.

My environment is: Windows 10, SDK 6.0 - 9.0, CMake 3.18.1.
[screenshot]

[screenshot]

E/AndroidRuntime: FATAL EXCEPTION: main
    Process: com.mobvoi.wenet, PID: 11753
    java.lang.UnsatisfiedLinkError: dalvik.system.PathClassLoader[DexPathList[[zip file "/data/app/com.mobvoi.wenet-VaKqTfUA4p7TCtqlZS2Gtg==/base.apk"],nativeLibraryDirectories=[/data/app/com.mobvoi.wenet-VaKqTfUA4p7TCtqlZS2Gtg==/lib/arm, /data/app/com.mobvoi.wenet-VaKqTfUA4p7TCtqlZS2Gtg==/base.apk!/lib/armeabi-v7a, /system/lib]]] couldn't find "libwenet.so"
        at java.lang.Runtime.loadLibrary0(Runtime.java:1012)
        at java.lang.System.loadLibrary(System.java:1669)
        at com.mobvoi.wenet.Recognize.<clinit>(Recognize.java:6)
        at com.mobvoi.wenet.Recognize.init(Native Method)
        at com.mobvoi.wenet.MainActivity.onCreate(MainActivity.java:88)
        at android.app.Activity.performCreate(Activity.java:7136)
        at android.app.Activity.performCreate(Activity.java:7127)
        at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1271)
        at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2893)
        at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:3048)
        at android.app.servertransaction.LaunchActivityItem.execute(LaunchActivityItem.java:78)
        at android.app.servertransaction.TransactionExecutor.executeCallbacks(TransactionExecutor.java:108)
        at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:68)
        at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1808)
        at android.os.Handler.dispatchMessage(Handler.java:106)
        at android.os.Looper.loop(Looper.java:193)
        at android.app.ActivityThread.main(ActivityThread.java:6669)
        at java.lang.reflect.Method.invoke(Native Method)
        at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
        at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:858)

CMake error during server build

HEAD is now at 96a2f23 Merge pull request #419 from shinh/release-0-4-0
[ 33%] No patch step for 'glog-populate'
[ 44%] Performing update step for 'glog-populate'
[ 55%] No configure step for 'glog-populate'
[ 66%] No build step for 'glog-populate'
[ 77%] No install step for 'glog-populate'
[ 88%] No test step for 'glog-populate'
[100%] Completed 'glog-populate'
[100%] Built target glog-populate
CMake Error at /usr/lib/x86_64-linux-gnu/cmake/gflags/gflags-targets.cmake:37 (message):
Some (but not all) targets in this export set were already defined.

Step 4 run error; the error log is as follows. Which files should be modified to correct it? Thanks.

root@e62b3865c7cc:~/data/project/wenet/examples/aishell/s0# ./run.sh
./run.sh: init method is file:///root/data/project/wenet/examples/aishell/s0/exp/sp_spec_aug/ddp_init
wenet/bin/train.py:76: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
configs = yaml.load(fin)
Traceback (most recent call last):
File "wenet/bin/train.py", line 82, in
**configs['spec_aug_conf'],
KeyError: 'spec_aug_conf'
wenet/bin/train.py:76: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
configs = yaml.load(fin)
Traceback (most recent call last):
File "wenet/bin/train.py", line 82, in
**configs['spec_aug_conf'],
KeyError: 'spec_aug_conf'
do model average and final checkpoint is exp/sp_spec_aug/avg_10.pt
Namespace(dst_model='exp/sp_spec_aug/avg_10.pt', max_epoch=65536, min_epoch=0, num=10, src_path='exp/sp_spec_aug', val_best=True)
Traceback (most recent call last):
File "wenet/bin/average_model.py", line 47, in
sort_idx = np.argsort(val_scores[:, -1])
IndexError: too many indices for array
Traceback (most recent call last):
File "wenet/bin/recognize.py", line 81, in
with open(args.config, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/train.yaml'
Traceback (most recent call last):
File "wenet/bin/recognize.py", line 81, in
with open(args.config, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/train.yaml'
Traceback (most recent call last):
File "wenet/bin/recognize.py", line 81, in
with open(args.config, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/train.yaml'
Traceback (most recent call last):
File "wenet/bin/recognize.py", line 81, in
with open(args.config, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/train.yaml'
./run.sh: line 165: python2: command not found
./run.sh: line 165: python2: command not found
./run.sh: line 165: python2: command not found
./run.sh: line 165: python2: command not found
Traceback (most recent call last):
File "wenet/bin/export_jit.py", line 29, in
with open(args.config, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/train.yaml'
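
Every later failure in this log follows from the first KeyError: training never starts, so no checkpoint or train.yaml is written. A small pre-flight check can surface a missing config section before the run begins. This is a minimal sketch; the helper name is hypothetical, and the section names are simply the ones that train.py is seen indexing in tracebacks on this page.

```python
def missing_sections(configs, required):
    """Return the names in `required` that are absent from the config dict."""
    return [name for name in required if name not in configs]


# Hypothetical config that lacks the section indexed in the traceback above.
configs = {"collate_conf": {}, "encoder_conf": {}}
missing = missing_sections(configs, required=("collate_conf", "spec_aug_conf"))
if missing:
    print("config is missing sections:", ", ".join(missing))
```

A mismatch like this usually means the YAML file and the checked-out code come from different versions of the recipe.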

wav file parsing error when using runtime/server/x86/build/decoder_main

When I pass some wav files to decoder_main, I sometimes get exceptions like the following:

I0317 14:59:13.375226  1310 torch_asr_model.cc:36] torch model info subsampling_rate 4 right context 6 sos 4232 eos 4232
I0317 14:59:13.379293  1310 decoder_main.cc:80] num frames 0
I0317 14:59:13.379323  1310 torch_asr_decoder.cc:77] Required 2147483647 get 0
terminate called after throwing an instance of 'c10::Error'
  what():  There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat.  This usually means that this function requires a non-empty list of Tensors.  Available functions are [CPU, QuantizedCPU, Autograd, Profiler, Tracer, Autocast]

I guess this is an error caused by frontend/wav.h. In wav.h:

# wav.h 
...
# line 31
struct WavHeader {
  char riff[4];  // "riff"
  unsigned int size;
  char wav[4];  // "WAVE"
  char fmt[4];  // "fmt "
  unsigned int fmt_size;
  uint16_t format;
  uint16_t channels;
  unsigned int sample_rate;
  unsigned int bytes_per_second;
  uint16_t block_size;
  uint16_t bit;
  char data[4];  // "data"
  unsigned int data_size;
};
...
# line 56
fread(&header, 1, sizeof(header), fp);
...
# line 72
    int num_data = header.data_size / (bits_per_sample_ / 8);
    data_ = new float[num_data];
    num_sample_ = num_data / num_channel_;

The struct assumes a fixed RIFF/format/data layout, but a perfectly valid wav file may contain other chunks in its header, such as a fact chunk or a LIST chunk. When audio files are processed with ffmpeg, or with pydub, which is based on ffmpeg, there is a high chance that a LIST chunk is written into the generated wav file. You can take this link as a reference.

A better way, I guess, is to read 4 bytes to detect which chunk follows and then process it. Once the data chunk is found, we can continue with the next steps.

I'm not familiar with C/C++ coding and I'm not sure whether my analysis is correct. If it is, I hope you can fix it or add a note to the README. It would be a great help, thanks :-)
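
The chunk-walking approach proposed above can be sketched in Python with the standard struct module. This is only an illustration of the idea, not a patch for wav.h: the function names are hypothetical, and a real parser would also decode the fmt chunk instead of skipping it.

```python
import io
import struct


def find_data_chunk(stream):
    """Return (offset, size) of the 'data' chunk payload in a RIFF/WAVE stream."""
    riff, _, wave = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or wave != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            raise ValueError("no data chunk found")
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        if chunk_id == b"data":
            return stream.tell(), chunk_size
        # Skip every other chunk (fmt, LIST, fact, ...). Chunks are padded
        # to an even number of bytes.
        stream.seek(chunk_size + (chunk_size & 1), io.SEEK_CUR)


def make_wav(extra_chunk=b""):
    """Build a tiny 16 kHz mono 16-bit wav in memory for demonstration."""
    fmt = struct.pack("<4sIHHIIHH", b"fmt ", 16, 1, 1, 16000, 32000, 2, 16)
    samples = struct.pack("<4h", 1, 2, 3, 4)
    data = struct.pack("<4sI", b"data", len(samples)) + samples
    body = b"WAVE" + fmt + extra_chunk + data
    return struct.pack("<4sI", b"RIFF", len(body)) + body


list_chunk = struct.pack("<4sI", b"LIST", 4) + b"INFO"
print(find_data_chunk(io.BytesIO(make_wav())))            # (44, 8)
print(find_data_chunk(io.BytesIO(make_wav(list_chunk))))  # (56, 8)
```

With the extra LIST chunk the payload starts at byte 56 instead of 44, which is why a reader that assumes a fixed 44-byte header picks up the wrong size field.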

Run code for training got some errors

Thank you for making your code public. Is it all ready for running now? There are many core dumps when training with DDP. When training on a single GPU, TorchScript raises errors too, such as Unknown type name 'torch.device'. If I skip torch.jit.script, I still get errors. My PyTorch version is 1.7.0.

When I run bash run.sh --stage 4 --stop-stage 4, I get this error:

Traceback (most recent call last):
File "wenet/bin/train.py", line 209, in
executor.train(model, optimizer, scheduler, train_data_loader, device,
File "/media/dayu/D/nlp/wenet/examples/aishell/s0/wenet/utils/executor.py", line 35, in train
loss, loss_att, loss_ctc = model(feats, feats_lengths, target,
File "/home/dayu/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/media/dayu/D/nlp/wenet/examples/aishell/s0/wenet/transformer/asr_model.py", line 89, in forward
encoder_out, encoder_mask = self.encoder(speech, speech_lengths)
File "/home/dayu/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/media/dayu/D/nlp/wenet/examples/aishell/s0/wenet/transformer/encoder.py", line 133, in forward
masks = ~make_pad_mask(xs_lens).unsqueeze(1) # (B, 1, L)
File "/media/dayu/D/nlp/wenet/examples/aishell/s0/wenet/utils/mask.py", line 140, in make_pad_mask
max_len = int(lengths.max().item())
RuntimeError: CUDA error: no kernel image is available for execution on the device

My environment is Ubuntu 20.04, CUDA 11.1, torch 1.7.1.

runtime error

Thank you for this great work.
I trained on aishell following aishell/s0 and got final.zip.
I want to try the x86 runtime, but I get an error.
The command is:
./build/decoder_main --chunk_size -1 --wav_path /root/A2_0.wav --model_path ./final.zip --dict_path ./words.txt

The error is:
terminate called after throwing an instance of 'std::runtime_error'
what(): The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/torch/nn/functional.py", line 38, in forward_encoder_chunk
ret = ret2
else:
output = torch.matmul(input, torch.t(weight))
~~~~~~~~~~~~ <--- HERE
if torch.isnot(bias, None):
bias1 = unchecked_cast(Tensor, bias)

Traceback of TorchScript, original code (most recent call last):
File "/data/Softwares/miniconda3/envs/wenet/lib/python3.8/site-packages/torch/nn/functional.py", line 1676, in forward_encoder_chunk
ret = torch.addmm(bias, input, weight.t())
else:
output = input.matmul(weight.t())
~~~~~~~~~~~~ <--- HERE
if bias is not None:
output += bias
RuntimeError: size mismatch, m1: [244 x 4864], m2: [5120 x 256] at ../aten/src/TH/generic/THTensorMath.cpp:41

Did I make a mistake, or do I need to change the configuration?
Thank you
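The shapes in the error message can be checked with plain arithmetic. The sketch below only restates the numbers from the RuntimeError; the variable names are illustrative, and the closing remark about the cause is an assumption, not a confirmed diagnosis.

```python
# Shapes copied from the error: m1 is the input, m2 is weight.t().
m1 = (244, 4864)
m2 = (5120, 256)

# A matrix product needs the inner dimensions to agree.
compatible = m1[1] == m2[0]

# Both inner dimensions are exact multiples of the layer's output size (256).
out_dim = m2[1]
factors = (m1[1] // out_dim, m2[0] // out_dim)
remainders = (m1[1] % out_dim, m2[0] % out_dim)

print(compatible, factors, remainders)
```

The input carries 19 × 256 features per frame while the layer expects 20 × 256. One possible reading (an assumption) is that the features computed at decoding time do not have the same dimension as the features the model was trained on.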

ESPNet's method has the same problem

The downsampled mask obtained with mask[:, :, :-2:2][:, :, :-2:2] does not match the length given by PyTorch's convolution formula. A concrete example:
lens = torch.LongTensor([[24], [40], [60], [100]]).cpu()
print(compute_conv_length(compute_conv_length(lens, kernel_size=3, stride=2), kernel_size=3, stride=2))
Result: tensor([[ 5],
[ 9],
[14],
[24]])
mask = make_mask_by_length(a, lens).unsqueeze(-2)
new_mask = mask[:, :, :-2:2][:, :, :-2:2]
Convert the boolean values to int, sum them, and print:
print(torch.sum(new_mask.int(), -1))
tensor([[ 6],
[10],
[15],
[24]])
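The mismatch can be reproduced without the helpers from the post, which are not shown. In this sketch, plain Python lists stand in for tensors (the slicing semantics are the same), and conv_out_len and sliced_mask_len are hypothetical names. The formula assumes an unpadded convolution with dilation 1, and the mask is padded to the longest length in the batch.

```python
def conv_out_len(length, kernel_size=3, stride=2):
    """Output length of an unpadded 1-D convolution with dilation 1."""
    return (length - kernel_size) // stride + 1


def sliced_mask_len(length, max_len):
    """Valid positions left after applying [:-2:2] twice to a padding mask."""
    mask = [i < length for i in range(max_len)]
    sub = mask[:-2:2][:-2:2]
    return sum(sub)


lens = [24, 40, 60, 100]
max_len = max(lens)

by_formula = [conv_out_len(conv_out_len(n)) for n in lens]
by_slicing = [sliced_mask_len(n, max_len) for n in lens]

print(by_formula)  # [5, 9, 14, 24]
print(by_slicing)  # [6, 10, 15, 24]
```

Only the longest utterance agrees. For shorter ones the slice keeps one extra position, because the [:-2] cut removes frames from the padded end rather than from the end of each utterance.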

Some errors occurred in training

I followed the aishell tutorial, and errors occurred at the bash run.sh --stage 4 --stop-stage 5 step:

run.sh: init method is file:///home/dapeng/PycharmProjects/wenet/examples/aishell/s0/exp/sp_spec_aug/ddp_init
  File "wenet/bin/train.py", line 81
    collate_func = CollateFunc(**configs['collate_conf'],
                                                        ^
SyntaxError: invalid syntax
do model average and final checkpoint is exp/sp_spec_aug/avg_10.pt
Traceback (most recent call last):
  File "wenet/bin/average_model.py", line 7, in <module>
    import yaml
ImportError: No module named yaml
  File "wenet/bin/recognize.py", line 87
    test_collate_func = CollateFunc(**test_collate_conf, cmvn=args.cmvn)
                                                       ^
SyntaxError: invalid syntax
  File "wenet/bin/recognize.py", line 87
    test_collate_func = CollateFunc(**test_collate_conf, cmvn=args.cmvn)
                                                       ^
SyntaxError: invalid syntax
  File "wenet/bin/recognize.py", line 87
    test_collate_func = CollateFunc(**test_collate_conf, cmvn=args.cmvn)
                                                       ^
SyntaxError: invalid syntax
  File "wenet/bin/recognize.py", line 87
    test_collate_func = CollateFunc(**test_collate_conf, cmvn=args.cmvn)
                                                       ^
SyntaxError: invalid syntax

Traceback (most recent call last):
  File "tools/compute-wer.py", line 365, in <module>
    with codecs.open(hyp_file, 'r', 'utf-8') as fh:
  File "/usr/lib/python2.7/codecs.py", line 898, in open
    file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/test_ctc_prefix_beam_search/text'
Traceback (most recent call last):
  File "tools/compute-wer.py", line 365, in <module>
    with codecs.open(hyp_file, 'r', 'utf-8') as fh:
  File "/usr/lib/python2.7/codecs.py", line 898, in open
    file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/test_attention_rescoring/text'
Traceback (most recent call last):
  File "tools/compute-wer.py", line 365, in <module>
    with codecs.open(hyp_file, 'r', 'utf-8') as fh:
  File "/usr/lib/python2.7/codecs.py", line 898, in open
    file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/test_ctc_greedy_search/text'
Traceback (most recent call last):
  File "tools/compute-wer.py", line 365, in <module>
    with codecs.open(hyp_file, 'r', 'utf-8') as fh:
  File "/usr/lib/python2.7/codecs.py", line 898, in open
    file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/test_attention/text'

Windows runtime

Hi, I compiled the runtime/x86 code on Windows using VS2017, but when I run the decoder_main demo it returns an empty recognition result. Why?
