hawkaaron / E2E-ASR
PyTorch Implementations for End-to-End Automatic Speech Recognition
@HawkAaron Hi,
I am working on RNN-T training with this E2E-ASR repo using the PyTorch binding, and I found you have another repo that wraps MXNet and TensorFlow as well. I suppose that wrapper provides a uniform interface for the three different bindings.
However, I cannot find a plug-in point for it in the training code. Do I need to change the PyTorch method manually? What is the right way to implement ASR training with the TensorFlow warp-transducer binding?
Thanks in advance.
Why do we use vocab_size = 62 rather than vocab_size = 10000, as in the RNN-Transducer paper?
Is there some reason related to the speech data processing?
Many thanks~
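For context, one likely explanation (an assumption, not confirmed by the repo author): TIMIT labels are phonemes, not words, so the output vocabulary is the 61-phone TIMIT inventory plus one blank symbol for the transducer, giving 62 outputs.

```python
# Hypothetical sketch of where vocab_size = 62 may come from on TIMIT:
# the standard 61-phone inventory plus one transducer "blank" symbol.
NUM_TIMIT_PHONES = 61  # standard TIMIT phone set
NUM_BLANK = 1          # extra blank output required by RNN-T

vocab_size = NUM_TIMIT_PHONES + NUM_BLANK
print(vocab_size)  # 62
```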
I trained this model on the TIMIT dataset, but the RNN-T model does not converge; the loss stays around 500+. I would like to know what is causing this.
Hi @HawkAaron
Thanks for this implementation. I found it very useful.
Do you have any plans to implement this using Lingvo framework?
Hello Mingkun:
Firstly, thank you for contributing the code. I want to know whether your CTC model and RNN transducer achieve the results in Alex Graves' paper. My own CTC model without any LM achieved a PER of 21 on TIMIT, which is far from Graves' result; I also ran your code with the default params and achieved a PER of 22. I am confused about that. It would be great if you could give me some advice.
Best regards,
Zhengkun Tian
Hi. I am trying to train the rnn_t model using the PyTorch binding. I would really appreciate it if someone could shed some light on the issue I have.
When I run
python3 train_rnnt.py --lr 4e-4 --bi --dropout 0.5 --out exp/rnnt_bi_lr4e-4 --schedule
I get the following error:
Traceback (most recent call last):
  File "train_rnnt.py", line 12, in <module>
    from model import Transducer
  File "/home/suhas/E2E-ASR/model.py", line 6, in <module>
    from warprnnt_pytorch import RNNTLoss
  File "/usr/local/lib/python3.6/dist-packages/warprnnt_pytorch-0.1-py3.6-linux-x86_64.egg/warprnnt_pytorch/__init__.py", line 6, in <module>
    from .warp_rnnt import *
ImportError: /usr/local/lib/python3.6/dist-packages/warprnnt_pytorch-0.1-py3.6-linux-x86_64.egg/warprnnt_pytorch/warp_rnnt.cpython-36m-x86_64-linux-gnu.so: undefined symbol: state
I really do not know what to make of it.
My system details:
Ubuntu 18.04 LTS, CUDA 10.2
PyTorch built from source
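An "undefined symbol" ImportError from a compiled extension is typically an ABI mismatch between the binding and the installed PyTorch. One untested fix (the paths and repo layout below are assumptions based on the HawkAaron/warp-transducer README, not instructions from this thread) is to rebuild the binding against the locally built PyTorch:

```shell
# Untested sketch: rebuild warp-transducer's PyTorch binding so the
# extension links against the same PyTorch that is installed locally.
git clone https://github.com/HawkAaron/warp-transducer
cd warp-transducer
mkdir build && cd build
cmake .. && make
cd ../pytorch_binding
export CUDA_HOME=/usr/local/cuda   # assumption: adjust to your CUDA install
python3 setup.py install
```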
@HawkAaron
Hi, I have a question about the feature transform part of your code.
According to Alex Graves' 2013 paper, the features used are described as:
The audio data was encoded using a Fourier-transform-based filter-bank with 40 coefficients (plus energy) distributed on a mel-scale, together with their first and second temporal derivatives. Each input vector was therefore size 123. The data were normalised so that every element of the input vectors had zero mean and unit variance over the training set.
In your code, DataLoader.py, the feature transform part is:
copy-feats scp:data_timit/{}/feats.scp ark:- | apply-cmvn --utt2spk=ark:data_timit/{}/utt2spk scp:data_timit/{}/cmvn.scp ark:- ark:- |\
add-deltas --delta-order=2 ark:- ark:- | nnet-forward data_timit/final.feature_transform ark:- ark:-
Correct me if I am wrong, but I think the feature transform is already complete before the nnet-forward command.
So why did you use a nnet to make the feature embedding?
When I looked into feature_transform.sh, I got more confused: the nnet-forward part seems to be yet another feature normalization all over again. Can you explain this part a little? Thanks.
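For reference, the 123-dim input described in the quoted passage decomposes cleanly, and `add-deltas --delta-order=2` in the pipeline above is exactly what triples the base dimension:

```python
# Sketch of the feature dimensionality in Graves (2013):
# (40 mel filter-bank coefficients + 1 energy term), each with
# static, delta, and delta-delta versions (delta-order=2).
mel_coeffs = 40
energy = 1
orders = 3  # static + first derivative + second derivative

input_dim = (mel_coeffs + energy) * orders
print(input_dim)  # 123
```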
Hi @HawkAaron, I'm trying to train the transducer with PyTorch (I prefer it over MXNet), and I changed the code of this repo following another MXNet implementation. However, I found the model cannot converge to a good result. Is there something wrong in my code?
Another problem: when I try to replace the code here with a while loop, the model cannot get out of the while loop. Is there any difference between the two implementations?
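One common cause of a non-terminating while loop in greedy RNN-T decoding (an assumption about what is happening here, since the poster's code is not shown): if the joint network never predicts blank, the label index keeps advancing forever. A standard guard is to cap the number of symbols emitted per time step. The `joint` callable below is a stand-in for the real joint network, not this repo's API:

```python
# Hypothetical greedy RNN-T decode with a guard against infinite loops.
# `joint(t, last_label)` stands in for the joint network and returns the
# argmax output symbol; 0 is the blank index (assumption).
BLANK = 0
MAX_SYMBOLS_PER_STEP = 10  # assumption: cap forces the while loop to exit

def greedy_decode(joint, num_frames):
    hyp = []
    for t in range(num_frames):
        emitted = 0
        # Without this cap, a model that never predicts blank loops forever.
        while emitted < MAX_SYMBOLS_PER_STEP:
            k = joint(t, hyp[-1] if hyp else BLANK)
            if k == BLANK:
                break  # blank: advance to the next time step
            hyp.append(k)
            emitted += 1
    return hyp

# Toy joint that never emits blank: decoding still terminates.
print(greedy_decode(lambda t, y: 3, num_frames=2))
```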
@HawkAaron I ran into a problem where the decoded result includes indices [52, 53, 54, 55, 56, 57, ...],
while the rephone length is only 51. Hence I hit the bug below.
One of the decoded y values is as follows:
(53, 59, 53, 59, 56, 53, 43, 53, 43, 53, 5, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 25, 53, 25, 53, 48, 53, 59, 53, 25, 53, 43, 53, 43, 53, 43, 53, 59, 53, 59, 31, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 25, 53, 25, 53, 43, 53, 35, 53, 43, 53, 31, 53, 43, 53, 43, 53, 31, 53, 43, 53, 48, 53, 59, 5, 53, 32, 53, 43, 53, 59, 43, 31, 53, 59, 53, 25, 53, 25, 5, 25, 32, 53, 43, 53, 43, 53, 43, 53, 43, 53, 59, 53, 31, 53, 59, 53, 59)
Traceback (most recent call last):
  File "eval.py", line 93, in <module>
    decode()
  File "eval.py", line 84, in decode
    y = [pmap[rephone[i]] for i in y]
  File "eval.py", line 84, in <listcomp>
    y = [pmap[rephone[i]] for i in y]
KeyError: 53
Why is my pmap (or y) not as long as rephone? And a phoneme such as 'sil' is not included in the pmap dict.
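One hedged workaround (the table entries below are made-up stand-ins, not the repo's actual rephone/pmap contents): TIMIT evaluation usually folds the 61-phone set down to 39, so some decoded indices legitimately have no entry in pmap; skipping unmapped indices avoids the KeyError:

```python
# Hypothetical guard for the eval-time mapping: decoded indices whose
# phone is absent from pmap (dropped by the 61->39 folding) are skipped
# instead of raising KeyError.
rephone = {52: 'sil', 53: 'q'}   # stand-in fragment of the real table
pmap = {'sil': 'sil'}            # assumption: 'q' is dropped by the folding

decoded = [52, 53, 52]
y = [pmap[rephone[i]] for i in decoded if rephone.get(i) in pmap]
print(y)  # ['sil', 'sil']
```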
Hi, I wanted to integrate this transducer model into one of my projects, so I tried to train it using the train_rnnt script given in the repo.
But I get an error while opening the file 'data/lang/phones.txt'.
Can you please share this folder? If not, can you please tell me how the data is laid out in the file?
Thank you.
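Until the author answers, here is the layout this file most likely follows (an assumption based on the Kaldi convention for phones.txt: one "symbol integer-id" pair per line; the phone entries below are a made-up fragment, not the repo's real file):

```python
# Sketch of a Kaldi-style phones.txt symbol table and how to parse it.
sample = """<eps> 0
sil 1
aa 2
ae 3"""

phones = {}
for line in sample.splitlines():
    sym, idx = line.split()
    phones[sym] = int(idx)

print(phones['aa'])  # 2
```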
After running "run.sh" and "feature_transform.sh", 69-dim features are produced.
This causes an error at line 54 in train_rnnt.py.
Thank you.
Are there any results on any standard dataset?