hawkaaron / E2E-ASR
PyTorch Implementations for End-to-End Automatic Speech Recognition
@HawkAaron Hi,
I am working on RNN-T training with this E2E-ASR repo using the PyTorch binding, and I found you have another repo that wraps MXNet and TensorFlow as well. I suppose that wrapper provides a uniform interface for the three different bindings.
However, I cannot find a plug-in point for it in the training code. Do I need to change the PyTorch method manually? What is the right way to implement ASR training with the TensorFlow warp-transducer binding?
Thanks in advance.
Why do we use vocab_size = 62 rather than vocab_size = 10000, as in the RNN-Transducer paper?
Is there some reason related to the speech data processing?
Many thanks~
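For context, one likely explanation (an assumption, not confirmed by the repo author): TIMIT labels are phonemes, not words, so the output vocabulary is the 61-phone TIMIT inventory plus one blank symbol for the transducer, giving 62 outputs.

```python
# Hypothetical sketch of where vocab_size = 62 may come from on TIMIT:
# the standard 61-phone inventory plus one transducer "blank" symbol.
NUM_TIMIT_PHONES = 61  # standard TIMIT phone set
NUM_BLANK = 1          # extra blank output required by RNN-T

vocab_size = NUM_TIMIT_PHONES + NUM_BLANK
print(vocab_size)  # 62
```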
I trained this model on the TIMIT dataset, but the RNN-T model does not converge; the loss stays around 500+. I would like to know what is causing this.
Hi @HawkAaron
Thanks for this implementation. I found it very useful.
Do you have any plans to implement this using Lingvo framework?
Hello Mingkun:
Firstly, thank you for contributing the code. I want to know whether your CTC model and RNN transducer achieve the results in Alex Graves' paper. My own CTC model without any LM achieved a PER of 21 on TIMIT, which is far from Graves' result; I also ran your code with the default params and achieved a PER of 22. I am confused about that. It would be great if you could give me some advice.
Best regards,
Zhengkun Tian
Hi. I am trying to train the rnn_t model using the PyTorch binding. I would really appreciate it if someone could shed some light on the issue I have.
When I run
python3 train_rnnt.py --lr 4e-4 --bi --dropout 0.5 --out exp/rnnt_bi_lr4e-4 --schedule
I get the following error:
Traceback (most recent call last):
  File "train_rnnt.py", line 12, in <module>
    from model import Transducer
  File "/home/suhas/E2E-ASR/model.py", line 6, in <module>
    from warprnnt_pytorch import RNNTLoss
  File "/usr/local/lib/python3.6/dist-packages/warprnnt_pytorch-0.1-py3.6-linux-x86_64.egg/warprnnt_pytorch/__init__.py", line 6, in <module>
    from .warp_rnnt import *
ImportError: /usr/local/lib/python3.6/dist-packages/warprnnt_pytorch-0.1-py3.6-linux-x86_64.egg/warprnnt_pytorch/warp_rnnt.cpython-36m-x86_64-linux-gnu.so: undefined symbol: state
I really do not know what to make of it.
My system details:
Ubuntu 18.04 LTS, CUDA 10.2
PyTorch built from source
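An "undefined symbol" ImportError from a compiled extension is typically an ABI mismatch between the binding and the installed PyTorch. One untested fix (the paths and repo layout below are assumptions based on the HawkAaron/warp-transducer README, not instructions from this thread) is to rebuild the binding against the locally built PyTorch:

```shell
# Untested sketch: rebuild warp-transducer's PyTorch binding so the
# extension links against the same PyTorch that is installed locally.
git clone https://github.com/HawkAaron/warp-transducer
cd warp-transducer
mkdir build && cd build
cmake .. && make
cd ../pytorch_binding
export CUDA_HOME=/usr/local/cuda   # assumption: adjust to your CUDA install
python3 setup.py install
```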
@HawkAaron
Hi, I have a question about the feature transform part of your code.
According to Alex Graves' 2013 paper, the features used are described as:
The audio data was encoded using a Fourier-transform-based filter-bank with 40 coefficients (plus energy) distributed on a mel-scale, together with their first and second temporal derivatives. Each input vector was therefore size 123. The data were normalised so that every element of the input vectors had zero mean and unit variance over the training set.
In your code, DataLoader.py, the feature transform part is:
copy-feats scp:data_timit/{}/feats.scp ark:- | apply-cmvn --utt2spk=ark:data_timit/{}/utt2spk scp:data_timit/{}/cmvn.scp ark:- ark:- |\
add-deltas --delta-order=2 ark:- ark:- | nnet-forward data_timit/final.feature_transform ark:- ark:-
Correct me if I am wrong, but I think the feature transform is already complete before the nnet-forward command.
So why did you use a nnet to make the feature embedding?
When I looked into feature_transform.sh, I got more confused: the nnet-forward part seems to be yet another feature normalization all over again. Can you explain this part a little? Thanks.
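For reference, the 123-dim input described in the quoted passage decomposes cleanly, and `add-deltas --delta-order=2` in the pipeline above is exactly what triples the base dimension:

```python
# Sketch of the feature dimensionality in Graves (2013):
# (40 mel filter-bank coefficients + 1 energy term), each with
# static, delta, and delta-delta versions (delta-order=2).
mel_coeffs = 40
energy = 1
orders = 3  # static + first derivative + second derivative

input_dim = (mel_coeffs + energy) * orders
print(input_dim)  # 123
```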
Hi @HawkAaron, I'm trying to train the transducer with PyTorch (I prefer it over MXNet), and I changed the code of this repo following another MXNet implementation. However, I found the model cannot converge to a good result. Is there something wrong in my code?
Another problem: when I try to replace the code here with a while loop, the model cannot get out of the while loop. Is there any difference between the two implementations?
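One common cause of a non-terminating while loop in greedy RNN-T decoding (an assumption about what is happening here, since the poster's code is not shown): if the joint network never predicts blank, the label index keeps advancing forever. A standard guard is to cap the number of symbols emitted per time step. The `joint` callable below is a stand-in for the real joint network, not this repo's API:

```python
# Hypothetical greedy RNN-T decode with a guard against infinite loops.
# `joint(t, last_label)` stands in for the joint network and returns the
# argmax output symbol; 0 is the blank index (assumption).
BLANK = 0
MAX_SYMBOLS_PER_STEP = 10  # assumption: cap forces the while loop to exit

def greedy_decode(joint, num_frames):
    hyp = []
    for t in range(num_frames):
        emitted = 0
        # Without this cap, a model that never predicts blank loops forever.
        while emitted < MAX_SYMBOLS_PER_STEP:
            k = joint(t, hyp[-1] if hyp else BLANK)
            if k == BLANK:
                break  # blank: advance to the next time step
            hyp.append(k)
            emitted += 1
    return hyp

# Toy joint that never emits blank: decoding still terminates.
print(greedy_decode(lambda t, y: 3, num_frames=2))
```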
@HawkAaron I ran into a problem where the decoded result includes indices [52, 53, 54, 55, 56, 57, ...],
while the rephone length is only 51. Hence I hit the bug below.
One of the decoded y values is as follows:
(53, 59, 53, 59, 56, 53, 43, 53, 43, 53, 5, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 25, 53, 25, 53, 48, 53, 59, 53, 25, 53, 43, 53, 43, 53, 43, 53, 59, 53, 59, 31, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 43, 53, 25, 53, 25, 53, 43, 53, 35, 53, 43, 53, 31, 53, 43, 53, 43, 53, 31, 53, 43, 53, 48, 53, 59, 5, 53, 32, 53, 43, 53, 59, 43, 31, 53, 59, 53, 25, 53, 25, 5, 25, 32, 53, 43, 53, 43, 53, 43, 53, 43, 53, 59, 53, 31, 53, 59, 53, 59)
Traceback (most recent call last):
  File "eval.py", line 93, in <module>
    decode()
  File "eval.py", line 84, in decode
    y = [pmap[rephone[i]] for i in y]
  File "eval.py", line 84, in <listcomp>
    y = [pmap[rephone[i]] for i in y]
KeyError: 53
Why is my pmap (or y) not as long as rephone? And a phoneme such as 'sil' is not included in the pmap dict.
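One hedged workaround (the table entries below are made-up stand-ins, not the repo's actual rephone/pmap contents): TIMIT evaluation usually folds the 61-phone set down to 39, so some decoded indices legitimately have no entry in pmap; skipping unmapped indices avoids the KeyError:

```python
# Hypothetical guard for the eval-time mapping: decoded indices whose
# phone is absent from pmap (dropped by the 61->39 folding) are skipped
# instead of raising KeyError.
rephone = {52: 'sil', 53: 'q'}   # stand-in fragment of the real table
pmap = {'sil': 'sil'}            # assumption: 'q' is dropped by the folding

decoded = [52, 53, 52]
y = [pmap[rephone[i]] for i in decoded if rephone.get(i) in pmap]
print(y)  # ['sil', 'sil']
```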
Hi, I wanted to integrate this transducer model into one of my projects, so I tried to train it using the train_rnnt script given in the repo.
But I get an error while opening the file 'data/lang/phones.txt'.
Can you please share this folder? If not, can you please tell me how the data is laid out in the file?
Thank you.
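Until the author answers, here is the layout this file most likely follows (an assumption based on the Kaldi convention for phones.txt: one "symbol integer-id" pair per line; the phone entries below are a made-up fragment, not the repo's real file):

```python
# Sketch of a Kaldi-style phones.txt symbol table and how to parse it.
sample = """<eps> 0
sil 1
aa 2
ae 3"""

phones = {}
for line in sample.splitlines():
    sym, idx = line.split()
    phones[sym] = int(idx)

print(phones['aa'])  # 2
```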
After running "run.sh" and "feature_transform.sh", 69-dim features are produced.
This causes an error at line 54 in train_rnnt.py.
Thank you.
Are there any results on any standard dataset?