The speech from awni

WARNING: Forward backward likelihood mismatch 0.000050

During training, I frequently get this warning.

ModuleNotFoundError: No module named 'functions.ctc';

Which functions module shall I install?
ModuleNotFoundError: No module named 'functions.ctc';

Word Level LM at Line 96

I have KenLM scoring integrated at line 96. On my test set (both the LM and the test set are LibriSpeech-based), performance is worse than with no LM at all. I score only at a space, multiply the log probability (converted from log10) by alpha, and add a bonus term, beta * log(word count in prefix). I apply this only to the "not blank" probability. So far it has not helped. Has anyone succeeded in integrating LM scoring?

I used my test set and language model with the PaddlePaddle decoder and the same acoustic model, and got a 6% improvement in WER. They use a trie-based LM aided by WFST correction along with this beam search algorithm. I would appreciate any pointers or help here. Thanks!
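For reference, the scoring described above can be written out as a small function. This is only a sketch under stated assumptions: the function name and arguments are hypothetical, the acoustic score is assumed to be a natural-log probability, and the LM score is assumed to be a log10 probability as the post says. One thing worth checking in a setup like this is that every term is in the same log base before alpha and beta are tuned.

```python
import math

LOG10_TO_LN = math.log(10.0)


def lm_adjusted_score(acoustic_logp, lm_log10, word_count, alpha, beta):
    """Combine an acoustic log probability with a weighted LM score.

    acoustic_logp: natural-log probability of the prefix ending in a non-blank.
    lm_log10: log10 probability of the newest word given its history.
    word_count: number of words in the prefix (must be >= 1).
    All names here are illustrative, not taken from the repository.
    """
    lm_ln = lm_log10 * LOG10_TO_LN        # convert log10 to natural log
    bonus = beta * math.log(word_count)   # word-count bonus term
    return acoustic_logp + alpha * lm_ln + bonus


# Applied only when the extending character is a space (a word boundary).
score = lm_adjusted_score(-1.0, -2.0, word_count=3, alpha=0.5, beta=1.0)
```

With word_count equal to 1 the bonus term is zero, so the first word of a prefix is scored by the acoustic and LM terms alone.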

Error in the collate method of seq2seq.py

/home/wangph/code/pytorch_egs/speech/speech/models/seq2seq.py:235: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
inputs.volatile = True
/home/wangph/code/pytorch_egs/speech/speech/models/seq2seq.py:236: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
labels.volatile = True

Traceback (most recent call last):
  File "train.py", line 146, in <module>
    run(config)
  File "train.py", line 109, in run
    dev_loss, dev_cer = eval_dev(model, dev_ldr, preproc)
  File "train.py", line 58, in eval_dev
    loss = model.loss(batch)
  File "/home/wangph/code/pytorch_egs/speech/speech/models/seq2seq.py", line 53, in loss
    x, y = self.collate(*batch)
TypeError: collate() missing 2 required positional arguments: 'inputs' and 'labels'

Results on LibriSpeech very bad

Hi,

I used this tool to train a sequence-to-sequence speech system on LibriSpeech data, but the results are very bad.
Did you get similar results?
Do you know how I can fix this issue?

Thank you
Sahar

Availability of pretrained model for RNN Transducer & Seq2Seq Attention Model

Hey,
I wanted to ask whether there are any plans to open-source pretrained models for the RNN Transducer and Seq2Seq models.
If such pretrained models exist, can anyone please share the link?

RNN Transducer inference problem

Hi,
In your transducer_model inference function:
https://github.com/awni/speech/blob/master/speech/models/transducer_model.py#L93
you use both the acoustic features and the ground-truth labels, which is not inference at all.
It should at least do greedy search or beam search.

When training the model with the transducer loss, the acoustic model PER is too high. Can you provide a trained
baseline for the RNN Transducer?
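For reference, here is a minimal sketch of greedy search for a transducer that uses no ground-truth labels. It assumes a hypothetical callable joint_argmax(frame, label_history) that wraps the prediction network, the joint network and an argmax. That callable, blank=0 and the per-frame symbol cap are illustrative choices, not the repository's API.

```python
def greedy_transducer_decode(enc_frames, joint_argmax, blank=0,
                             max_symbols_per_frame=10):
    """Greedy search for a transducer model.

    enc_frames: sequence of per-frame encoder outputs (any objects).
    joint_argmax: hypothetical callable (frame, label_history) -> label id.
    """
    hyp = []
    for frame in enc_frames:
        # Keep emitting labels at this frame until the model predicts blank.
        for _ in range(max_symbols_per_frame):
            label = joint_argmax(frame, tuple(hyp))
            if label == blank:
                break  # blank means: advance to the next frame
            hyp.append(label)
    return hyp
```

The label history fed to the model is always the hypothesis decoded so far, so nothing from the reference transcript is needed. The cap on symbols per frame only guards against a model that never emits blank.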

pytest failure

Environment

Titan Xp
CUDA 9.0
cuDNN 7.1.3

Ubuntu 16.04
Python 2.7
Pytorch 0.4.0

Code to reproduce the issue

git clone https://github.com/awni/speech.git
cd speech
conda create -n asr -y python=2.7
source activate asr
pip install -r requirements.txt
pip install http://download.pytorch.org/whl/cu90/torch-0.4.0-cp27-cp27mu-linux_x86_64.whl 
pip install torchvision 
make
source setup.sh
cd test
pytest

When I run training on my own data (or run pytest), it fails with the following error:

ERROR: TypeError: activations must be <type 'torch.FloatTensor'>

Does anyone have an idea what is happening?
This issue persists with or without a GPU.

============================= test session starts ==============================
platform linux2 -- Python 2.7.15, pytest-3.2.3, py-1.4.34, pluggy-0.4.0
rootdir: /data2/colosseum/test-speech2/speech/tests, inifile:
collected 9 items

ctc_test.py F.
io_test.py .
loader_test.py ..
model_test.py .
seq2seq_test.py .
wave_test.py ..

=================================== FAILURES ===================================

________________________________ test_ctc_model ________________________________

    def test_ctc_model():
        freq_dim = 40
        vocab_size = 10

        batch = shared.gen_fake_data(freq_dim, vocab_size)
        batch_size = len(batch[0])

        model = CTC(freq_dim, vocab_size, shared.model_config)
        out = model(batch)

        assert out.size()[0] == batch_size

        # CTC model adds the blank token to the vocab
        assert out.size()[2] == (vocab_size + 1)

        assert len(out.size()) == 3

>       loss = model.loss(batch)

ctc_test.py:26:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../speech/models/ctc_model.py:39: in loss
    loss = loss_fn(out, y, x_lens, y_lens)
../libs/warp-ctc/pytorch_binding/functions/ctc.py:77: in forward
    costs = parent.forward(*args)
../libs/warp-ctc/pytorch_binding/functions/ctc.py:41: in forward
    certify_inputs(activations, labels, lengths, label_lengths)
../libs/warp-ctc/pytorch_binding/functions/ctc.py:107: in certify_inputs
    check_type(activations, torch.FloatTensor, "activations")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

var = tensor([[[-0.0090,  0.4523,  0.0716,  ...,  0.0900, -0.0668,  0.1392],
       ...1443],
         [ 0.1413, -0.0695,  0.0591,  ..., -0.3491, -0.0151, -0.0068]]])
t = <type 'torch.FloatTensor'>, name = 'activations'

    def check_type(var, t, name):
        if type(var) is not t:
>           raise TypeError("{} must be {}".format(name, t))
E           TypeError: activations must be <type 'torch.FloatTensor'>

../libs/warp-ctc/pytorch_binding/functions/ctc.py:92: TypeError
====================== 1 failed, 8 passed in 3.60 seconds ======================

Errors with Installation

Hi,

I have successfully installed the following:
virtualenv e2e_awni
source e2e_awni/bin/activate
cd speech
pip install -r requirements.txt

As the next step, should I install PyTorch while the virtualenv is activated or not?

The following errors occur if I install PyTorch while the virtualenv is activated:

(e2e_awni)kevin@DEVBOX2:~$ pip install http://download.pytorch.org/whl/cu80/torch-0.4.1-cp27-cp27mu-linux_x86_64.whl
torch-0.4.1-cp27-cp27mu-linux_x86_64.whl is not a supported wheel on this platform.
Storing debug log for failure in /home/zhme/.pip/pip.log

(e2e_awni)kevin@DEVBOX2:~$ pip install http://download.pytorch.org/whl/cu80/torch-0.4.1-cp27-cp27m-linux_x86_64.whl
torch-0.4.1-cp27-cp27m-linux_x86_64.whl is not a supported wheel on this platform.
Storing debug log for failure in /home/zhme/.pip/pip.log

I can successfully install pytorch when virtualenv is deactivated. But the following errors occur when I run pytest under speech/tests after "make".

(e2e_awni)kevin@DEVBOX2:~/speech/tests$ pytest

================================================================================== ERRORS ===================================================================================
_______________________________________________________________________ ERROR collecting ctc_test.py ________________________________________________________________________
ImportError while importing test module '/home/kevin/speech/tests/ctc_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
ctc_test.py:2: in <module>
import torch
E ImportError: No module named torch
________________________________________________________________________ ERROR collecting io_test.py ________________________________________________________________________
ImportError while importing test module '/home/kevin/speech/tests/io_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
io_test.py:3: in <module>
import speech.models
E ImportError: No module named speech.models
______________________________________________________________________ ERROR collecting loader_test.py ______________________________________________________________________
ImportError while importing test module '/home/kevin/speech/tests/loader_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
loader_test.py:3: in <module>
from speech import loader
E ImportError: No module named speech
______________________________________________________________________ ERROR collecting model_test.py _______________________________________________________________________
ImportError while importing test module '/home/kevin/speech/tests/model_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
model_test.py:3: in <module>
import torch
E ImportError: No module named torch
_____________________________________________________________________ ERROR collecting seq2seq_test.py ______________________________________________________________________
ImportError while importing test module '/home/kevin/speech/tests/seq2seq_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
seq2seq_test.py:3: in <module>
import torch
E ImportError: No module named torch
_______________________________________________________________________ ERROR collecting wave_test.py _______________________________________________________________________
ImportError while importing test module '/home/kevin/speech/tests/wave_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
wave_test.py:4: in <module>
import speech.utils.wave as wave
E ImportError: No module named speech.utils.wave
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 6 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
========================================================================== 6 error in 0.23 seconds ==========================================================================

Could you help me out?

Thank you.
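A quick way to see which interpreter is actually running, and whether it can find a given module, is a check like the one below. This is a sketch, not part of the repository. The function name is made up, and it assumes Python 3.4 or newer for importlib.util.find_spec, whereas the log above comes from Python 2.7. Running it once inside the activated virtualenv and once outside shows whether torch was installed into a different environment from the one pytest uses.

```python
import importlib.util
import sys
import sysconfig


def describe_environment(modules=("torch", "speech")):
    """Report the active interpreter and whether each top-level module is found."""
    info = {
        "executable": sys.executable,   # which python binary is running
        "prefix": sys.prefix,           # points into the virtualenv when active
        "python": "%d.%d" % sys.version_info[:2],
        # ABI string of this interpreter (may be None on some platforms);
        # a wheel's ABI tag has to be compatible with it.
        "soabi": sysconfig.get_config_var("SOABI"),
    }
    for name in modules:
        info[name] = importlib.util.find_spec(name) is not None
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(key, "=", value)
```

If "speech" shows up as not found, the repository root is probably not on the module search path in that shell.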

change package structure so module is independent of repo name

See Issue model error #3 for reference.

Sample generation

How can I generate the predicted labels for an audio file after training?

make errors

Hi sir,
I am trying to use your "speech" code, but when I run it, some errors occur:

Can you give me some help to fix it? Thanks a lot.

Loss decreases but SER increases

Hello, I trained an RNNT on a Chinese speech recognition corpus of more than 300 hours (the encoder was pretrained, but the decoder was randomly initialized). Over dozens of epochs, the loss first dropped quickly from more than 1000 to 60, then slowly to just above 20, but the SER at inference rose from 2 to 20. Is this normal? It seems that you mentioned this phenomenon elsewhere.
Thank you very much!

help

Can you tell me which version of PyTorch you use? @awni

A question about the TRAINING SET used in timit script

Normally the standard 462-speaker set is used for training, whereas this TIMIT example uses 556 speakers (including some data from the full test set) in train.json.
Although the WER results in this repo look promising, are the methods used here really convincing or comparable?

Could you share your WER on librispeech?

Thank you for sharing your code! Could you share your WER on librispeech?

transducer results

@awni Do you have results for the transducer model?

help

I used pdb to debug it and found that executing "loss.backward()" in the function run_epoch fails with "Segmentation fault (core dumped)". I would appreciate it if you could help me. @awni

RNN Transducer training problem

Hi,
It seems that your implementation of the RNN Transducer loss function is right. But when I train
the Graves2012 TIMIT setup, the loss decreases but the PER increases, no matter how I adjust the learning rate.
(With a small learning rate, the PER first decreases, then increases from then on.)

In your training procedure, the RNNT loss is indeed decreasing, but if you output the PER, it is increasing!
So what's wrong?

No module named functions.ctc

I get the following error when I try to run train.py:

(pytorch) sroca@nx2:~/speech>> python train.py examples/librispeech/config.json

Traceback (most recent call last):
  File "train.py", line 16, in <module>
    import speech.models as models
  File "/imatge/sroca/speech/models/__init__.py", line 4, in <module>
    from speech.models.ctc_model import CTC
  File "/imatge/sroca/speech/models/ctc_model.py", line 9, in <module>
    import functions.ctc as ctc
ImportError: No module named functions.ctc

How can I solve this issue?
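In a pytest traceback elsewhere in this tracker, the module resolves to libs/warp-ctc/pytorch_binding/functions/ctc.py, and the reproduction steps there run "source setup.sh" after "make". So one likely cause is that the binding directory is not on the module search path in this shell. The sketch below checks for the file and adds the directory to sys.path. The function name is hypothetical, and the assumption that setup.sh normally does the equivalent through PYTHONPATH has not been verified against the repository.

```python
import os
import sys


def add_warpctc_binding(repo_root):
    """Put <repo_root>/libs/warp-ctc/pytorch_binding on sys.path.

    Returns False if functions/ctc.py is not there, which would mean the
    binding has not been cloned or built yet.
    """
    binding = os.path.join(repo_root, "libs", "warp-ctc", "pytorch_binding")
    if not os.path.isfile(os.path.join(binding, "functions", "ctc.py")):
        return False
    if binding not in sys.path:
        sys.path.insert(0, binding)
    return True
```

Calling this with the repository root before "import functions.ctc" either makes the import resolvable or tells you that the build step has not produced the file.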

Information about performance / example usage

It would be nice to have a small usage example in the README, and maybe some notes about the performance on some dataset.

KeyError: 'start_and_end'

When I try to run the "train.py", I get the following error:

(venv-speech) sroca@nx2:~/speech>> python train.py examples/librispeech/config.json

Traceback (most recent call last):
  File "train.py", line 145, in <module>
    run(config)
  File "train.py", line 80, in run
    start_and_end=data_cfg["start_and_end"])
KeyError: 'start_and_end'
srun: error: c8: task 0: Exited with exit code 1

It seems that the key 'start_and_end' is not defined anywhere in the config, so it can't be found.

How can I fix it?
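The traceback shows train.py reading data_cfg["start_and_end"], so the config file passed on the command line needs that key, or the lookup needs a default. Below is a sketch of the second option. The "data" section name and the default value False are assumptions made for illustration, and the right value may depend on the model being trained.

```python
import json


def read_start_and_end(config_text, default=False):
    """Return data.start_and_end from a JSON config, or `default` if absent."""
    config = json.loads(config_text)
    data_cfg = config["data"]          # section name assumed, not verified
    return data_cfg.get("start_and_end", default)


old_style = '{"data": {"batch_size": 8}}'
new_style = '{"data": {"batch_size": 8, "start_and_end": true}}'
flags = (read_start_and_end(old_style), read_start_and_end(new_style))
```

The alternative is simply to add "start_and_end" to the data section of examples/librispeech/config.json.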

dynamic module does not define module export function (PyInit__ctc)

torch type mismatch error

With python3.6, pytorch0.4.1, cuda9.0,
I got the following error when I run train.py with timit example:

$ python train.py examples/timit/seq2seq_config.json
Traceback (most recent call last):
  File "train.py", line 146, in <module>
    run(config)
  File "train.py", line 104, in run
    run_state = run_epoch(model, optimizer, train_ldr, *run_state)
  File "train.py", line 29, in run_epoch
    loss = model.loss(batch)
  File "/path/to/speech/models/seq2seq.py", line 57, in loss
    out, alis = self.forward_impl(x, y)
  File "/path/to/speech/models/seq2seq.py", line 68, in forward_impl
    out, alis = self.decode(x, y)
  File "/path/to/speech/models/seq2seq.py", line 103, in decode
    hx = self.dec_rnn(ix.squeeze(dim=1), hx)
  File "/path/to/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/path/to/lib64/python3.6/site-packages/torch/nn/modules/rnn.py", line 794, in forward
    self.bias_ih, self.bias_hh,
  File "/path/to/lib64/python3.6/site-packages/torch/nn/_functions/rnn.py", line 53, in GRUCell
    gh = F.linear(hidden, w_hh)
  File "/path/to/lib64/python3.6/site-packages/torch/nn/functional.py", line 1026, in linear
    output = input.matmul(weight.t())
RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'mat2'

If I add torch.set_default_tensor_type('torch.cuda.FloatTensor') in the main function,
the error becomes:

Traceback (most recent call last):
  File "train.py", line 148, in <module>
    run(config)
  File "train.py", line 110, in run
    dev_loss, dev_cer = eval_dev(model, dev_ldr, preproc)
  File "train.py", line 57, in eval_dev
    preds = model.infer(batch)
  File "/path/to/speech/models/seq2seq.py", line 176, in infer
    _, argmaxs = self.infer_decode(x, y, end_tok, max_len)
  File "/path/to/speech/models/seq2seq.py", line 155, in infer_decode
    if torch.sum(y.data == end_tok) == y.numel():
RuntimeError: Expected object of type torch.cuda.LongTensor but found type torch.LongTensor for argument #2 'other'

Do you have any idea how to solve this?

Make Requires Cuda

When we follow the installation instructions, the "make" command fails with the error "CUDA_TOOLKIT_ROOT_DIR not found". How do we build this repo on a machine without a GPU? Thanks!

make error from torch.utils.ffi import create_extension

Traceback (most recent call last):
  File "build.py", line 4, in <module>
    from torch.utils.ffi import create_extension
  File "/home/imr555/miniconda3/envs/ariyan/lib/python3.6/site-packages/torch/utils/ffi/__init__.py", line 1, in <module>
    raise ImportError("torch.utils.ffi is deprecated. Please use cpp extensions instead.")
ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.
Makefile:5: recipe for target 'warp' failed
make: *** [warp] Error 1

How to get the text for a given audio file?

How to get the text for a given audio file? Thanks.

transducer

@awni In the infer code, are you still using ground-truth labels in the testing phase? This confuses me, since we do not have ground truth when applying the model to unseen data. Or do you just forward a fake input (such as a batch_size x 1 all-zero label) in the testing phase?

Also, will you maintain this excellent project in the future?

Thank you very much.

Has anyone successfully trained a Chinese (Mandarin) model?

High quality samples?

Hello, I was wondering if you had any high quality samples to link in the repo? I'm looking to achieve something similar to this: https://google.github.io/tacotron/publications/tacotron2/

TypeError: 'float' object cannot be interpreted as an index

I'm trying to run the Seq2Seq model on the LibriSpeech corpus. I copied the config file for the TIMIT data and pointed it at Librispeech. Upon training...

(py27) [10:54 user@host:speech$] python train.py examples/librispeech/seq2seq_best.config
Traceback (most recent call last):
  File "train.py", line 145, in <module>
    run(config)
  File "train.py", line 80, in run
    start_and_end=data_cfg["start_and_end"])
  File "/people/user/speech/speech/loader.py", line 35, in __init__
    self.mean, self.std = compute_mean_std(audio_files[:max_samples])
  File "/people/user/speech/speech/loader.py", line 81, in compute_mean_std
    for af in audio_files]
  File "/people/user/speech/speech/loader.py", line 154, in log_specgram_from_file
    return log_specgram(audio, sr)
  File "/people/user/speech/speech/loader.py", line 165, in log_specgram
    detrend=False)
  File "/people/user/.conda/envs/py27/lib/python2.7/site-packages/scipy/signal/spectral.py", line 691, in spectrogram
    input_length=x.shape[axis])
  File "/people/user/.conda/envs/py27/lib/python2.7/site-packages/scipy/signal/spectral.py", line 1775, in _triage_segments
    win = get_window(window, nperseg)
  File "/people/user/.conda/envs/py27/lib/python2.7/site-packages/scipy/signal/windows/windows.py", line 2106, in get_window
    return winfunc(*params)
  File "/people/user/.conda/envs/py27/lib/python2.7/site-packages/scipy/signal/windows/windows.py", line 786, in hann
    return general_hamming(M, 0.5, sym)
  File "/people/user/.conda/envs/py27/lib/python2.7/site-packages/scipy/signal/windows/windows.py", line 1016, in general_hamming
    return general_cosine(M, [alpha, 1. - alpha], sym)
  File "/people/user/.conda/envs/py27/lib/python2.7/site-packages/scipy/signal/windows/windows.py", line 116, in general_cosine
    w = np.zeros(M)
TypeError: 'float' object cannot be interpreted as an index
(py27) [10:56 user@host:speech$]

Any ideas, @awni?
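The traceback ends in np.zeros(M) receiving a float, which suggests that the window length passed to scipy.signal.spectrogram was computed with float arithmetic (for example milliseconds times sample rate divided by 1000). A sketch of the usual fix is to convert to whole samples before the call. The helper name and the 20 ms / 10 ms values are hypothetical and are not taken from loader.py.

```python
def window_samples(sample_rate, window_ms):
    """Window length in whole samples for a window given in milliseconds."""
    return int(round(window_ms * sample_rate / 1000.0))


# Hypothetical settings: 16 kHz audio, 20 ms window, 10 ms step.
nperseg = window_samples(16000, 20)    # 320, an int
noverlap = window_samples(16000, 10)   # 160, an int
```

Older NumPy releases accepted float sizes with a deprecation warning, so the same code can work in one environment and fail in another.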

Is averaging of the training loss in seq2seq correct?

I don't understand the way the training loss is averaged.

The loss is summed over each minibatch because of the argument size_average=False in the cross_entropy function. Then the line loss_val = loss_val / batch_size divides by the number of utterances in the batch, but each utterance has many letters to decode, so the loss is summed over more than batch_size letters. The correct divisor would be y.shape[0] (the predictions from all utterances in the batch are concatenated into a one-dimensional vector).
According to that, line 66 in seq2seq.py should be

loss_val = loss_val / y.shape[0]

Am I right, or am I missing something?
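A small worked example makes the difference between the two divisors concrete. It is a plain-Python sketch with made-up per-token losses, not code from seq2seq.py.

```python
def average_losses(per_token_losses):
    """per_token_losses: one list of token losses per utterance in the batch."""
    total = sum(sum(utt) for utt in per_token_losses)
    batch_size = len(per_token_losses)
    num_tokens = sum(len(utt) for utt in per_token_losses)
    return total / batch_size, total / num_tokens


# Two utterances: four tokens with loss 1.0 each, two tokens with loss 2.0 each.
per_utterance, per_token = average_losses([[1.0, 1.0, 1.0, 1.0], [2.0, 2.0]])
# per_utterance is 8.0 / 2 = 4.0, per_token is 8.0 / 6
```

Dividing by batch_size gives a per-utterance loss whose scale grows with target length, while dividing by the token count gives a per-token loss. Either can be a deliberate choice, but the two differ by the average number of tokens per utterance, which also changes the effective learning rate.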

ctc decoder with language model

Hi,
Thanks for your work!
I implemented an LM with your py-arpa-lm. lm.score_tg returns a natural-log (ln) score.

I would just like to check whether the following is the correct way to add the LM:

# *NB* this would be a good place to include an LM score.
# insertion penalty
lm_score = alpha * lm.score_tg(n_prefix)
ins_p = beta * np.log(len(n_prefix))
next_beam[n_prefix] = (n_p_b + lm_score + ins_p, n_p_nb + lm_score + ins_p)

Thanks!

Add licence

I'd like to ask to add a license to the repository

your paper link

error

When I run "make", I get "ImportError: No module named torch". Can you help me?

Pytest FAILURES

When I run pytest, I get this error: 'CTC' object has no attribute '_modules' (details are shown at the bottom).
Did something change in the 'speech/models' folder?
When I replaced the current folder with an older copy (cloned about two months ago), pytest passed.

=========================================================================================== FAILURES ============================================================================================
________________________________________________________________________________________ test_ctc_model _________________________________________________________________________________________

def test_ctc_model():
    freq_dim = 40
    vocab_size = 10

    batch = shared.gen_fake_data(freq_dim, vocab_size)
    batch_size = len(batch[0])

  model = CTC(freq_dim, vocab_size, shared.model_config)

ctc_test.py:16:

self = <[AttributeError("'CTC' object has no attribute '_modules'") raised in repr()] SafeRepr object at 0x107f3a5f0>, freq_dim = 40, output_dim = 10
config = {'dropout': 0.0, 'encoder': {'conv': [[32, 5, 32, 2]], 'rnn': {'bidirectional': False, 'dim': 16, 'layers': 1}}}

def __init__(self, freq_dim, output_dim, config):

  super().__init__(freq_dim, config)

E TypeError: super() takes at least 1 argument (0 given)

../speech/models/ctc_model.py:15: TypeError
___________________________________________________________________________________________ test_save ___________________________________________________________________________________________

def test_save():

    freq_dim = 120
    model = speech.models.Model(freq_dim,

                  shared.model_config)

io_test.py:12:

self = <[AttributeError("'Model' object has no attribute '_modules'") raised in repr()] SafeRepr object at 0x107f4c170>, input_dim = 120
config = {'dropout': 0.0, 'encoder': {'conv': [[32, 5, 32, 2]], 'rnn': {'bidirectional': False, 'dim': 16, 'layers': 1}}}

def __init__(self, input_dim, config):

  super().__init__()

E TypeError: super() takes at least 1 argument (0 given)

../speech/models/model.py:13: TypeError
__________________________________________________________________________________________ test_model ___________________________________________________________________________________________

def test_model():
    time_steps = 100
    freq_dim = 40
    batch_size = 4

  model = speech.models.Model(freq_dim, shared.model_config)

model_test.py:15:

self = <[AttributeError("'Model' object has no attribute '_modules'") raised in repr()] SafeRepr object at 0x107f73f80>, input_dim = 40
config = {'dropout': 0.0, 'encoder': {'conv': [[32, 5, 32, 2]], 'rnn': {'bidirectional': False, 'dim': 16, 'layers': 1}}}

def __init__(self, input_dim, config):

  super().__init__()

E TypeError: super() takes at least 1 argument (0 given)

../speech/models/model.py:13: TypeError
__________________________________________________________________________________________ test_model ___________________________________________________________________________________________

def test_model():
    freq_dim = 120
    vocab_size = 10

    np.random.seed(1337)
    torch.manual_seed(1337)

    conf = shared.model_config
    rnn_dim = conf['encoder']['rnn']['dim']
    conf["decoder"] = {"embedding_dim" : rnn_dim,
                       "layers" : 2}

  model = Seq2Seq(freq_dim, vocab_size + 1, conf)

seq2seq_test.py:21:

self = <[AttributeError("'Seq2Seq' object has no attribute '_modules'") raised in repr()] SafeRepr object at 0x107f45758>, freq_dim = 120, vocab_size = 11
config = {'decoder': {'embedding_dim': 16, 'layers': 2}, 'dropout': 0.0, 'encoder': {'conv': [[32, 5, 32, 2]], 'rnn': {'bidirectional': False, 'dim': 16, 'layers': 1}}}

def __init__(self, freq_dim, vocab_size, config):

  super().__init__(freq_dim, config)

E TypeError: super() takes at least 1 argument (0 given)

../speech/models/seq2seq.py:17: TypeError
============================================================================== 4 failed, 5 passed in 0.79 seconds ===============================================================================
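"TypeError: super() takes at least 1 argument (0 given)" is what Python 2 raises for the zero-argument form super(), which only Python 3 supports. The sketch below shows the explicit two-argument form, which runs on both. The classes are plain stand-ins that reuse the names and signatures from the traceback. They do not subclass torch.nn.Module and are not the repository's code.

```python
class Model(object):
    def __init__(self, input_dim, config):
        # Explicit form: valid on Python 2 and Python 3.
        super(Model, self).__init__()
        self.input_dim = input_dim
        self.config = config


class CTC(Model):
    def __init__(self, freq_dim, output_dim, config):
        # Name the class and instance instead of calling bare super().
        super(CTC, self).__init__(freq_dim, config)
        self.output_dim = output_dim


model = CTC(40, 10, {"dropout": 0.0})
```

The other option is to run the current code under Python 3, where the bare super() call is valid.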

error

Dear awni,
When I run "python train.py examples/timit/seq2seq_config.json", I get "Segmentation fault (core dumped)". Can you help me? @awni

Can't find editdistance frm

Hi,

Thanks for sharing this code. We are trying to run it, but we get an error when running pytest: it seems that editdistance, imported in score.py, is not available.

(pytorch) sroca@nx2:~/speech/tests>> pytest
=============================================== test session starts ================================================
platform linux2 -- Python 2.7.9, pytest-3.4.1, py-1.5.2, pluggy-0.6.0
rootdir: /imatge/sroca/speech/tests, inifile:
collected 0 items / 6 errors

====================================================== ERRORS ======================================================
___________________________________________ ERROR collecting ctc_test.py ___________________________________________
ImportError while importing test module '/imatge/sroca/speech/tests/ctc_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
ctc_test.py:5: in <module>
from speech.models import CTC
../../pytorch/speech/__init__.py:2: in <module>
from speech.utils.score import compute_cer
../../pytorch/speech/utils/score.py:5: in <module>
import editdistance
E ImportError: No module named editdistance
___________________________________________ ERROR collecting io_test.py ____________________________________________
ImportError while importing test module '/imatge/sroca/speech/tests/io_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
io_test.py:3: in <module>
import speech.models
../../pytorch/speech/__init__.py:2: in <module>
from speech.utils.score import compute_cer
../../pytorch/speech/utils/score.py:5: in <module>
import editdistance
E ImportError: No module named editdistance
_________________________________________ ERROR collecting loader_test.py __________________________________________
ImportError while importing test module '/imatge/sroca/speech/tests/loader_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
loader_test.py:3: in <module>
from speech import loader
../../pytorch/speech/__init__.py:2: in <module>
from speech.utils.score import compute_cer
../../pytorch/speech/utils/score.py:5: in <module>
import editdistance
E ImportError: No module named editdistance
__________________________________________ ERROR collecting model_test.py __________________________________________
ImportError while importing test module '/imatge/sroca/speech/tests/model_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
model_test.py:6: in <module>
import speech.models
../../pytorch/speech/__init__.py:2: in <module>
from speech.utils.score import compute_cer
../../pytorch/speech/utils/score.py:5: in <module>
import editdistance
E ImportError: No module named editdistance
_________________________________________ ERROR collecting seq2seq_test.py _________________________________________
ImportError while importing test module '/imatge/sroca/speech/tests/seq2seq_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
seq2seq_test.py:6: in <module>
from speech.models import Seq2Seq
../../pytorch/speech/__init__.py:2: in <module>
from speech.utils.score import compute_cer
../../pytorch/speech/utils/score.py:5: in <module>
import editdistance
E ImportError: No module named editdistance
__________________________________________ ERROR collecting wave_test.py ___________________________________________
ImportError while importing test module '/imatge/sroca/speech/tests/wave_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
wave_test.py:4: in <module>
import speech.utils.wave as wave
../../pytorch/speech/__init__.py:2: in <module>
from speech.utils.score import compute_cer
../../pytorch/speech/utils/score.py:5: in <module>
import editdistance
E ImportError: No module named editdistance
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 6 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
============================================= 6 error in 0.76 seconds ==============================================

object has no attribute 'is_cuda' in python 3.6

Error message:

AttributeError: 'Seq2Seq' object has no attribute 'is_cuda'

../../../../anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py:262: AttributeError

Out of memory Error

make error : system cannot find the path specified

Hi thanks for your work!

I am on a Windows machine and I get the error below when I run 'MinGW32-make':

git clone https://github.com/awni/warp-ctc.git libs/warp-ctc
Cloning into 'libs/warp-ctc'...
remote: Enumerating objects: 467, done.
Receiving objects:  90% (421/467), remote: Total 467 (delta 0), reused 0 (delta 0), pack-reused 467
Receiving objects: 100% (467/467), 334.09 KiB | 221.00 KiB/s, done.
Resolving deltas: 100% (222/222), done.
cd libs/warp-ctc; mkdir build; cd build; cmake ../ && make; \
        cd ../pytorch_binding; python build.py
The system cannot find the path specified.
Makefile:5: recipe for target 'warp' failed
MinGW32-make: *** [warp] Error 1

I have the warp-ctc repo cloned, but the remaining steps fail. I manually created the build folder and ran the makefile from there, but that failed too.
Is there any way to make this work? How do I run the makefile manually?

Thanks,

model error

ImportError: No module named speech

Transducer: zip object is not subscriptable

Every other model runs fine, but running the transducer gives this error in transducer_model.py:

Zip object is not subscriptable, line 48

If I cast it to a list, I then get:

batch[1] is out of range

I tried to manually print the contents of "batch", and for some reason it is empty after being printed once.

Example:
print(*batch) -> [...][...]
print(*batch) -> (nothing gets printed)
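This behaviour matches how zip works in Python 3: it returns a one-shot iterator rather than a list, so it cannot be indexed and is empty after the first pass (including a pass made by print). The sketch below reproduces that with made-up data. Where the batch is actually created in the repository is not shown here, so treat the last line as the general pattern rather than the exact fix.

```python
inputs = [[0.1, 0.2], [0.3, 0.4]]
labels = [[1, 2], [3]]

batch = zip(inputs, labels)      # Python 3: a one-shot iterator
first_pass = list(batch)         # consumes every item
second_pass = list(batch)        # nothing left, so this is []

# Materializing once, where the batch is built, gives a real list
# that can be indexed and iterated any number of times.
materialized = list(zip(inputs, labels))
second_item = materialized[1]
```

Unpacking an exhausted iterator with *batch passes no arguments at all, which may also explain the "collate() missing 2 required positional arguments" error reported in another issue here.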

Multi Gpu Training support

Please add multi gpu training support to the code

Make error

[ 28%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/warpctc_generated_reduce.cu.o
[ 28%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/warpctc_generated_ctc_entrypoint.cu.o
/data/f3v1/v-yuewng/gitclone/speech/libs/warp-ctc/src/ctc_entrypoint.cu(1): error: this declaration has no storage class or type specifier

/data/f3v1/v-yuewng/gitclone/speech/libs/warp-ctc/src/ctc_entrypoint.cu(1): error: expected a ";"

2 errors detected in the compilation of "/tmp/tmpxft_00000943_00000000-12_ctc_entrypoint.compute_62.cpp1.ii".
CMake Error at warpctc_generated_ctc_entrypoint.cu.o.cmake:266 (message):
Error generating file
/data/f3v1/v-yuewng/gitclone/speech/libs/warp-ctc/build/CMakeFiles/warpctc.dir/src/./warpctc_generated_ctc_entrypoint.cu.o

CMakeFiles/warpctc.dir/build.make:63: recipe for target 'CMakeFiles/warpctc.dir/src/warpctc_generated_ctc_entrypoint.cu.o' failed
make[3]: *** [CMakeFiles/warpctc.dir/src/warpctc_generated_ctc_entrypoint.cu.o] Error 1
make[3]: *** Waiting for unfinished jobs....
/data/f3v1/v-yuewng/gitclone/speech/libs/warp-ctc/src/reduce.cu(44): warning: function "__shfl_down(float, unsigned int, int)"
/usr/local/cuda/include/sm_30_intrinsics.hpp(278): here was declared deprecated ("__shfl_down() is deprecated in favor of __shfl_down_sync() and may be removed in a future release (Use -Wno-deprecated-declarations to suppress this warning).")
detected during:
instantiation of "T CTAReduce<NT, T, Rop>::reduce(int, T, CTAReduce<NT, T, Rop>::Storage &, int, Rop) [with NT=128, T=float, Rop=ctc_helper::add<float, float>]"
(76): here
instantiation of "void reduce_rows<NT,Iop,Rop,T>(Iop, Rop, const T *, T *, int, int) [with NT=128, Iop=ctc_helper::negate<float, float>, Rop=ctc_helper::add<float, float>, T=float]"
(124): here
instantiation of "void ReduceHelper::impl(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::negate<float, float>, Rof=ctc_helper::add<float, float>]"
(139): here
instantiation of "ctcStatus_t reduce(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::negate<float, float>, Rof=ctc_helper::add<float, float>]"
(149): here

/data/f3v1/v-yuewng/gitclone/speech/libs/warp-ctc/src/reduce.cu(44): warning: function "__shfl_down(float, unsigned int, int)"
/usr/local/cuda/include/sm_30_intrinsics.hpp(278): here was declared deprecated ("__shfl_down() is deprecated in favor of __shfl_down_sync() and may be removed in a future release (Use -Wno-deprecated-declarations to suppress this warning).")
detected during:
instantiation of "T CTAReduce<NT, T, Rop>::reduce(int, T, CTAReduce<NT, T, Rop>::Storage &, int, Rop) [with NT=128, T=float, Rop=ctc_helper::maximum<float, float>]"
(76): here
instantiation of "void reduce_rows<NT,Iop,Rop,T>(Iop, Rop, const T *, T *, int, int) [with NT=128, Iop=ctc_helper::identity<float, float>, Rop=ctc_helper::maximum<float, float>, T=float]"
(124): here
instantiation of "void ReduceHelper::impl(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::identity<float, float>, Rof=ctc_helper::maximum<float, float>]"
(139): here
instantiation of "ctcStatus_t reduce(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::identity<float, float>, Rof=ctc_helper::maximum<float, float>]"
(157): here

/data/f3v1/v-yuewng/gitclone/speech/libs/warp-ctc/src/reduce.cu(44): warning: function "__shfl_down(float, unsigned int, int)"
/usr/local/cuda/include/sm_30_intrinsics.hpp(278): here was declared deprecated ("__shfl_down() is deprecated in favor of __shfl_down_sync() and may be removed in a future release (Use -Wno-deprecated-declarations to suppress this warning).")
detected during:
instantiation of "T CTAReduce<NT, T, Rop>::reduce(int, T, CTAReduce<NT, T, Rop>::Storage &, int, Rop) [with NT=128, T=float, Rop=ctc_helper::add<float, float>]"
(76): here
instantiation of "void reduce_rows<NT,Iop,Rop,T>(Iop, Rop, const T *, T *, int, int) [with NT=128, Iop=ctc_helper::negate<float, float>, Rop=ctc_helper::add<float, float>, T=float]"
(124): here
instantiation of "void ReduceHelper::impl(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::negate<float, float>, Rof=ctc_helper::add<float, float>]"
(139): here
instantiation of "ctcStatus_t reduce(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::negate<float, float>, Rof=ctc_helper::add<float, float>]"
(149): here