
sooftware / kospeech


Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.

Home Page: https://sooftware.github.io/kospeech/

License: Apache License 2.0

Python 99.50% Shell 0.50%
asr attention-is-all-you-need conformer e2e-asr end-to-end jasper korean-speech ksponspeech las las-models pytorch seq2seq speech-recognition transformer

kospeech's Introduction

Career

  • Co-founder & A.I. team leader at TUNiB 2021.03 ~ present
  • A.I. Engineer at Kakao Brain 2020.08 ~ 2021.03

Service

  • Dearmate : an A.I. SNS platform featuring a variety of characters, each with a unique persona
  • TUNiBridge : NLP Cloud API Services

kospeech's People

Contributors

21jun, affjljoo3581, hwiorn, qute012, sooftware, switiz, triplet02, upskyy, wch18735


kospeech's Issues

Out-of-memory error when moving to the next epoch during training

Hello!
I am training with the latest code, and an out-of-memory error occurs every time training moves on to the next epoch.
Is there anything I can do to fix this?
I am using two RTX 2080 Ti GPUs and 256 GB of RAM!

Traceback (most recent call last):
  File "./main.py", line 113, in <module>
    main()
  File "./main.py", line 109, in main
    train(opt)
  File "./main.py", line 87, in train
    resume=opt.resume
  File "../kospeech/trainer/supervised_trainer.py", line 147, in train
    train_queue, teacher_forcing_ratio)
  File "../kospeech/trainer/supervised_trainer.py", line 230, in __train_epoches
    targets=targets, teacher_forcing_ratio=teacher_forcing_ratio)
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "../kospeech/models/acoustic/seq2seq/seq2seq.py", line 51, in forward
    result = self.decoder(targets, output, teacher_forcing_ratio, return_decode_dict)
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "../kospeech/models/acoustic/seq2seq/decoder.py", line 175, in forward
    step_output, hidden, attn = self.forward_step(input_var, hidden, encoder_outputs, attn)
  File "../kospeech/models/acoustic/seq2seq/decoder.py", line 121, in forward_step
    output, hidden = self.rnn(embedded, hidden)
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 570, in forward
    self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: CUDA out of memory. Tried to allocate 66.00 MiB (GPU 1; 10.76 GiB total capacity; 9.39 GiB already allocated; 54.44 MiB free; 9.82 GiB reserved in total by PyTorch)
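
A common first check for an OOM that appears only at the epoch boundary is whether validation or checkpointing is keeping graph references alive. A minimal, generic sketch (not kospeech's actual code) of the usual mitigations between epochs:

import gc
import torch

# Run validation without building a graph, and release cached allocator
# blocks afterwards. If a hard OOM remains, reducing batch_size is a more
# reliable fix than cache management.
with torch.no_grad():
    validate(model)        # hypothetical validation call
gc.collect()
torch.cuda.empty_cache()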

[Problem Fix] Gap between training CER and test CER

Check Fixed Code Here : models.speller.forward()

Hi! I'm Seyoung Bae, from team Kai.Lib.

We got some reports that our project has an issue with the gap between training CER and test CER.

We struggled with our code, and found what we missed.

In model testing, the teacher forcing ratio is 0.0 (off).
In this case, the RNN cell needs the previous timestep's hidden state (h_t-1).
But the h_t-1 hidden state was not returned by the forward() function.


So we fixed this problem by returning the previous timestep's hidden state.
Please re-download our project repository to replace the former version.

Thank you!

Check for the details below:

def _forward_step(self, input, hidden, listener_outputs=None, function=F.log_softmax):
    """ forward one time step """
    batch_size = input.size(0)
    output_size = input.size(1)

    embedded = self.embedding(input).to(self.device)
    embedded = self.input_dropout(embedded)

    if self.training:
        self.rnn.flatten_parameters()

    output, hidden = self.rnn(embedded, hidden)

    if self.use_attention:
        output = self.attention(output, listener_outputs)
    else:
        output = output

    output = self.w(output.contiguous().view(-1, self.hidden_size))
    predicted_softmax = function(output, dim=1).view(batch_size, output_size, -1)

    return predicted_softmax, hidden
def forward(self, inputs, listener_outputs, function=F.log_softmax, teacher_forcing_ratio=0.90, use_beam_search=False):
    decode_results = list()
    batch_size = inputs.size(0)
    max_len = inputs.size(1) - 1  # minus the start of sequence symbol
    use_teacher_forcing = True if random.random() < teacher_forcing_ratio else False

    speller_hidden = torch.zeros(self.n_layers, batch_size, self.hidden_size)
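
The excerpt above cuts off after the hidden-state initialization, so for reference here is a minimal sketch of the corrected decode loop, independent of the actual Speller code (names and shapes are assumptions): the point of the fix is that the hidden state returned by _forward_step must be fed back into the next step when teacher forcing is off.

# Hypothetical continuation of forward(): thread the hidden state through
# time steps so step t receives h_{t-1} instead of a fresh zero state.
decoder_input = inputs[:, 0].unsqueeze(1)     # <sos> tokens, shape (batch, 1)
for _ in range(max_len):
    step_output, speller_hidden = self._forward_step(
        decoder_input, speller_hidden, listener_outputs)
    decode_results.append(step_output.squeeze(1))
    decoder_input = step_output.topk(1)[1].squeeze(-1)  # feed back prediction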

GPU memory size used for training

Could you share the specs of the GPU you trained on? On a 2080 Ti I currently get an OOM error even with the batch size set to 8.

lm_path error when running eval.py

Hello. First of all, thank you for releasing such a good codebase and models!!
The code is very clean, so I am learning a lot from it!

I downloaded the aihub data as-is and have been running train and evaluation on toy data only,
but eval.py produces the error below.
I ran it in a Colab environment and changed only the paths and batch size.
Could you tell me what the problem is?

[2020-06-19 03:36:36,586 utils.py:21 - info()] --mode: eval
[2020-06-19 03:36:36,586 utils.py:21 - info()] --sample_rate: 16000
[2020-06-19 03:36:36,587 utils.py:21 - info()] --window_size: 20
[2020-06-19 03:36:36,587 utils.py:21 - info()] --stride: 10
[2020-06-19 03:36:36,587 utils.py:21 - info()] --n_mels: 80
[2020-06-19 03:36:36,587 utils.py:21 - info()] --normalize: True
[2020-06-19 03:36:36,587 utils.py:21 - info()] --del_silence: True
[2020-06-19 03:36:36,587 utils.py:21 - info()] --input_reverse: True
[2020-06-19 03:36:36,587 utils.py:21 - info()] --feature_extract_by: librosa
[2020-06-19 03:36:36,587 utils.py:21 - info()] --time_mask_para: 50
[2020-06-19 03:36:36,587 utils.py:21 - info()] --freq_mask_para: 12
[2020-06-19 03:36:36,587 utils.py:21 - info()] --time_mask_num: 2
[2020-06-19 03:36:36,588 utils.py:21 - info()] --freq_mask_num: 2
[2020-06-19 03:36:36,588 utils.py:21 - info()] --dataset_path: ../../DATA/KsponSpeech_01/KsponSpeech_0001/
[2020-06-19 03:36:36,588 utils.py:21 - info()] --data_list_path: ../data/data_list/toy_test_list.csv
[2020-06-19 03:36:36,588 utils.py:21 - info()] --label_path: ./data/label/aihub_labels.csv
[2020-06-19 03:36:36,588 utils.py:21 - info()] --num_workers: 4
[2020-06-19 03:36:36,588 utils.py:21 - info()] --use_cuda: True
[2020-06-19 03:36:36,588 utils.py:21 - info()] --model_path: ../data/checkpoint/checkpoints/2020_06_18_08_42_49/model.pt
[2020-06-19 03:36:36,588 utils.py:21 - info()] --batch_size: 8
[2020-06-19 03:36:36,588 utils.py:21 - info()] --decode: greedy
[2020-06-19 03:36:36,588 utils.py:21 - info()] --k: 5
[2020-06-19 03:36:36,588 utils.py:21 - info()] --print_every: 10
[2020-06-19 03:36:36,638 utils.py:21 - info()] Operating System : Linux 4.19.104+
[2020-06-19 03:36:36,639 utils.py:21 - info()] Processor : x86_64
[2020-06-19 03:36:36,644 utils.py:21 - info()] device : Tesla K80
[2020-06-19 03:36:36,644 utils.py:21 - info()] CUDA is available : True
[2020-06-19 03:36:36,644 utils.py:21 - info()] CUDA version : 10.1
[2020-06-19 03:36:36,644 utils.py:21 - info()] PyTorch version : 1.5.0+cu101
[2020-06-19 03:36:56,738 utils.py:141 - _init_num_threads()] NumExpr defaulting to 2 threads.
100% 167/167 [03:21<00:00,  1.21s/it]
Traceback (most recent call last):
  File "./eval.py", line 66, in <module>
    main()
  File "./eval.py", line 62, in main
    inference(opt)
  File "./eval.py", line 41, in inference
    evaluator = Evaluator(testset, opt.batch_size, device, opt.num_workers, opt.print_every, opt.decode, opt.k)
  File "../kospeech/evaluator/evaluator.py", line 28, in __init__
    self.decoder = GreedySearch()
  File "../kospeech/decode/search.py", line 21, in __init__
    self.language_model = load_language_model('lm_path', 'cuda')
  File "../kospeech/model_builder.py", line 130, in load_language_model
    model = torch.load(path, map_location=lambda storage, loc: storage).to(device)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 584, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 234, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 215, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'lm_path'
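
The traceback shows GreedySearch unconditionally calling load_language_model with the literal string 'lm_path'. A minimal sketch of a guard (hypothetical; the real constructor may take different arguments) that loads a language model only when a real checkpoint path is supplied:

import os
from kospeech.model_builder import load_language_model

class GreedySearch:
    def __init__(self, lm_path=None, device='cuda'):
        # Decode without shallow fusion unless an LM checkpoint actually exists.
        if lm_path is not None and os.path.isfile(lm_path):
            self.language_model = load_language_model(lm_path, device)
        else:
            self.language_model = None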

AttributeError: 'tuple' object has no attribute 'size'

The error below occurs..
It looks like the types do not match.

[2020-03-11 08:40:53,792 train.py:121 - <module>()] start
Traceback (most recent call last):
  File "train.py", line 142, in <module>
    teacher_forcing_ratio = hparams.teacher_forcing
  File "/schwang/stt_kai/Korean-Speech-Recognition_20200310/package/trainer.py", line 58, in supervised_train
    y_hat, logit = model(feats, targets, teacher_forcing_ratio=teacher_forcing_ratio)
  File "/root/anaconda3/envs/pytorch3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/envs/pytorch3.7/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/root/anaconda3/envs/pytorch3.7/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/root/anaconda3/envs/pytorch3.7/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/root/anaconda3/envs/pytorch3.7/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/root/anaconda3/envs/pytorch3.7/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/root/anaconda3/envs/pytorch3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/schwang/stt_kai/Korean-Speech-Recognition_20200310/models/listenAttendSpell.py", line 42, in forward
    listener_outputs = self.listener(feats)
  File "/root/anaconda3/envs/pytorch3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/schwang/stt_kai/Korean-Speech-Recognition_20200310/models/listener.py", line 175, in forward
    middle_output = self.middle_rnn(bottom_output)
  File "/root/anaconda3/envs/pytorch3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/schwang/stt_kai/Korean-Speech-Recognition_20200310/models/listener.py", line 50, in forward
    batch_size = inputs.size(0)
AttributeError: 'tuple' object has no attribute 'size'
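
The error means the raw return value of an nn.LSTM-style layer, the tuple (output, (h_n, c_n)), was passed on as if it were a tensor: at listener.py line 175 the lower RNN's full return value goes straight into middle_rnn. A minimal sketch of the usual fix (attribute names are illustrative):

# nn.LSTM returns (output, (h_n, c_n)); only `output` should be fed upward.
bottom_output, _ = self.bottom_rnn(inputs)
middle_output, _ = self.middle_rnn(bottom_output)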

[Train Error] invalid shape / thread errors

Hello, I found this project while studying speech recognition and have been referring to it a lot.
First of all, thank you for organizing such good information so cleanly.

When I try to train with the shared code, the errors below occur and the process stays alive without making progress.
The environment is Ubuntu with a single GPU.
For data preprocessing I followed the code with some path changes, and pointed opts.py at the changed files.

There are two main errors (a standalone sketch of the first one follows the log below):

  1. Invalid shape error - RuntimeError: shape '[32, -1]' is invalid for input of size 2860
     The dataset is the AIHUB data, and the option values use the defaults in opts.py.
     Did I miss a setting here?

  2. Thread error - Exception ignored in: <module 'threading' from '/home/mchoe/.conda/envs/sr/lib/python3.6/threading.py'>
     This error occurs when I force-kill the process; do I need to configure threading separately?
     The code uses threading in data_loader.py; is this related?

If you know anything about this, I would really appreciate an answer.

[2020-07-20 17:51:47,573 utils.py:21 - info()] Operating System : Linux 5.3.0-61-generic
[2020-07-20 17:51:47,573 utils.py:21 - info()] Processor : x86_64
[2020-07-20 17:51:47,574 utils.py:21 - info()] device : GeForce GTX 1080 Ti
[2020-07-20 17:51:47,574 utils.py:21 - info()] CUDA is available : True
[2020-07-20 17:51:47,574 utils.py:21 - info()] CUDA version : 10.2
[2020-07-20 17:51:47,574 utils.py:21 - info()] PyTorch version : 1.5.1
100%|██████████| 497658/497658 [00:09<00:00, 53032.36it/s]
[2020-07-20 17:51:58,358 utils.py:21 - info()] split dataset start !!
[2020-07-20 17:52:00,376 utils.py:21 - info()] split dataset complete !!
[2020-07-20 17:52:01,611 utils.py:21 - info()] start
[2020-07-20 17:52:01,611 utils.py:21 - info()] Epoch 0 start
Traceback (most recent call last):
  File "/home/mchoe/PycharmProjects/sr2/bin/main.py", line 99, in <module>
    main()
  File "/home/mchoe/PycharmProjects/sr2/bin/main.py", line 95, in main
    train(opt)
  File "/home/mchoe/PycharmProjects/sr2/bin/main.py", line 74, in train
    num_epochs=opt.num_epochs, teacher_forcing_ratio=opt.teacher_forcing_ratio, resume=opt.resume)
  File "/home/mchoe/PycharmProjects/sr2/kospeech/trainer/supervised_trainer.py", line 104, in train
    train_queue, teacher_forcing_ratio)
  File "/home/mchoe/PycharmProjects/sr2/kospeech/trainer/supervised_trainer.py", line 188, in train_epoches
    logit = model(inputs, input_lengths, targets, teacher_forcing_ratio=teacher_forcing_ratio)
  File "/home/mchoe/.conda/envs/sr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mchoe/.conda/envs/sr/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/mchoe/.conda/envs/sr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mchoe/PycharmProjects/sr2/kospeech/models/seq2seq/seq2seq.py", line 37, in forward
    result = self.decoder(targets, output, teacher_forcing_ratio, language_model)
  File "/home/mchoe/.conda/envs/sr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mchoe/PycharmProjects/sr2/kospeech/models/seq2seq/decoder.py", line 127, in forward
    inputs = inputs[inputs != self.eos_id].view(batch_size, -1)
RuntimeError: shape '[32, -1]' is invalid for input of size 2860
Exception ignored in: <module 'threading' from '/home/mchoe/.conda/envs/sr/lib/python3.6/threading.py'>
Traceback (most recent call last):
  File "/home/mchoe/.conda/envs/sr/lib/python3.6/threading.py", line 1294, in _shutdown
    t.join()
  File "/home/mchoe/.conda/envs/sr/lib/python3.6/threading.py", line 1056, in join
    self._wait_for_tstate_lock()
  File "/home/mchoe/.conda/envs/sr/lib/python3.6/threading.py", line 1072, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt

Process finished with exit code 1
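
As for the first error: the failing line, inputs[inputs != self.eos_id].view(batch_size, -1), assumes every row of the batch contains the same number of eos tokens. Boolean indexing flattens the tensor, and when rows drop different counts the result (2860 elements here) is no longer divisible into 32 equal rows. A standalone sketch of the failure mode:

import torch

eos_id = 2
batch = torch.tensor([[5, 7, eos_id, eos_id],   # this row drops two tokens
                      [5, 7, 9,      eos_id]])  # this row drops one token
kept = batch[batch != eos_id]  # flattened 1-D tensor with 5 elements
kept.view(2, -1)  # RuntimeError: shape '[2, -1]' is invalid for input of size 5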

Running main.py during train

Hello, I got stuck while running the code, so I am posting an issue.

Following what you said in the previous issue, I set up the paths and went through the other steps.

While running main.py for training, the following issue occurred.

It is probably because I have not set the audio file path...

The run_seq2seq.sh file only sets the path of the preprocessed transcript file.

How should I set the audio file path?

Thank you as always.

Key Error

Traceback (most recent call last):
  File "train.py", line 119, in <module>
    target_dict = target_dict,
  File "C:\Users\nextgen\Desktop\Korean-Speech-Recognition-master\package\dataset.py", line 187, in split_dataset
    pack_by_length=hparams.pack_by_length
  File "C:\Users\nextgen\Desktop\Korean-Speech-Recognition-master\package\dataset.py", line 41, in __init__
    self.sort_by_length()
  File "C:\Users\nextgen\Desktop\Korean-Speech-Recognition-master\package\dataset.py", line 88, in sort_by_length
    target_lengths.append(len(self.target_dict[key].split()))
KeyError: 'KaiSpeech_label_088282'

Traceback (most recent call last):
  File "train.py", line 119, in <module>
    target_dict = target_dict,
  File "C:\Users\nextgen\Desktop\Korean-Speech-Recognition-master\package\dataset.py", line 187, in split_dataset
    pack_by_length=hparams.pack_by_length
  File "C:\Users\nextgen\Desktop\Korean-Speech-Recognition-master\package\dataset.py", line 41, in __init__
    self.sort_by_length()
  File "C:\Users\nextgen\Desktop\Korean-Speech-Recognition-master\package\dataset.py", line 88, in sort_by_length
    target_lengths.append(len(self.target_dict[key].split()))
KeyError: 'KaiSpeech_label_000038'

After finishing all the preprocessing (the labeling work), running train.py raises an error saying the key does not exist.
To check whether I had done something wrong, I changed the paths in define.py to the sample data and tried with the sample data, and the same error appeared.
When I build target_dict with load_targets from loader.py and look up the key that raised the KeyError, the label comes out fine.
Even when I check with key = label_path.split('/')[-1].split('.')[0] from the sort_by_length function of dataset.py, the key that raised the KeyError shows up, so what should I do?
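
One detail worth checking: the traceback shows a Windows path (C:\Users\...), while the key is derived with label_path.split('/')[-1]. If the paths in the data list use '\' as the separator, the split returns the whole path and the lookup key never matches target_dict. A separator-agnostic sketch:

import os

# Portable replacement for label_path.split('/')[-1].split('.')[0]
key = os.path.splitext(os.path.basename(label_path))[0]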

ScaledDotProductAttention is missing from attention.py

Hello.
I received the pre-trained weights you sent last week. Thank you!

I ran eval.py with the weights you sent,
and got the error below; on inspection, attention.py has no ScaledDotProductAttention.
Could you add it, or tell me how I can fix it myself?

[2020-06-21 02:51:36,690 utils.py:21 - info()] --mode: eval
[2020-06-21 02:51:36,690 utils.py:21 - info()] --sample_rate: 16000
[2020-06-21 02:51:36,690 utils.py:21 - info()] --window_size: 20
[2020-06-21 02:51:36,690 utils.py:21 - info()] --stride: 10
[2020-06-21 02:51:36,690 utils.py:21 - info()] --n_mels: 80
[2020-06-21 02:51:36,690 utils.py:21 - info()] --normalize: True
[2020-06-21 02:51:36,690 utils.py:21 - info()] --del_silence: True
[2020-06-21 02:51:36,690 utils.py:21 - info()] --input_reverse: True
[2020-06-21 02:51:36,690 utils.py:21 - info()] --feature_extract_by: librosa
[2020-06-21 02:51:36,690 utils.py:21 - info()] --time_mask_para: 50
[2020-06-21 02:51:36,691 utils.py:21 - info()] --freq_mask_para: 12
[2020-06-21 02:51:36,691 utils.py:21 - info()] --time_mask_num: 2
[2020-06-21 02:51:36,691 utils.py:21 - info()] --freq_mask_num: 2
[2020-06-21 02:51:36,691 utils.py:21 - info()] --dataset_path: ../../DATA/KsponSpeech_01/KsponSpeech_0001/
[2020-06-21 02:51:36,691 utils.py:21 - info()] --data_list_path: ../data/data_list/toy_test_list.csv
[2020-06-21 02:51:36,691 utils.py:21 - info()] --label_path: ./data/label/aihub_labels.csv
[2020-06-21 02:51:36,691 utils.py:21 - info()] --num_workers: 4
[2020-06-21 02:51:36,691 utils.py:21 - info()] --use_cuda: True
[2020-06-21 02:51:36,691 utils.py:21 - info()] --model_path: ../data/checkpoints/model.pt
[2020-06-21 02:51:36,691 utils.py:21 - info()] --batch_size: 8
[2020-06-21 02:51:36,691 utils.py:21 - info()] --decode: greedy
[2020-06-21 02:51:36,691 utils.py:21 - info()] --k: 5
[2020-06-21 02:51:36,691 utils.py:21 - info()] --print_every: 10
[2020-06-21 02:51:36,727 utils.py:21 - info()] Operating System : Linux 4.19.104+
[2020-06-21 02:51:36,727 utils.py:21 - info()] Processor : x86_64
[2020-06-21 02:51:36,729 utils.py:21 - info()] device : Tesla T4
[2020-06-21 02:51:36,729 utils.py:21 - info()] CUDA is available : True
[2020-06-21 02:51:36,729 utils.py:21 - info()] CUDA version : 10.1
[2020-06-21 02:51:36,729 utils.py:21 - info()] PyTorch version : 1.5.0+cu101
Traceback (most recent call last):
  File "./eval.py", line 66, in <module>
    main()
  File "./eval.py", line 62, in main
    inference(opt)
  File "./eval.py", line 25, in inference
    model = load_test_model(opt, device)
  File "../kospeech/model_builder.py", line 116, in load_test_model
    model = torch.load(opt.model_path, map_location=lambda storage, loc: storage).to(device)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 593, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 773, in _legacy_load
    result = unpickler.load()
AttributeError: Can't get attribute 'ScaledDotProductAttention' on <module 'kospeech.model.attention' from '../kospeech/model/attention.py'>
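
Because the checkpoint was saved by pickling the whole model, loading it requires a class named ScaledDotProductAttention to be resolvable at kospeech.model.attention. A minimal shim sketch that can be placed in kospeech/model/attention.py so unpickling succeeds (this follows the textbook softmax(QK^T / sqrt(d_k))V definition; it is not necessarily identical to the class the checkpoint was trained with):

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledDotProductAttention(nn.Module):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V -- shim for unpickling."""
    def __init__(self, dim):
        super().__init__()
        self.sqrt_dim = np.sqrt(dim)

    def forward(self, query, key, value):
        score = torch.bmm(query, key.transpose(1, 2)) / self.sqrt_dim
        attn = F.softmax(score, dim=-1)
        context = torch.bmm(attn, value)
        return context, attn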

An error occurs when training on multiple GPUs.

Hello,

I am a beginner just getting started with deep learning.

After updating from git yesterday (2020-07-13), training on a single GPU proceeds normally,
but an error occurs when training on multiple GPUs.

I ran training with the default options of ./run.sh.

I added a few debug prints to check,
and there was no error up to forward in kospeech.models.seq2seq.decoder.py;
the error seems to occur after that point.

The error message is below.

Thank you.

Error message
File "./main.py", line 110, in
main()
File "./main.py", line 106, in main
train(opt)
File "./main.py", line 85, in train
checkpoint_path=opt.checkpoint_path
File "../kospeech/trainer/supervised_trainer.py", line 105, in train
train_queue, teacher_forcing_ratio, checkpoint_path)
File "../kospeech/trainer/supervised_trainer.py", line 194, in train_epoches
targets=scripts, teacher_forcing_ratio=teacher_forcing_ratio)[0]
File "/home/ubuntu/anaconda3/envs/kospeech2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/kospeech2/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 156, in forward
return self.gather(outputs, self.output_device)
File "/home/ubuntu/anaconda3/envs/kospeech2/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 168, in gather
return gather(outputs, output_device, dim=self.dim)
File "/home/ubuntu/anaconda3/envs/kospeech2/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
res = gather_map(outputs)
File "/home/ubuntu/anaconda3/envs/kospeech2/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
File "/home/ubuntu/anaconda3/envs/kospeech2/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in gather_map
for k in out))
File "/home/ubuntu/anaconda3/envs/kospeech2/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in
for k in out))
File "/home/ubuntu/anaconda3/envs/kospeech2/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
TypeError: expected sequence object with len >= 0 or a single integer

ModuleNotFoundError: No module named 'kospeech.models.acoustic'

Hello. When I run the infer.sh file with the pretrained model linked in the README, this error occurs.
I wonder whether it is because the acoustic-related files inside the kospeech.models folder were not uploaded.
If you have time, I would appreciate it if you could take a look.

The error log is below.
Traceback (most recent call last):
  File "run_pretrain.py", line 33, in <module>
    model = load_test_model(opt, opt.device)
  File "../kospeech/model_builder.py", line 201, in load_test_model
    model = torch.load(opt.model_path, map_location=lambda storage, loc: storage).to(device)
  File "/KoSpeech/venv/lib/python3.7/site-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/KoSpeech/venv/lib/python3.7/site-packages/torch/serialization.py", line 702, in _legacy_load
    result = unpickler.load()
ModuleNotFoundError: No module named 'kospeech.models.acoustic'
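
This is the standard pitfall of saving with torch.save(model): the pickle records import paths such as kospeech.models.acoustic, so any later package reorganization breaks loading. A hedged sketch of the more robust pattern (build_model is a hypothetical factory, not a kospeech function):

import torch

# Saving: store only the parameters; a state_dict carries no import paths.
torch.save(model.state_dict(), 'model_weights.pt')

# Loading: rebuild the architecture from the current code, then restore weights.
model = build_model(opt)  # hypothetical factory
model.load_state_dict(torch.load('model_weights.pt', map_location='cpu'))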

An error occurs when loading the pre-trained model.

Hello.
First of all, thank you for putting such good material on GitHub.

I am trying to load the epoch-0, 16000-step weights in the data folder.

The environment is Anaconda with Python 3.7, and the required libraries were installed from requirements.txt.

I first confirmed that training runs with train.py (it is currently stopped due to OOM).

In test.py I changed only the test dataset path; running it raises the error below.

(KASR) sysadm@son:~/Desktop/data/ASR/End-to-End-Korean-Speech-Recognition$ python test.py
[2020-04-16 18:23:18,883 test.py:67 - <module>()] device : TITAN Xp
[2020-04-16 18:23:18,883 test.py:68 - <module>()] CUDA is available : True
[2020-04-16 18:23:18,883 test.py:69 - <module>()] CUDA version : 10.1
[2020-04-16 18:23:18,883 test.py:70 - <module>()] PyTorch version : 1.4.0
[2020-04-16 18:23:18,883 config.py:87 - print_log()] use_bidirectional : True
[2020-04-16 18:23:18,883 config.py:88 - print_log()] use_attention : True
[2020-04-16 18:23:18,883 config.py:89 - print_log()] use_pickle : False
[2020-04-16 18:23:18,883 config.py:90 - print_log()] use_augment : True
[2020-04-16 18:23:18,883 config.py:91 - print_log()] use_pyramidal : True
[2020-04-16 18:23:18,883 config.py:92 - print_log()] augment_ratio : 1.00
[2020-04-16 18:23:18,883 config.py:93 - print_log()] input_reverse : True
[2020-04-16 18:23:18,883 config.py:94 - print_log()] hidden_size : 256
[2020-04-16 18:23:18,883 config.py:95 - print_log()] listener_layer_size : 5
[2020-04-16 18:23:18,883 config.py:96 - print_log()] speller_layer_size : 3
[2020-04-16 18:23:18,883 config.py:97 - print_log()] dropout : 0.50
[2020-04-16 18:23:18,883 config.py:98 - print_log()] batch_size : 32
[2020-04-16 18:23:18,883 config.py:99 - print_log()] worker_num : 1
[2020-04-16 18:23:18,883 config.py:100 - print_log()] max_epochs : 40
[2020-04-16 18:23:18,883 config.py:101 - print_log()] initial learning rate : 0.0001
[2020-04-16 18:23:18,883 config.py:105 - print_log()] teacher_forcing_ratio : 0.90
[2020-04-16 18:23:18,883 config.py:106 - print_log()] seed : 1
[2020-04-16 18:23:18,883 config.py:107 - print_log()] max_len : 151
[2020-04-16 18:23:18,883 config.py:108 - print_log()] use_cuda : True
/home/sysadm/anaconda3/envs/KASR/lib/python3.7/site-packages/torch/serialization.py:593: SourceChangeWarning: source code of class 'models.listenAttendSpell.ListenAttendSpell' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
(three more SourceChangeWarnings follow for 'models.listener.Listener', 'models.speller.Speller', and 'models.attention.MultiHeadAttention')
Traceback (most recent call last):
  File "test.py", line 102, in <module>
    model.load_state_dict(load_model.state_dict())
  File "/home/sysadm/anaconda3/envs/KASR/lib/python3.7/site-packages/torch/nn/modules/module.py", line 830, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ListenAttendSpell:
    Missing key(s) in state_dict: "speller.out.weight", "speller.out.bias", "speller.attention.out.weight", "speller.attention.out.bias".
    Unexpected key(s) in state_dict: "speller.w.weight", "speller.w.bias", "speller.attention.linear_out.weight", "speller.attention.linear_out.bias".

I have used TensorFlow before, but this is my first time with PyTorch, so there may be something I am missing :)

Also, we get GPU OOM even with a batch size of 8, so I am curious about the environment and batch size you train with...

Thank you for releasing the code, and stay safe from COVID.
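
The missing/unexpected key pairs (speller.out vs speller.w, attention.out vs attention.linear_out) indicate the checkpoint predates a parameter rename. A minimal remapping sketch, assuming only the names changed and the tensor shapes still match:

import torch

state_dict = load_model.state_dict()
renamed = {}
for key, value in state_dict.items():
    # Map the old checkpoint names onto the current module names.
    key = key.replace('speller.w.', 'speller.out.')
    key = key.replace('attention.linear_out.', 'attention.out.')
    renamed[key] = value
model.load_state_dict(renamed)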

Transcript format change

The transcript file format has changed, and the KoSpeech code has been updated accordingly.
If you have been granted permission to download KsponSpeech, leave a comment on this issue and I will send you the transcript.txt file. (No preprocessing required.)

Pretrain Model

The currently released pre-trained model was trained for only one epoch with the current folder structure.
I will update and release it as further training completes.

Training does not go past the first epoch

[2020-07-03 23:51:30,154 utils.py:21 - info()] Epoch 0 Training result saved as a csv file complete !!

I ran the code locally and it runs without errors, but after the first epoch it never moves on to the second and stays stuck at the log line above.
Could you tell me why this happens?
Also, is it normal for there to be a gap between train CER and val CER early in training?

Beam search implementation bug

There is an error in the implementation of beam search, so my team members and I are fixing it.
I will push the corrected code as soon as it is fixed.

Beam search decoding error

When I evaluate the model using beam search, an error occurs. Could you tell me what the cause is?
For the run options, I only changed the decode option from greedy to beam and otherwise ran it as-is.

A CUDA device-side assert error occurred. Please help.

[2020-03-05 16:44:26,274 trainer.py:84 - train()] timestep: 678/150236, loss: 0.3271, cer: 1.00, elapsed: 1.16s 9.71m 0.16h
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered (record at /pytorch/aten/src/ATen/cuda/CUDAEvent.h:116)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7f0fa49c3193 in /root/anaconda3/envs/pytorch3.5/lib/python3.5/site-packages/torch/lib/libc10.so)

When training with the Transformer, the loss becomes nan after backpropagation.

Two models are currently implemented, Seq2seq and Transformer, and when training with the Transformer, the loss keeps becoming nan after backpropagation. I have tried debugging, but I have not yet identified which part is wrong. If you have had a similar experience or have any guesses, I would appreciate your help.
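
Two generic checks that often localize this kind of nan (not tied to kospeech internals): enable anomaly detection to find the op that first produces nan in the backward pass, and verify that no attention row is fully masked, since a softmax over a row of all -inf yields nan.

import torch

torch.autograd.set_detect_anomaly(True)  # reports the op that created the nan

# A fully masked attention row turns softmax into nan:
scores = torch.full((1, 4), float('-inf'))
print(torch.softmax(scores, dim=-1))     # tensor([[nan, nan, nan, nan]])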

Add Transformer Model

I added a transformer model at kospeech/models/transformer/.
Although the model was added, it has not been wired into training and evaluation yet. I will apply it as soon as possible.

[Train Error] return_decode_dict error

Hello, the following error occurred while training with the shared code, so I am asking you to take a look.

In train mode, the first epoch's training completed, and the error occurred during validation.
The error is that return_decode_dict is an unexpected argument to forward(); it looks like an argument that did not exist in the previous code.
The code I am running was freshly cloned today.
The model is seq2seq; is there anything extra I need to configure?

Failing code - supervised_trainer.py

if self.architecture == 'seq2seq':
    model.module.flatten_parameters()

    output = model(inputs=inputs, input_lengths=input_lengths,
                   teacher_forcing_ratio=0.0,
                   language_model=None, return_decode_dict=False)

Error message

Traceback (most recent call last):
  File "./main.py", line 92, in main
    train(opt)
  File "./main.py", line 71, in train
    num_epochs=opt.num_epochs, teacher_forcing_ratio=opt.teacher_forcing_ratio, resume=opt.resume)
  File "../kospeech/trainer/supervised_trainer.py", line 151, in train
    valid_cer = self.validate(model, valid_queue)
  File "../kospeech/trainer/supervised_trainer.py", line 303, in validate
    language_model=None, return_decode_dict=False)
  File "/home/mchoe/.conda/envs/sr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mchoe/.conda/envs/sr/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/mchoe/.conda/envs/sr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'return_decode_dict'

Full log
[2020-07-28 17:05:02,639 utils.py:21 - info()] --mode: train
[2020-07-28 17:05:02,639 utils.py:21 - info()] --transform_method: mel
[2020-07-28 17:05:02,639 utils.py:21 - info()] --sample_rate: 16000
[2020-07-28 17:05:02,639 utils.py:21 - info()] --frame_length: 20
[2020-07-28 17:05:02,639 utils.py:21 - info()] --frame_shift: 10
[2020-07-28 17:05:02,639 utils.py:21 - info()] --n_mels: 80
[2020-07-28 17:05:02,639 utils.py:21 - info()] --normalize: True
[2020-07-28 17:05:02,639 utils.py:21 - info()] --del_silence: True
[2020-07-28 17:05:02,639 utils.py:21 - info()] --input_reverse: False
[2020-07-28 17:05:02,639 utils.py:21 - info()] --feature_extract_by: torchaudio
[2020-07-28 17:05:02,639 utils.py:21 - info()] --time_mask_para: 40
[2020-07-28 17:05:02,639 utils.py:21 - info()] --freq_mask_para: 12
[2020-07-28 17:05:02,639 utils.py:21 - info()] --time_mask_num: 2
[2020-07-28 17:05:02,639 utils.py:21 - info()] --freq_mask_num: 2
[2020-07-28 17:05:02,639 utils.py:21 - info()] --architecture: seq2seq
[2020-07-28 17:05:02,639 utils.py:21 - info()] --use_bidirectional: True
[2020-07-28 17:05:02,639 utils.py:21 - info()] --mask_conv: False
[2020-07-28 17:05:02,639 utils.py:21 - info()] --hidden_dim: 512
[2020-07-28 17:05:02,639 utils.py:21 - info()] --dropout: 0.3
[2020-07-28 17:05:02,639 utils.py:21 - info()] --attn_mechanism: multi-head
[2020-07-28 17:05:02,639 utils.py:21 - info()] --num_heads: 4
[2020-07-28 17:05:02,639 utils.py:21 - info()] --label_smoothing: 0.1
[2020-07-28 17:05:02,639 utils.py:21 - info()] --num_encoder_layers: 3
[2020-07-28 17:05:02,639 utils.py:21 - info()] --num_decoder_layers: 2
[2020-07-28 17:05:02,640 utils.py:21 - info()] --extractor: vgg
[2020-07-28 17:05:02,640 utils.py:21 - info()] --activation: hardtanh
[2020-07-28 17:05:02,640 utils.py:21 - info()] --rnn_type: lstm
[2020-07-28 17:05:02,640 utils.py:21 - info()] --teacher_forcing_ratio: 1.0
[2020-07-28 17:05:02,640 utils.py:21 - info()] --dataset_path: /home/mchoe/sr/data/
[2020-07-28 17:05:02,640 utils.py:21 - info()] --data_list_path: /home/mchoe/PycharmProjects/sr/data/data_list/filter_train_list_1000.csv
[2020-07-28 17:05:02,640 utils.py:21 - info()] --label_path: /home/mchoe/PycharmProjects/sr/verf/aihub_label_table.dat
[2020-07-28 17:05:02,640 utils.py:21 - info()] --spec_augment: True
[2020-07-28 17:05:02,640 utils.py:21 - info()] --noise_augment: False
[2020-07-28 17:05:02,640 utils.py:21 - info()] --noiseset_size: 1000
[2020-07-28 17:05:02,640 utils.py:21 - info()] --noise_level: 0.7
[2020-07-28 17:05:02,640 utils.py:21 - info()] --use_cuda: True
[2020-07-28 17:05:02,640 utils.py:21 - info()] --batch_size: 16
[2020-07-28 17:05:02,640 utils.py:21 - info()] --num_workers: 1
[2020-07-28 17:05:02,640 utils.py:21 - info()] --num_epochs: 10
[2020-07-28 17:05:02,640 utils.py:21 - info()] --init_lr: 3e-05
[2020-07-28 17:05:02,640 utils.py:21 - info()] --high_plateau_lr: 0.0003
[2020-07-28 17:05:02,640 utils.py:21 - info()] --low_plateau_lr: 1e-05
[2020-07-28 17:05:02,640 utils.py:21 - info()] --decay_threshold: 0.02
[2020-07-28 17:05:02,640 utils.py:21 - info()] --rampup_period: 400
[2020-07-28 17:05:02,640 utils.py:21 - info()] --exp_decay_period: 120000
[2020-07-28 17:05:02,640 utils.py:21 - info()] --valid_ratio: 0.05
[2020-07-28 17:05:02,640 utils.py:21 - info()] --max_len: 120
[2020-07-28 17:05:02,640 utils.py:21 - info()] --max_grad_norm: 400
[2020-07-28 17:05:02,640 utils.py:21 - info()] --teacher_forcing_step: 0.02
[2020-07-28 17:05:02,640 utils.py:21 - info()] --min_teacher_forcing_ratio: 0.8
[2020-07-28 17:05:02,640 utils.py:21 - info()] --seed: 7
[2020-07-28 17:05:02,640 utils.py:21 - info()] --save_result_every: 1000
[2020-07-28 17:05:02,640 utils.py:21 - info()] --checkpoint_every: 5000
[2020-07-28 17:05:02,640 utils.py:21 - info()] --print_every: 10
[2020-07-28 17:05:02,640 utils.py:21 - info()] --resume: False
[2020-07-28 17:05:02,644 utils.py:21 - info()] Operating System : Linux 5.3.0-61-generic
[2020-07-28 17:05:02,644 utils.py:21 - info()] Processor : x86_64
[2020-07-28 17:05:02,644 utils.py:21 - info()] device : GeForce GTX 1080 Ti
[2020-07-28 17:05:02,644 utils.py:21 - info()] CUDA is available : True
[2020-07-28 17:05:02,644 utils.py:21 - info()] CUDA version : 10.2
[2020-07-28 17:05:02,644 utils.py:21 - info()] PyTorch version : 1.5.1
[2020-07-28 17:05:02,668 utils.py:21 - info()] split dataset start !!
[2020-07-28 17:05:02,669 utils.py:21 - info()] Applying Spec Augmentation...
[2020-07-28 17:05:02,671 utils.py:21 - info()] split dataset complete !!
[2020-07-28 17:05:04,159 utils.py:21 - info()] start
[2020-07-28 17:05:04,160 utils.py:21 - info()] Epoch 0 start
[2020-07-28 17:05:11,096 utils.py:21 - info()] timestep: 10/ 118, loss: 0.6336, cer: 2.93, elapsed: 6.93s 0.12m 0.00h
[2020-07-28 17:05:17,344 utils.py:21 - info()] timestep: 20/ 118, loss: 0.6474, cer: 3.00, elapsed: 6.25s 0.22m 0.00h
[2020-07-28 17:05:24,735 utils.py:21 - info()] timestep: 30/ 118, loss: 0.6424, cer: 2.98, elapsed: 7.39s 0.34m 0.01h
[2020-07-28 17:05:31,703 utils.py:21 - info()] timestep: 40/ 118, loss: 0.6430, cer: 2.96, elapsed: 6.97s 0.46m 0.01h
[2020-07-28 17:05:38,761 utils.py:21 - info()] timestep: 50/ 118, loss: 0.6420, cer: 2.92, elapsed: 7.06s 0.58m 0.01h
[2020-07-28 17:05:45,136 utils.py:21 - info()] timestep: 60/ 118, loss: 0.6403, cer: 2.90, elapsed: 6.37s 0.68m 0.01h
[2020-07-28 17:05:52,177 utils.py:21 - info()] timestep: 70/ 118, loss: 0.6372, cer: 2.89, elapsed: 7.04s 0.80m 0.01h
[2020-07-28 17:05:59,013 utils.py:21 - info()] timestep: 80/ 118, loss: 0.6334, cer: 2.82, elapsed: 6.84s 0.91m 0.02h
[2020-07-28 17:06:06,097 utils.py:21 - info()] timestep: 90/ 118, loss: 0.6271, cer: 2.75, elapsed: 7.08s 1.03m 0.02h
[2020-07-28 17:06:13,275 utils.py:21 - info()] timestep: 100/ 118, loss: 0.6205, cer: 2.58, elapsed: 7.18s 1.15m 0.02h
[2020-07-28 17:06:19,938 utils.py:21 - info()] timestep: 110/ 118, loss: 0.6141, cer: 2.44, elapsed: 6.66s 1.26m 0.02h
batch complete
[2020-07-28 17:06:26,965 utils.py:21 - info()] save checkpoints
/home/mchoe/PycharmProjects/sr4/data/checkpoint/checkpoints/2020_07_28_17_06_26/trainer_states.pt
/home/mchoe/PycharmProjects/sr4/data/checkpoint/checkpoints/2020_07_28_17_06_26/model.pt
[2020-07-28 17:06:26,965 utils.py:21 - info()] train() completed
[2020-07-28 17:06:27,585 utils.py:21 - info()] save checkpoints
/home/mchoe/PycharmProjects/sr4/data/checkpoint/checkpoints/2020_07_28_17_06_26/trainer_states.pt
/home/mchoe/PycharmProjects/sr4/data/checkpoint/checkpoints/2020_07_28_17_06_26/model.pt
[2020-07-28 17:06:27,586 utils.py:21 - info()] Epoch 0 (Training) Loss 0.6065 CER 2.3189
duration per epoch : 83.42599272727966
[2020-07-28 17:06:27,586 utils.py:21 - info()] validate() start
File "./main.py", line 96, in
main()
File "./main.py", line 92, in main
train(opt)
File "./main.py", line 71, in train
num_epochs=opt.num_epochs, teacher_forcing_ratio=opt.teacher_forcing_ratio, resume=opt.resume)
File "../kospeech/trainer/supervised_trainer.py", line 151, in train
valid_cer = self.validate(model, valid_queue)
File "../kospeech/trainer/supervised_trainer.py", line 310, in validate
language_model=None, return_decode_dict=False)
File "/home/mchoe/.conda/envs/sr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/mchoe/.conda/envs/sr/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/mchoe/.conda/envs/sr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'return_decode_dict'

A question about the pre-trained weight file you sent

Hello!

About the index-out-of-range issue I posted on GitHub last time:
the weights I trained myself on toy data turned out to have no problem..

When you ran the weight file you sent, did you not see any errors??

Thank you!!

beam search error

Hello.
Does beam search run correctly for you when the rnn cell is an LSTM?

attn_visualize.py module error

A module error occurred in attn_visualize.py.
It looks like the updated module paths were not applied on lines 9 and 10.

Before:
from kospeech.data.preprocess.core import split
from kospeech.model.speller import Speller

After:
from kospeech.data.preprocess.audio import split
from kospeech.model.decoder import Speller

Also, I tried to run a test with the model.pt downloaded from the latest project, but an error occurs; I suspect this is due to the renamings and other changes. Is there a weight file built from the latest source?

run_pretrain.py Import error

Hello. An import error occurred while running run_pretrain.py, so I am asking about it.

I downloaded the pre-trained model and finished setting the model_path and audio_path.
With a pcm file specified as input, running infer-with-pretrain.sh produced

ImportError: cannot import name 'label_to_string' from 'kospeech.utils' (../kospeech/utils.py)

Looking into it, label_to_string now exists as a method of a class in kospeech/vocab.py; is there a way to resolve this?
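
Since label_to_string has moved from kospeech.utils onto the vocabulary class, the call site can go through a vocabulary instance instead of the removed import. A hedged sketch (the class name and constructor arguments are assumptions based on kospeech/vocab.py, not verified against this exact revision):

from kospeech.vocab import KsponSpeechVocabulary  # assumed class location

vocab = KsponSpeechVocabulary('data/vocab/aihub_labels.csv')  # assumed vocab file
sentence = vocab.label_to_string(y_hats.cpu().detach().numpy())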

validate index error during training

Hello. I am studying hard by following the code you keep updating! Thank you!
I am trying to train and validate with a dataset other than the Aihub dataset,
and the error below occurred.

Traceback (most recent call last):
  File "./main.py", line 111, in <module>
    main()
  File "./main.py", line 107, in main
    train(opt)
  File "./main.py", line 86, in train
    num_epochs=opt.num_epochs, teacher_forcing_ratio=opt.teacher_forcing_ratio, resume=opt.resume)
  File "../kospeech/trainer/supervised_trainer.py", line 161, in train
    valid_loss, valid_cer = self.validate(model, valid_queue)
  File "../kospeech/trainer/supervised_trainer.py", line 327, in validate
    loss = self.criterion(logit.contiguous().view(-1, logit.size(-1)), targets.contiguous().view(-1))
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "../kospeech/optim/loss.py", line 59, in forward
    label_smoothed.scatter_(1, target.data.unsqueeze(1), self.confidence)
RuntimeError: invalid argument 4: Index tensor must have same size as output tensor apart from the specified dimension at /pytorch/aten/src/THC/generic/THCTensorScatterGather.cu:328

When I checked, in the validate function of supervised_trainer.py the shapes of logit and targets at line 324 do not match.

Looking at the code, targets is not passed in at line 310, so I think targets should be added.
I rewrote the code my own way without major changes; could you check whether there is any problem with it?

    def validate(self, model: nn.Module, queue: queue.Queue) -> float:
        """
        Run training one epoch

        Args:
            model (torch.nn.Module): model to train
            queue (queue.Queue): validation queue, containing input, targets, input_lengths, target_lengths

        Returns: loss, cer
            - **loss** (float): loss of validation
            - **cer** (float): character error rate of validation
        """
        cer = 1.0
        total_loss = 0.
        total_num = 0.

        model.eval()
        logger.info('validate() start')

        with torch.no_grad():
            while True:
                inputs, targets, input_lengths, target_lengths = queue.get()

                if inputs.shape[0] == 0:
                    break
                inputs = inputs.to(self.device)
                #targets = targets[:, 1:].to(self.device)
                targets = targets.to(self.device)  # changed to the same form as train
                model.cuda()

                if self.architecture == 'seq2seq':
                    model.module.flatten_parameters()
                    #output = model(inputs=inputs, input_lengths=input_lengths,
                    #               teacher_forcing_ratio=0.0, return_decode_dict=False)
                    output = model(inputs=inputs, input_lengths=input_lengths, targets=targets,  # added targets
                                   teacher_forcing_ratio=0.0, return_decode_dict=False)
                    logit = torch.stack(output, dim=1).to(self.device)
                    targets = targets[:, 1:]  # changed to the same form as train

                elif self.architecture == 'transformer':
                    logit = model(inputs, input_lengths, return_decode_dict=False)

                else:
                    raise ValueError("Unsupported architecture : {0}".format(self.architecture))

                hypothesis = logit.max(-1)[1]
                cer = self.metric(targets, hypothesis)
                logit = logit[:, :targets.size(1), :]  # this part is not in train; does it apply only to validate?
                loss = self.criterion(logit.contiguous().view(-1, logit.size(-1)), targets.contiguous().view(-1))

                total_loss += loss.item()
                total_num += sum(input_lengths)

        logger.info('validate() completed')
        return total_loss / total_num, cer

I know you are busy; thank you for reading!

Can training run on CPU?

Hello,
GPU issues kept occurring while training with this code, so I adjusted the parameters to run it on CPU,
but training does not proceed.
Is training on CPU not possible?

  • Settings
    utils.py

def check_envirionment(use_cuda):
    """
    Check execution envirionment.
    OS, Processor, CUDA version, Pytorch version, ... etc.
    """
    cuda = use_cuda and torch.cuda.is_available()
    # device = torch.device('cuda' if cuda else 'cpu')
    device = torch.device('cpu')
    print("cuda : ", cuda)
    logger.info("Operating System : %s %s" % (platform.system(), platform.release()))
    logger.info("Processor : %s" % platform.processor())

Setup for a single GPU

Hello!
While running evaluation with the weights you sent,
the error below occurred.

From what I found by googling, the code assumes two or more GPUs by default,
but the Colab I ran on provides only one GPU, so the index went out of range.
How should I modify the code?

Thank you!

[2020-06-23 06:30:41,321 utils.py:21 - info()] --mode: eval
[2020-06-23 06:30:41,321 utils.py:21 - info()] --sample_rate: 16000
[2020-06-23 06:30:41,321 utils.py:21 - info()] --window_size: 20
[2020-06-23 06:30:41,321 utils.py:21 - info()] --stride: 10
[2020-06-23 06:30:41,321 utils.py:21 - info()] --n_mels: 80
[2020-06-23 06:30:41,321 utils.py:21 - info()] --normalize: True
[2020-06-23 06:30:41,321 utils.py:21 - info()] --del_silence: True
[2020-06-23 06:30:41,321 utils.py:21 - info()] --input_reverse: True
[2020-06-23 06:30:41,322 utils.py:21 - info()] --feature_extract_by: librosa
[2020-06-23 06:30:41,322 utils.py:21 - info()] --time_mask_para: 50
[2020-06-23 06:30:41,322 utils.py:21 - info()] --freq_mask_para: 12
[2020-06-23 06:30:41,322 utils.py:21 - info()] --time_mask_num: 2
[2020-06-23 06:30:41,322 utils.py:21 - info()] --freq_mask_num: 2
[2020-06-23 06:30:41,322 utils.py:21 - info()] --dataset_path: ../../DATA/KsponSpeech_01/KsponSpeech_0001/
[2020-06-23 06:30:41,322 utils.py:21 - info()] --data_list_path: ../data/data_list/toy_test_list.csv
[2020-06-23 06:30:41,322 utils.py:21 - info()] --label_path: ./data/label/aihub_labels.csv
[2020-06-23 06:30:41,322 utils.py:21 - info()] --num_workers: 4
[2020-06-23 06:30:41,322 utils.py:21 - info()] --use_cuda: True
[2020-06-23 06:30:41,322 utils.py:21 - info()] --model_path: ../data/checkpoints/model.pt
[2020-06-23 06:30:41,322 utils.py:21 - info()] --batch_size: 1
[2020-06-23 06:30:41,322 utils.py:21 - info()] --decode: greedy
[2020-06-23 06:30:41,323 utils.py:21 - info()] --k: 5
[2020-06-23 06:30:41,323 utils.py:21 - info()] --print_every: 10
[2020-06-23 06:30:41,381 utils.py:21 - info()] Operating System : Linux 4.19.104+
[2020-06-23 06:30:41,381 utils.py:21 - info()] Processor : x86_64
[2020-06-23 06:30:41,388 utils.py:21 - info()] device : Tesla K80
[2020-06-23 06:30:41,388 utils.py:21 - info()] CUDA is available : True
[2020-06-23 06:30:41,388 utils.py:21 - info()] CUDA version : 10.1
[2020-06-23 06:30:41,388 utils.py:21 - info()] PyTorch version : 1.5.1+cu101
[2020-06-23 06:31:03,233 utils.py:141 - _init_num_threads()] NumExpr defaulting to 2 threads.
100% 167/167 [04:18<00:00,  1.55s/it]
[2020-06-23 06:35:21,369 utils.py:21 - info()] evaluate() start
cuda
Traceback (most recent call last):
  File "./eval.py", line 66, in <module>
    main()
  File "./eval.py", line 62, in main
    inference(opt)
  File "./eval.py", line 42, in inference
    evaluator.evaluate(model)
  File "../kospeech/evaluator/evaluator.py", line 43, in evaluate
    cer = self.decoder.search(model, eval_queue, self.device, self.print_every)
  File "../kospeech/decode/search.py", line 42, in search
    output, _ = model(inputs, input_lengths, teacher_forcing_ratio=0.0, language_model=self.language_model)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 151, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 162, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/scatter_gather.py", line 36, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim) if inputs else []
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/scatter_gather.py", line 28, in scatter
    res = scatter_map(inputs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/scatter_gather.py", line 15, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/scatter_gather.py", line 13, in scatter_map
    return Scatter.apply(target_gpus, None, dim, obj)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/_functions.py", line 88, in forward
    streams = [_get_stream(device) for device in target_gpus]
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/_functions.py", line 88, in <listcomp>
    streams = [_get_stream(device) for device in target_gpus]
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/_functions.py", line 115, in _get_stream
    if _streams[device] is None:
IndexError: list index out of range
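
The scatter call in the traceback indicates the loaded model is still wrapped in nn.DataParallel carrying the device ids of the machine it was trained on. A minimal sketch of adapting such a checkpoint to a single GPU: unwrap .module, then optionally re-wrap with the local device list.

import torch

model = torch.load('model.pt', map_location='cuda:0')
if isinstance(model, torch.nn.DataParallel):
    model = model.module  # drop the multi-GPU wrapper and its stale device ids
model = torch.nn.DataParallel(model, device_ids=[0])  # optional: re-wrap locally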

seq2seq import problem

To try the pretrained model for the first time,
I downloaded model.pt and ran
bash run_pretrain.sh
and got:

Traceback (most recent call last):
  File "run_pretrain.py", line 33, in <module>
    model = load_test_model(opt, opt.device)
  File "../kospeech/model_builder.py", line 136, in load_test_model
    model = torch.load(opt.model_path, map_location=lambda storage, loc: storage).to(device)
  File "/root/anaconda3/envs/lee_stt/lib/python3.7/site-packages/torch/serialization.py", line 585, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/root/anaconda3/envs/lee_stt/lib/python3.7/site-packages/torch/serialization.py", line 765, in _legacy_load
    result = unpickler.load()
ModuleNotFoundError: No module named 'kospeech.models.seq2seq'

This error appears.
Thinking it might just be failing to find the location,
I copied seq2seq into kospeech/models, and then got:

Traceback (most recent call last):
  File "run_pretrain.py", line 33, in <module>
    model = load_test_model(opt, opt.device)
  File "../kospeech/model_builder.py", line 136, in load_test_model
    model = torch.load(opt.model_path, map_location=lambda storage, loc: storage).to(device)
  File "/root/anaconda3/envs/lee_stt/lib/python3.7/site-packages/torch/serialization.py", line 585, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/root/anaconda3/envs/lee_stt/lib/python3.7/site-packages/torch/serialization.py", line 765, in _legacy_load
    result = unpickler.load()
ModuleNotFoundError: No module named 'kospeech.models.seq2seq.seq2seq'; 'kospeech.models.seq2seq' is not a package

Errors keep appearing like this..
What could be the problem..

TypeError: forward() missing 2 required positional arguments: 'inputs' and 'targets'

Hello. ^^
I cloned fresh this morning and ran it, and the error below occurred.
I spent the morning trying to resolve it but could not, so I am filing an issue.
Please take a look~

[2020-03-16 12:07:17,562 dataset.py:158 - split_dataset()] split dataset start !!
[2020-03-16 12:07:17,562 dataset.py:67 - augmentation()] Applying Augmentation...
[2020-03-16 12:07:17,563 dataset.py:202 - split_dataset()] split dataset complete !!
[2020-03-16 12:07:17,563 train.py:126 - <module>()] start
[2020-03-16 12:07:23,428 trainer.py:81 - supervised_train()] timestep: 0/ 24, loss: 0.4985, cer: 1.84, elapsed: 5.86s 0.10m 0.00h
[2020-03-16 12:07:31,911 trainer.py:81 - supervised_train()] timestep: 10/ 24, loss: 0.4857, cer: 1.28, elapsed: 8.48s 0.24m 0.00h
[2020-03-16 12:07:38,277 trainer.py:81 - supervised_train()] timestep: 20/ 24, loss: 0.4668, cer: 1.18, elapsed: 6.37s 0.35m 0.01h
[2020-03-16 12:07:38,951 trainer.py:96 - supervised_train()] train() completed
[2020-03-16 12:07:39,593 train.py:151 - <module>()] Epoch 0 (Training) Loss 0.4552 CER 1.1624
<queue.Queue object at 0x7fa4f361f350>
[2020-03-16 12:07:39,594 evaluator.py:18 - evaluate()] evaluate() start
Traceback (most recent call last):
  File "train.py", line 159, in <module>
    valid_loss, valid_cer = evaluate(model, valid_queue, criterion, device)
  File "/schwang/stt_kai/Korean-Speech-Recognition_20200316/package/evaluator.py", line 38, in evaluate
    y_hat, logit = model(feats, scripts, teacher_forcing_ratio=0.0, use_beam_search = False)
  File "/root/anaconda3/envs/pytorch3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/envs/pytorch3.7/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/root/anaconda3/envs/pytorch3.7/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/root/anaconda3/envs/pytorch3.7/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/root/anaconda3/envs/pytorch3.7/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
TypeError: Caught TypeError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/root/anaconda3/envs/pytorch3.7/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/root/anaconda3/envs/pytorch3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() missing 2 required positional arguments: 'inputs' and 'targets'

Unexpected broadcasting?

Hi and thanks for making a great project!👍

When I tried the seq2seq decoder with LocationAwareAttention, I found possibly unexpected broadcasting caused by AddNorm.
https://github.com/sooftware/KoSpeech/blob/f90354b565a43217cce580fbbe20629e3d41a174/kospeech/models/acoustic/seq2seq/decoder.py#L124
The expected context.size() is (batch, hidden_dim), but I received (batch, batch, hidden_dim).

I think this problem is caused by AddNorm.
https://github.com/sooftware/KoSpeech/blob/f90354b565a43217cce580fbbe20629e3d41a174/kospeech/models/acoustic/transformer/sublayers.py#L24-L25
output[0] + residual represents (batch_size, hidden_dim) + (batch_size, 1, hidden_dim), and this is broadcast to (batch_size, batch_size, hidden_dim).

Reproduction code

import torch

from kospeech.models.attention import LocationAwareAttention
from kospeech.models.acoustic.transformer.sublayers import AddNorm

hidden_dim = 512
attention = LocationAwareAttention(d_model=hidden_dim)
attention_norm = AddNorm(LocationAwareAttention(d_model=hidden_dim), d_model=hidden_dim)

batch_size = 2
seq_length = 128

output = torch.rand(batch_size, 1, hidden_dim, dtype=torch.float32)
encoder_outputs = torch.rand(batch_size, seq_length, hidden_dim, dtype=torch.float32)
attn = None

context, _ = attention(output, encoder_outputs, attn)
context_norm, _ = attention_norm(output, encoder_outputs, attn)

print(context.size()) #torch.Size([2, 512]) i.e. (batch_size, hidden_dim)
print(context_norm.size()) #torch.Size([2, 2, 512]) i.e. (batch_size, batch_size, hidden_dim)

Environment

PyTorch 1.7.0a0+018b4d7
Python 3.7.7

Bug in Beam Search

There was a bug in the beam search logic.
We have now fixed it, so those of you who downloaded the code earlier should download it again.

Training performance is worse than expected

Except for batch_size and num_epochs, all options were left at their configured values,
and I trained on the full KsponSpeech dataset provided by aihub.
The dataset was preprocessed following the guide at the link before training.

Is there anything I can do to get more performance here?

Training results
loss : 0.14854020402704088
cer : 0.23955241871794233

Failed to create virtual environment.

  • Running on macOS Catalina 10.15.3
  • Python 3.8 installed locally
  • Received the following error when running these commands:
pipenv shell
Using /Library/Frameworks/Python.framework/Versions/3.8/bin/python3.8 (3.8.1) to create virtualenv…
⠙ Creating virtual environment...Already using interpreter /Library/Frameworks/Python.framework/Versions/3.8/bin/python3.8
Using base prefix '/Library/Frameworks/Python.framework/Versions/3.8'
New python executable in /Users/noopy/.local/share/virtualenvs/End-to-end-Speech-Recognition-XROjrEBt/bin/python3.8
Also creating executable in /Users/noopy/.local/share/virtualenvs/End-to-end-Speech-Recognition-XROjrEBt/bin/python
Installing setuptools, pip, wheel...

  Complete output from command /Users/noopy/.local/...OjrEBt/bin/python3.8 - setuptools pip wheel:
  Traceback (most recent call last):
  File "<stdin>", line 33, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/virtualenv_support/pip-19.3.1-py2.py3-none-any.whl/pip/_internal/main.py", line 45, in main
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/virtualenv_support/pip-19.3.1-py2.py3-none-any.whl/pip/_internal/commands/__init__.py", line 96, in create_command
  File "/Users/noopy/.local/share/virtualenvs/End-to-end-Speech-Recognition-XROjrEBt/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "<frozen zipimport>", line 259, in load_module
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/virtualenv_support/pip-19.3.1-py2.py3-none-any.whl/pip/_internal/commands/install.py", line 23, in <module>
  File "<frozen zipimport>", line 259, in load_module
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/virtualenv_support/pip-19.3.1-py2.py3-none-any.whl/pip/_internal/cli/req_command.py", line 20, in <module>
  File "<frozen zipimport>", line 259, in load_module
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/virtualenv_support/pip-19.3.1-py2.py3-none-any.whl/pip/_internal/network/session.py", line 17, in <module>
  File "<frozen zipimport>", line 259, in load_module
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/virtualenv_support/pip-19.3.1-py2.py3-none-any.whl/pip/_vendor/requests/__init__.py", line 115, in <module>
  File "<frozen zipimport>", line 259, in load_module
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/virtualenv_support/pip-19.3.1-py2.py3-none-any.whl/pip/_vendor/requests/packages.py", line 8, in <module>
  File "<frozen zipimport>", line 259, in load_module
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/virtualenv_support/pip-19.3.1-py2.py3-none-any.whl/pip/_vendor/idna/__init__.py", line 2, in <module>
  File "<frozen zipimport>", line 259, in load_module
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/virtualenv_support/pip-19.3.1-py2.py3-none-any.whl/pip/_vendor/idna/core.py", line 3, in <module>
ImportError: dlopen(/Users/noopy/.local/share/virtualenvs/End-to-end-Speech-Recognition-XROjrEBt/lib/python3.8/lib-dynload/unicodedata.cpython-38-darwin.so, 2): no suitable image found.  Did find:
        /Users/noopy/.local/share/virtualenvs/End-to-end-Speech-Recognition-XROjrEBt/lib/python3.8/lib-dynload/unicodedata.cpython-38-darwin.so: code signature in (/Users/noopy/.local/share/virtualenvs/End-to-end-Speech-Recognition-XROjrEBt/lib/python3.8/lib-dynload/unicodedata.cpython-38-darwin.so) not valid for use in process using Library Validation: Library load disallowed by System Policy
----------------------------------------
...Installing setuptools, pip, wheel...done.

✘ Failed creating virtual environment 
[pipenv.exceptions.VirtualenvCreationException]:   File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pipenv/cli/command.py", line 385, in shell
[pipenv.exceptions.VirtualenvCreationException]:       do_shell(
[pipenv.exceptions.VirtualenvCreationException]:   File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pipenv/core.py", line 2155, in do_shell
[pipenv.exceptions.VirtualenvCreationException]:       ensure_project(
[pipenv.exceptions.VirtualenvCreationException]:   File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pipenv/core.py", line 570, in ensure_project
[pipenv.exceptions.VirtualenvCreationException]:       ensure_virtualenv(
[pipenv.exceptions.VirtualenvCreationException]:   File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pipenv/core.py", line 505, in ensure_virtualenv
[pipenv.exceptions.VirtualenvCreationException]:       do_create_virtualenv(
[pipenv.exceptions.VirtualenvCreationException]:   File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pipenv/core.py", line 934, in do_create_virtualenv
[pipenv.exceptions.VirtualenvCreationException]:       raise exceptions.VirtualenvCreationException(
[pipenv.exceptions.VirtualenvCreationException]: Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/virtualenv.py", line 2634, in <module>
    main()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/virtualenv.py", line 860, in main
    create_environment(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/virtualenv.py", line 1179, in create_environment
    install_wheel(to_install, py_executable, search_dirs, download=download)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/virtualenv.py", line 1023, in install_wheel
    _install_wheel_with_search_dir(download, project_names, py_executable, search_dirs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/virtualenv.py", line 1116, in _install_wheel_with_search_dir
    call_subprocess(cmd, show_stdout=False, extra_env=env, stdin=script)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/virtualenv.py", line 963, in call_subprocess
    raise OSError("Command {} failed with error code {}".format(cmd_desc, proc.returncode))
OSError: Command /Users/noopy/.local/...OjrEBt/bin/python3.8 - setuptools pip wheel failed with error code 1

Failed to create virtual environment.

Even when I try to install directly on the local machine (without a virtual environment), the same error occurs:

pip install -r requirements.txt
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/bin/pip", line 8, in <module>
    sys.exit(main())
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pip/_internal/cli/main.py", line 73, in main
    command = create_command(cmd_name, isolated=("--isolated" in cmd_args))
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pip/_internal/commands/__init__.py", line 96, in create_command
    module = importlib.import_module(module_path)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pip/_internal/commands/install.py", line 24, in <module>
    from pip._internal.cli.req_command import RequirementCommand
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pip/_internal/cli/req_command.py", line 15, in <module>
    from pip._internal.index.package_finder import PackageFinder
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pip/_internal/index/package_finder.py", line 21, in <module>
    from pip._internal.index.collector import parse_links
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pip/_internal/index/collector.py", line 12, in <module>
    from pip._vendor import html5lib, requests
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pip/_vendor/requests/__init__.py", line 115, in <module>
    from . import packages
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pip/_vendor/requests/packages.py", line 8, in <module>
    locals()[package] = __import__(vendored_package)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pip/_vendor/idna/__init__.py", line 2, in <module>
    from .core import *
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pip/_vendor/idna/core.py", line 3, in <module>
    import unicodedata
ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/lib-dynload/unicodedata.cpython-38-darwin.so, 2): no suitable image found.  Did find:
        /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/lib-dynload/unicodedata.cpython-38-darwin.so: code signature in (/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/lib-dynload/unicodedata.cpython-38-darwin.so) not valid for use in process using Library Validation: Library load disallowed by System Policy

run_pretrain.py error

While running inference with the pretrained model trained on MFCC features, I keep getting a dimension error.

In run_pretrain.py, after
model = load_test_model(opt, opt.device)
the error occurs at output = model( ... ),

specifically at line 97 of models/las/encoder.py (output, hidden = self.rnn(conv_feat)).

I can't tell what the problem is...
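
A hedged guess at what a dimension error at output, hidden = self.rnn(conv_feat) usually means: the convolution output was not reshaped to (batch, time, feature) before being fed to the RNN. A minimal, self-contained sketch of the expected reshaping (illustrative sizes; the actual encoder layout may differ):

import torch
import torch.nn as nn

conv_out = torch.rand(4, 32, 40, 100)    # (batch, channels, freq, time) from the CNN
batch, channels, freq, time = conv_out.size()

rnn = nn.LSTM(input_size=channels * freq, hidden_size=512, batch_first=True)

conv_feat = conv_out.permute(0, 3, 1, 2)                    # (batch, time, channels, freq)
conv_feat = conv_feat.reshape(batch, time, channels * freq)

output, hidden = rnn(conv_feat)          # OK: input is (batch, time, feature)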

[Problem Fix] Difference between train CER and test CER

Bug on models.speller.forward()

This is 원철황 from the KAI.Lib team.

When teacher forcing is not used, the _forward_step() function must update the hidden state from the previous time step and pass it on to the next cell, but it was not being passed.

The problem has now been fixed, so if you downloaded the code earlier and are training with it, please download the code again.

Thank you.

def _forward_step(self, input, hidden, listener_outputs=None, function=F.log_softmax):
        """ forward one time step """
        batch_size = input.size(0)
        output_size = input.size(1)

        embedded = self.embedding(input).to(self.device)
        embedded = self.input_dropout(embedded)

        if self.training:
            self.rnn.flatten_parameters()

        output, hidden = self.rnn(embedded, hidden)

        if self.use_attention:
            output = self.attention(output, listener_outputs)

        output = self.w(output.contiguous().view(-1, self.hidden_size))
        predicted_softmax = function(output, dim=1).view(batch_size, output_size, -1)

        return predicted_softmax, hidden
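
To illustrate why the returned hidden matters, here is a minimal, self-contained decoding loop without teacher forcing (toy sizes and layers, not the actual speller): the hidden state produced at step t is passed into step t+1, and the prediction is fed back as the next input.

import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, hidden_size, batch_size, max_length = 10, 16, 2, 5

embedding = nn.Embedding(vocab_size, hidden_size)
rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
projection = nn.Linear(hidden_size, vocab_size)

inputs = torch.zeros(batch_size, 1, dtype=torch.long)  # <sos> token id
hidden = None

for _ in range(max_length):
    embedded = embedding(inputs)                         # (batch, 1, hidden)
    output, hidden = rnn(embedded, hidden)               # carry hidden to the next step
    step_log_probs = F.log_softmax(projection(output.squeeze(1)), dim=1)
    inputs = step_log_probs.argmax(dim=1, keepdim=True)  # feed the prediction back in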


Training doesn't seem to progress.

[2020-03-05 14:23:06,776 main.py:122 - ()] start
It gets this far, but after that there is no response at all. If anything comes to mind, please help.
Could multi-GPU be the cause?
I'm running in an Ubuntu environment with two 2080 Ti GPUs.

Fix Bug on LR-Sch

There was a bug in the LR-schedule logic.
The bug has now been fixed; if you downloaded the code previously, please download it again (only relevant if you want to use multi-step LR).
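
For reference, a minimal sketch of a multi-step LR schedule in plain PyTorch (illustrative milestones and gamma, not necessarily the values this project uses):

import torch
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Linear(10, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
scheduler = MultiStepLR(optimizer, milestones=[10, 20], gamma=0.1)

for epoch in range(30):
    # ... run one training epoch ...
    optimizer.step()
    scheduler.step()  # lr is multiplied by 0.1 after epochs 10 and 20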

run_pretrain error

Hello, when I tried to run the run_pretrain script you uploaded, the error below occurred.

After pulling the latest code today and running it, I got an error that modules.py was missing from the seq2seq path, so I moved that file over and ran it again.
Traceback (most recent call last):
File "run_pretrain.py", line 34, in
model = load_test_model(opt, opt.device)
File "../kospeech/model_builder.py", line 124, in load_test_model
model = torch.load(opt.model_path, map_location=lambda storage, loc: storage).to(device)
File "/home/ai/anaconda3/envs/sr_test/lib/python3.6/site-packages/torch/serialization.py", line 593, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/ai/anaconda3/envs/sr_test/lib/python3.6/site-packages/torch/serialization.py", line 773, in _legacy_load
result = unpickler.load()
ModuleNotFoundError: No module named 'kospeech.models.seq2seq.modules'

The run then complains that an index is out of range. Is there anything else I need to configure?
Traceback (most recent call last):
File "run_pretrain.py", line 38, in
output = model(feature_vector.unsqueeze(0), input_length, teacher_forcing_ratio=0.0, return_attns=False)
File "/home/ai/anaconda3/envs/sr_test/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/ai/anaconda3/envs/sr_test/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 151, in forward
inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
File "/home/ai/anaconda3/envs/sr_test/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in scatter
return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim)
File "/home/ai/anaconda3/envs/sr_test/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 36, in scatter_kwargs
inputs = scatter(inputs, target_gpus, dim) if inputs else []
File "/home/ai/anaconda3/envs/sr_test/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 28, in scatter
res = scatter_map(inputs)
File "/home/ai/anaconda3/envs/sr_test/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 15, in scatter_map
return list(zip(*map(scatter_map, obj)))
File "/home/ai/anaconda3/envs/sr_test/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 13, in scatter_map
return Scatter.apply(target_gpus, None, dim, obj)
File "/home/ai/anaconda3/envs/sr_test/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 88, in forward
streams = [_get_stream(device) for device in target_gpus]
File "/home/ai/anaconda3/envs/sr_test/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 88, in
streams = [_get_stream(device) for device in target_gpus]
File "/home/ai/anaconda3/envs/sr_test/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 115, in _get_stream
if _streams[device] is None:
IndexError: list index out of range
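
A hedged workaround sketch for this scatter IndexError (hypothetical checkpoint path): a model saved whole with torch.save on a multi-GPU machine keeps its DataParallel wrapper and the old device_ids, which may not exist on the inference machine, so unwrap it before use.

import torch

model = torch.load('../data/checkpoints/model.pt', map_location='cpu')
if isinstance(model, torch.nn.DataParallel):
    model = model.module    # drop the stale multi-GPU wrapper
model = model.eval()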

Pre-train Model

We are currently training the pre-trained model.
We will update the model file as soon as possible.

I get IndexError: list index out of range

Hello.
I ran eval.sh using the most recent pre-trained weights and code you sent, but I still get an IndexError.
Did it run without problems on your side? I'm wondering whether the model and the code are out of sync!

Additionally, regarding the model you sent:

  • it was written with MaskCNN rather than MaskConv, so I changed it to MaskCNN in convolutional.py in the model folder, and
  • librosa was not imported in parser.py in the data folder's preprocess folder, so it was undefined and I added the import!
[2020-07-03 11:36:12,863 utils.py:21 - info()] --mode: eval
[2020-07-03 11:36:12,863 utils.py:21 - info()] --feature: mel
[2020-07-03 11:36:12,863 utils.py:21 - info()] --sample_rate: 16000
[2020-07-03 11:36:12,863 utils.py:21 - info()] --window_size: 20
[2020-07-03 11:36:12,863 utils.py:21 - info()] --stride: 10
[2020-07-03 11:36:12,863 utils.py:21 - info()] --n_mels: 80
[2020-07-03 11:36:12,863 utils.py:21 - info()] --normalize: True
[2020-07-03 11:36:12,863 utils.py:21 - info()] --del_silence: True
[2020-07-03 11:36:12,863 utils.py:21 - info()] --input_reverse: True
[2020-07-03 11:36:12,863 utils.py:21 - info()] --feature_extract_by: librosa
[2020-07-03 11:36:12,863 utils.py:21 - info()] --time_mask_para: 50
[2020-07-03 11:36:12,863 utils.py:21 - info()] --freq_mask_para: 12
[2020-07-03 11:36:12,863 utils.py:21 - info()] --time_mask_num: 2
[2020-07-03 11:36:12,863 utils.py:21 - info()] --freq_mask_num: 2
[2020-07-03 11:36:12,863 utils.py:21 - info()] --dataset_path: ../../DATA/KsponSpeech_01/KsponSpeech_0001/
[2020-07-03 11:36:12,863 utils.py:21 - info()] --data_list_path: ../data/data_list/toy_test_list.csv
[2020-07-03 11:36:12,863 utils.py:21 - info()] --label_path: ./data/label/aihub_labels.csv
[2020-07-03 11:36:12,863 utils.py:21 - info()] --num_workers: 4
[2020-07-03 11:36:12,863 utils.py:21 - info()] --use_cuda: True
[2020-07-03 11:36:12,864 utils.py:21 - info()] --model_path: ../data/checkpoints/KsponSpeech_87.44%.pt
[2020-07-03 11:36:12,864 utils.py:21 - info()] --batch_size: 1
[2020-07-03 11:36:12,864 utils.py:21 - info()] --decode: greedy
[2020-07-03 11:36:12,864 utils.py:21 - info()] --k: 5
[2020-07-03 11:36:12,864 utils.py:21 - info()] --print_every: 10
[2020-07-03 11:36:12,906 utils.py:21 - info()] Operating System : Linux 4.18.0-193.6.3.el8_2.x86_64
[2020-07-03 11:36:12,906 utils.py:21 - info()] Processor : x86_64
[2020-07-03 11:36:12,909 utils.py:21 - info()] device : GeForce RTX 2080 Ti
[2020-07-03 11:36:12,909 utils.py:21 - info()] device : GeForce RTX 2080 Ti
[2020-07-03 11:36:12,909 utils.py:21 - info()] CUDA is available : True
[2020-07-03 11:36:12,909 utils.py:21 - info()] CUDA version : 10.2
[2020-07-03 11:36:12,909 utils.py:21 - info()] PyTorch version : 1.5.1
100%|███████████████████████████████████████████████████████████| 167/167 [00:00<00:00, 39278.24it/s]
[2020-07-03 11:36:15,344 utils.py:21 - info()] evaluate() start
Traceback (most recent call last):
  File "./eval.py", line 66, in <module>
    main()
  File "./eval.py", line 62, in main
    inference(opt)
  File "./eval.py", line 42, in inference
    evaluator.evaluate(model)
  File "../kospeech/evaluator/evaluator.py", line 43, in evaluate
    cer = self.decoder.search(model, eval_queue, self.device, self.print_every)
  File "../kospeech/decode/search.py", line 41, in search
    teacher_forcing_ratio=0.0, language_model=self.language_model)
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __c
all__
    result = self.forward(*input, **kwargs)
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 151
, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162
, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim)
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 37
, in scatter_kwargs
    kwargs = scatter(kwargs, target_gpus, dim) if kwargs else []
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 28
, in scatter
    res = scatter_map(inputs)
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 19
, in scatter_map
    return list(map(type(obj), zip(*map(scatter_map, obj.items()))))
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 15
, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 13
, in scatter_map
    return Scatter.apply(target_gpus, None, dim, obj)
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 88, in
 forward
    streams = [_get_stream(device) for device in target_gpus]
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 88, in
 <listcomp>
    streams = [_get_stream(device) for device in target_gpus]
  File "/home/stt_py/.local/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 115, i
n _get_stream
    if _streams[device] is None:
IndexError: list index out of range

Two issues about executing run_pretrain.sh & prepare_ksponspeech.sh

Hello, thank you for open-sourcing such a great resource. I'm learning a lot.

I only recently started looking closely at the code, and I'm filing this issue because errors occur at an early stage.

  1. When running run_pretrain.sh, a TypeError related to new() occurs.

  2. When running prepare_ksponspeech.sh, a ValueError occurs.

In both cases I seem to be missing something simple; could you tell me what I'm overlooking?

Thank you!

AttributeError: While running eval.py

(base) C:\Users\Admin\ELYOR\kospeech\KoSpeech-master>python eval.py
[2020-07-19 17:58:02,235 utils.py:21 - info()] --mode: eval
[2020-07-19 17:58:02,236 utils.py:21 - info()] --transform_method: mel
[2020-07-19 17:58:02,236 utils.py:21 - info()] --sample_rate: 16000
[2020-07-19 17:58:02,236 utils.py:21 - info()] --window_size: 20
[2020-07-19 17:58:02,237 utils.py:21 - info()] --stride: 10
[2020-07-19 17:58:02,237 utils.py:21 - info()] --n_mels: 80
[2020-07-19 17:58:02,238 utils.py:21 - info()] --normalize: False
[2020-07-19 17:58:02,238 utils.py:21 - info()] --del_silence: False
[2020-07-19 17:58:02,238 utils.py:21 - info()] --input_reverse: False
[2020-07-19 17:58:02,238 utils.py:21 - info()] --feature_extract_by: librosa
[2020-07-19 17:58:02,238 utils.py:21 - info()] --time_mask_para: 50
[2020-07-19 17:58:02,239 utils.py:21 - info()] --freq_mask_para: 12
[2020-07-19 17:58:02,239 utils.py:21 - info()] --time_mask_num: 2
[2020-07-19 17:58:02,239 utils.py:21 - info()] --freq_mask_num: 2
[2020-07-19 17:58:02,239 utils.py:21 - info()] --dataset_path: /data1/
[2020-07-19 17:58:02,239 utils.py:21 - info()] --data_list_path: ./data/data_list/filter_train_list.csv
[2020-07-19 17:58:02,239 utils.py:21 - info()] --label_path: ./data/label/aihub_labels.csv
[2020-07-19 17:58:02,239 utils.py:21 - info()] --num_workers: 4
[2020-07-19 17:58:02,239 utils.py:21 - info()] --use_cuda: False
[2020-07-19 17:58:02,240 utils.py:21 - info()] --model_path: None
[2020-07-19 17:58:02,240 utils.py:21 - info()] --batch_size: 1
[2020-07-19 17:58:02,240 utils.py:21 - info()] --decode: greedy
[2020-07-19 17:58:02,240 utils.py:21 - info()] --k: 5
[2020-07-19 17:58:02,240 utils.py:21 - info()] --print_every: 10
[2020-07-19 17:58:02,240 utils.py:21 - info()] Operating System : Windows 10
[2020-07-19 17:58:02,240 utils.py:21 - info()] Processor : Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
[2020-07-19 17:58:02,269 utils.py:21 - info()] CUDA is available : True
[2020-07-19 17:58:02,270 utils.py:21 - info()] PyTorch version : 1.5.1
Traceback (most recent call last):
File "C:\Users\Admin\anaconda3\lib\site-packages\torch\serialization.py", line 311, in _check_seekable
f.seek(f.tell())
AttributeError: 'NoneType' object has no attribute 'seek'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "eval.py", line 58, in
main()
File "eval.py", line 54, in main
inference(opt)
File "eval.py", line 25, in inference
model = load_test_model(opt, device)
File "C:\Users\Admin\ELYOR\kospeech\KoSpeech-master\kospeech\model_builder.py", line 118, in load_test_model
model = torch.load(opt.model_path, map_location=lambda storage, loc: storage).to(device)
File "C:\Users\Admin\anaconda3\lib\site-packages\torch\serialization.py", line 584, in load
with _open_file_like(f, 'rb') as opened_file:
File "C:\Users\Admin\anaconda3\lib\site-packages\torch\serialization.py", line 239, in _open_file_like
return _open_buffer_reader(name_or_buffer)
File "C:\Users\Admin\anaconda3\lib\site-packages\torch\serialization.py", line 224, in init
_check_seekable(buffer)
File "C:\Users\Admin\anaconda3\lib\site-packages\torch\serialization.py", line 314, in _check_seekable
raise_err_msg(["seek", "tell"], e)
File "C:\Users\Admin\anaconda3\lib\site-packages\torch\serialization.py", line 307, in raise_err_msg
raise type(e)(msg)
AttributeError: 'NoneType' object has no attribute 'seek'. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead.
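
A hedged reading of this traceback: the log above shows --model_path: None, and torch.load(None) has nothing to seek in. A minimal sketch of a guard with a clearer message (hypothetical option handling mirroring the logged defaults):

import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument('--model_path', type=str, default=None)
opt = parser.parse_args([])    # no arguments: reproduces the failing default

if opt.model_path is None or not os.path.isfile(opt.model_path):
    raise SystemExit('--model_path must point to a trained checkpoint (.pt) file')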

Error Message during training

[2020-08-27 20:20:14,206 utils.py:21 - info()] timestep: 10/70530, loss: nan, cer: 3.32, elapsed: 41.31s 0.69m 0.01h, lr: 0.00030
[2020-08-27 20:20:49,702 utils.py:21 - info()] timestep: 20/70530, loss: nan, cer: 2.88, elapsed: 35.50s 1.28m 0.02h, lr: 0.00030
[2020-08-27 20:21:18,650 utils.py:21 - info()] timestep: 30/70530, loss: nan, cer: 2.81, elapsed: 28.95s 1.76m 0.03h, lr: 0.00030
[2020-08-27 20:22:01,191 utils.py:21 - info()] timestep: 40/70530, loss: nan, cer: 2.96, elapsed: 42.54s 2.47m 0.04h, lr: 0.00030
[2020-08-27 20:22:39,461 utils.py:21 - info()] timestep: 50/70530, loss: nan, cer: 2.98, elapsed: 38.27s 3.11m 0.05h, lr: 0.00030
[2020-08-27 20:23:21,102 utils.py:21 - info()] timestep: 60/70530, loss: nan, cer: 3.09, elapsed: 41.64s 3.80m 0.06h, lr: 0.00030
[2020-08-27 20:23:53,312 utils.py:21 - info()] timestep: 70/70530, loss: nan, cer: 3.10, elapsed: 32.21s 4.34m 0.07h, lr: 0.00030
[2020-08-27 20:24:25,110 utils.py:21 - info()] timestep: 80/70530, loss: nan, cer: 3.08, elapsed: 31.80s 4.87m 0.08h, lr: 0.00030
[2020-08-27 20:25:10,588 utils.py:21 - info()] timestep: 90/70530, loss: nan, cer: 3.16, elapsed: 45.48s 5.63m 0.09h, lr: 0.00030
[2020-08-27 20:25:44,441 utils.py:21 - info()] timestep: 100/70530, loss: nan, cer: 3.13, elapsed: 33.85s 6.19m 0.10h, lr: 0.00030
Traceback (most recent call last):
File "./main.py", line 111, in <module>
main()
File "./main.py", line 107, in main
train(opt)
File "./main.py", line 86, in train
num_epochs=opt.num_epochs, teacher_forcing_ratio=opt.teacher_forcing_ratio, resume=opt.resume)
File "../kospeech/trainer/supervised_trainer.py", line 146, in train
train_queue, teacher_forcing_ratio)
File "../kospeech/trainer/supervised_trainer.py", line 231, in __train_epoches
logit = model(inputs, input_lengths, targets, return_attns=False)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 143, in forward
return self.module(*inputs, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "../kospeech/models/acoustic/transformer/transformer.py", line 160, in forward
output, decoder_self_attns, memory_attns = self.decoder(targets, input_lengths, memory)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "../kospeech/models/acoustic/transformer/transformer.py", line 283, in forward
output = self.input_dropout(self.embedding(inputs) + self.positional_encoding(inputs.size(1)))
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "../kospeech/models/acoustic/transformer/embeddings.py", line 43, in forward
return self.embedding(inputs) * self.sqrt_dim
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/sparse.py", line 114, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 1724, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

aihub_labels.zip
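
A hedged note on the IndexError above: "index out of range in self" raised inside nn.Embedding typically means a target id is >= the embedding's num_embeddings, for example when the label file and the model's vocabulary size are out of sync (which would also fit the nan loss earlier in the log). A minimal, self-contained reproduction with illustrative sizes:

import torch
import torch.nn as nn

vocab_size = 100
embedding = nn.Embedding(vocab_size, 16)

good_targets = torch.tensor([[1, 5, 99]])
print(embedding(good_targets).shape)       # torch.Size([1, 3, 16])

bad_targets = torch.tensor([[1, 5, 150]])  # 150 >= vocab_size
try:
    embedding(bad_targets)
except IndexError as err:
    print('caught:', err)                  # index out of range in self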

A few questions about the project!

Hello. I'm a student studying speech recognition.
I'd be grateful for your feedback on a few questions about the project!

  1. Looking at the model architecture, the targets are fed in as input; is prediction on data that has no labels also possible?

  2. Have you tried training on Kaldi's Zeroth data, and if so, what performance did you get? I'm trying to train on a different dataset; during training the CER and loss decrease, but in validation the loss and CER barely change and stay fixed. Even when I used the train set and the test set as the valid set, the results were nearly identical. Suspecting that the valid set itself might score poorly, I also took a few samples from the train set and used them for both training and evaluation, but the results were the same. For labeling, I tried both the label set chosen in this project and a set collected only from the characters that appear in the Kaldi data, with identical results. If you have run into this, I'd also like to know how you solved it!

Regarding external language model integration

Hello,
thank you for the great source code.
I'm leaving this note because I'm curious how the external language model mentioned in the README is used. Could you explain?

GPU index error when running single-GPU or CPU inference after multi-GPU training

Hello.
I had filed issue #29, but it seems it was closed, perhaps because I was slow to reply.
I posted again afterwards, but I'm reposting here in case no notification went out.

As you suggested, I also tried passing spect, and after training on a server with two GPUs
I ran inference with a single Colab GPU and got the same index-out-of-range error,
while a model trained on Colab runs fine on the server.

Additionally, I tried the pre-trained model you provided on CPU and got the error below.

[2020-07-06 08:02:07,754 utils.py:21 - info()] --mode: eval
[2020-07-06 08:02:07,754 utils.py:21 - info()] --sample_rate: 16000
[2020-07-06 08:02:07,754 utils.py:21 - info()] --window_size: 20
[2020-07-06 08:02:07,754 utils.py:21 - info()] --stride: 10
[2020-07-06 08:02:07,754 utils.py:21 - info()] --n_mels: 80
[2020-07-06 08:02:07,754 utils.py:21 - info()] --normalize: True
[2020-07-06 08:02:07,754 utils.py:21 - info()] --del_silence: True
[2020-07-06 08:02:07,754 utils.py:21 - info()] --input_reverse: True
[2020-07-06 08:02:07,755 utils.py:21 - info()] --feature_extract_by: auidotorch
[2020-07-06 08:02:07,755 utils.py:21 - info()] --time_mask_para: 50
[2020-07-06 08:02:07,755 utils.py:21 - info()] --freq_mask_para: 12
[2020-07-06 08:02:07,755 utils.py:21 - info()] --time_mask_num: 2
[2020-07-06 08:02:07,755 utils.py:21 - info()] --freq_mask_num: 2
[2020-07-06 08:02:07,755 utils.py:21 - info()] --dataset_path: ../../DATA/KsponSpeech_01/KsponSpeech_0001/
[2020-07-06 08:02:07,755 utils.py:21 - info()] --data_list_path: ../data/data_list/toy_test_list.csv
[2020-07-06 08:02:07,755 utils.py:21 - info()] --label_path: ./data/label/aihub_labels.csv
[2020-07-06 08:02:07,755 utils.py:21 - info()] --num_workers: 4
[2020-07-06 08:02:07,755 utils.py:21 - info()] --use_cuda: True
[2020-07-06 08:02:07,755 utils.py:21 - info()] --model_path: ../data/checkpoints/KsponSpeech_87.44%.pt
[2020-07-06 08:02:07,755 utils.py:21 - info()] --batch_size: 1
[2020-07-06 08:02:07,755 utils.py:21 - info()] --decode: beam
[2020-07-06 08:02:07,755 utils.py:21 - info()] --k: 5
[2020-07-06 08:02:07,755 utils.py:21 - info()] --print_every: 10
[2020-07-06 08:02:07,862 utils.py:21 - info()] Operating System : Linux 4.19.104+
[2020-07-06 08:02:07,862 utils.py:21 - info()] Processor : x86_64
[2020-07-06 08:02:07,862 utils.py:21 - info()] CUDA is available : False
[2020-07-06 08:02:07,862 utils.py:21 - info()] PyTorch version : 1.5.1+cu101
[2020-07-06 08:02:18,641 utils.py:141 - _init_num_threads()] NumExpr defaulting to 2 threads.
100% 167/167 [01:49<00:00,  1.53it/s]
[2020-07-06 08:04:07,726 utils.py:21 - info()] evaluate() start
Traceback (most recent call last):
  File "./eval.py", line 66, in <module>
    main()
  File "./eval.py", line 62, in main
    inference(opt)
  File "./eval.py", line 42, in inference
    evaluator.evaluate(model)
  File "../kospeech/evaluator/evaluator.py", line 43, in evaluate
    cer = self.decoder.search(model, eval_queue, self.device, self.print_every)
  File "../kospeech/decode/search.py", line 81, in search
    return super(BeamSearch, self).search(model, queue, device, print_every)
  File "../kospeech/decode/search.py", line 41, in search
    teacher_forcing_ratio=0.0, language_model=self.language_model)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 149, in forward
    "them on device: {}".format(self.src_device_obj, t.device))
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu

From googling, it seems that when a model is simply saved with torch.save,
there is a known problem of the checkpoint being bound to the specific classes and directories that were used.
https://discuss.pytorch.org/t/how-could-i-train-on-multi-gpu-and-infer-with-single-gpu/22838/5

So PyTorch recommends saving with torch.save(model.state_dict())
and then loading with model.load_state_dict(torch.load(path)):
https://pytorch.org/docs/master/notes/serialization.html

To try this, I loaded the pre-trained model, saved its state_dict() separately,
and then tried to instantiate the model class and load it, as in the code below.

the_model = TheModelClass(*args, **kwargs)
This is where I got stuck: does the class here refer to the trainer_states that are saved together with the model after training?

If so, could you share the trainer_states that are saved along with the model??

Thank you!
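
For reference, a minimal, self-contained sketch of the state_dict pattern described above, with a toy model standing in for the real class (whose constructor arguments will differ). If the model was wrapped in nn.DataParallel during training, saving model.module.state_dict() keeps the keys free of the "module." prefix.

import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self, dim=8):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):
        return self.linear(x)

trained = ToyModel()
torch.save(trained.state_dict(), 'model_state.pt')  # parameters only, no class binding

restored = ToyModel()                               # rebuild the architecture first
restored.load_state_dict(torch.load('model_state.pt', map_location='cpu'))
restored.eval()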
