
ctc-attention-mispronunciation's People

Contributors

cageyoko


ctc-attention-mispronunciation's Issues

Why use separate key/value for the text embedding?

Hi, thank you so much for sharing your paper and code, it has been enjoyable to read and experiment with. In the paper/code, the value is the output of the BiLSTM and the key is the value passed through a linear layer. Was there a specific reason you chose to use a different key and value here? Did you experiment with different combinations? (example: using the output of the linear layer as both key and value, or using output of the BiLSTM as both key and value). Thanks again!
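To make the variants in this question concrete, here is a minimal plain-Python sketch of attention where the key is a linear projection of the value. It is not the repository's code: the dot-product scoring, the function names (`linear`, `attend`), and the toy vectors are all assumptions chosen for illustration, and the vectors in `values` merely stand in for BiLSTM outputs.

```python
import math


def linear(x, weight, bias):
    """Apply y = W x + b to one vector (stand-in for a linear layer)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, values, weight, bias):
    """Attention with key = linear(value).

    Assumption: dot-product scoring; the actual model may score differently.
    """
    keys = [linear(v, weight, bias) for v in values]
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # The context is a weighted sum of the *values*, not of the keys.
    context = [sum(w * v[d] for w, v in zip(weights, values))
               for d in range(dim)]
    return weights, context


# Hypothetical toy inputs standing in for BiLSTM outputs.
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]

# An identity projection makes key == value (the "same key and value" variant).
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_bias = [0.0, 0.0]
weights, context = attend(query, values, identity, zero_bias)
```

Swapping `identity` for any other matrix gives the separate-key variant. In that case the projection changes only how positions are scored, while the returned context is still built from the unprojected values.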

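Have you tried replacing the RNN with multi-head attention?

Thank you for your work. I have a question: have you tried replacing the RNN with multi-head attention? I tried it, changing the original CNN + RNN + CTC to CNN + multi-head attention + CTC, and the PER is 40%. I don't know what went wrong, so I would like to ask for your advice.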

Question about Data Augmentation

Hi. I have a question about your data augmentation strategy.
Did you use data augmentation during training only? Or, when running inference on the test set, do you augment the canonical phonemes too, or do you keep the original canonical phonemes?

A question about the data processing

Run the model

As I understand it, steps 2 and 3 are model training, and step 4 tests the CTC decoding. So how can I run the whole model?

Augmentation method is unclear

I could not find any code showing how you do augmentation. I also read the paper, but it is not clear to me. Could you clarify the augmentation strategy?

Usage difficulties

Hi, I followed the steps in your Usage section, but I'm not sure what to do next. Could you please give more details? How should I train your model?

Can't find fbank.scp

When running run.sh, I got this problem:

Traceback (most recent call last):
  File "/content/CTC-Attention-Mispronunciation/egs/attention_aug/steps/train_ctc.py", line 278, in <module>
    main(conf)
  File "/content/CTC-Attention-Mispronunciation/egs/attention_aug/steps/train_ctc.py", line 108, in main
    train_dataset = SpeechDataset(vocab, opts.train_scp_path, opts.train_lab_path,opts.train_trans_path, opts, True)
  File "/content/CTC-Attention-Mispronunciation/egs/attention_aug/./utils/data_loader.py", line 77, in __init__
    self.process_feature_label()
  File "/content/CTC-Attention-Mispronunciation/egs/attention_aug/./utils/data_loader.py", line 82, in process_feature_label
    with open(self.scp_path, 'r') as rf:
FileNotFoundError: [Errno 2] No such file or directory: 'data/train/fbank.scp'

Where can I find this file?
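The traceback shows the data loader opening `data/train/fbank.scp` directly, so the file has to exist before training starts. A Kaldi-style scp file is a text index with one `<utterance-id> <feature-location>` pair per line. The sketch below is a hypothetical helper (`read_scp` is not a function from the repository) that fails early with a clearer message. Its hint that a feature-extraction step writes this file is an assumption, not something confirmed by the source.

```python
import os
import tempfile


def read_scp(scp_path):
    """Read a Kaldi-style scp file into {utterance_id: feature_location}."""
    if not os.path.isfile(scp_path):
        # Assumption: an earlier data-preparation step should have written it.
        raise FileNotFoundError(
            f"{scp_path} not found; check that feature extraction has been run"
        )
    entries = {}
    with open(scp_path, "r") as rf:
        for line_no, line in enumerate(rf, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            parts = line.split(maxsplit=1)
            if len(parts) != 2:
                raise ValueError(
                    f"{scp_path}:{line_no}: expected '<utt-id> <location>'"
                )
            entries[parts[0]] = parts[1]
    return entries


# Demo on a throwaway file; the utterance ids and paths are made up.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "fbank.scp")
    with open(demo_path, "w") as wf:
        wf.write("utt001 /tmp/feats.ark:12\nutt002 /tmp/feats.ark:3456\n")
    demo_entries = read_scp(demo_path)

    try:
        read_scp(os.path.join(tmp, "missing.scp"))
        missing_error = None
    except FileNotFoundError as err:
        missing_error = str(err)
```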

Running Issue

Hi, I have tried to run both models and get the same error during Acoustic Model (CTC) Training

Start training epoch: 1, learning_rate: 0.00100
Epoch = 1, step = 50, cur_loss = 93.3737, total_loss = 93.3737, total_wer = 0.7058
Epoch 1 Train done, total_loss: 72.1015, total_wer: 0.5574
Traceback (most recent call last):
  File "steps/train_ctc.py", line 263, in <module>
    main(conf)
  File "steps/train_ctc.py", line 192, in main
    acc, dev_loss = run_epoch(count, model, dev_loader, loss_fn, device, optimizer=None, print_every=opts.verbose_step, is_training=False)
  File "steps/train_ctc.py", line 70, in run_epoch
    average_loss = total_loss / (i+1)
UnboundLocalError: local variable 'i' referenced before assignment

The only changes I have made to your work were:

  • Dataset and kaldi root paths
  • Changed (cuda:1) to (cuda:0) as I am only using 1 GPU

Do you have any idea where I am going wrong?

Thanks
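A note on the error itself: in Python, a loop variable is only assigned once the loop body runs, so `total_loss / (i+1)` raises `UnboundLocalError` whenever the loop yields nothing. Because the failing call is the one on `dev_loader`, one plausible cause (an assumption, not confirmed by the log) is that the dev loader produced zero batches, for example because the dev data was not found or is empty. The sketch below reproduces the pattern with hypothetical function names and shows a guarded variant. It is not the repository's `run_epoch`.

```python
def average_loss_unsafe(batch_losses):
    """Mirrors the failing pattern: 'i' exists only if the loop ran."""
    total_loss = 0.0
    for i, loss in enumerate(batch_losses):
        total_loss += loss
    return total_loss / (i + 1)  # UnboundLocalError on an empty iterable


def average_loss_safe(batch_losses):
    """Counts batches explicitly and reports an empty loader clearly."""
    total_loss = 0.0
    num_batches = 0
    for loss in batch_losses:
        total_loss += loss
        num_batches += 1
    if num_batches == 0:
        raise ValueError("the loader yielded no batches; is the dev set empty?")
    return total_loss / num_batches


# An empty list stands in for a data loader with no batches.
try:
    average_loss_unsafe([])
    unsafe_error = None
except UnboundLocalError as err:
    unsafe_error = type(err).__name__

try:
    average_loss_safe([])
    safe_error = None
except ValueError as err:
    safe_error = str(err)
```

If the guarded version reports an empty loader, the next things to check are the dev set paths and whether the batch size exceeds the number of dev utterances.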
