cageyoko / ctc-attention-mispronunciation
A Full Text-Dependent End to End Mispronunciation Detection and Diagnosis with Easy Data Augment Techniques
Hi, thank you so much for sharing your paper and code; it has been enjoyable to read and experiment with. In the paper/code, the value is the output of the BiLSTM, and the key is the value passed through a linear layer. Was there a specific reason you chose different tensors for the key and value here? Did you experiment with other combinations (for example, using the output of the linear layer as both key and value, or the output of the BiLSTM as both key and value)? Thanks again!
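For concreteness, the three key/value combinations in question can be sketched with plain scaled dot-product attention in NumPy (the dimensions, random weights, and single query vector below are illustrative assumptions, not the repo's actual shapes):

```python
import numpy as np

rng = np.random.default_rng(0)

T, d = 5, 8                      # time steps, hidden size (illustrative)
H = rng.standard_normal((T, d))  # stand-in for the BiLSTM outputs
W = rng.standard_normal((d, d))  # linear layer that produces the key
q = rng.standard_normal(d)       # a single query vector

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, key, value):
    """Scaled dot-product attention over one sequence."""
    scores = key @ query / np.sqrt(key.shape[-1])  # (T,)
    weights = softmax(scores)                      # attention weights
    return weights @ value                         # context vector (d,)

# Paper/code setup: value = BiLSTM output, key = linear(value)
ctx_paper = attend(q, H @ W, H)

# Variant A: BiLSTM output as both key and value
ctx_a = attend(q, H, H)

# Variant B: linear projection as both key and value
ctx_b = attend(q, H @ W, H @ W)
```

Each variant changes only which tensor scores the alignment versus which one is averaged into the context, so they are cheap to compare empirically.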
Hello,
Does this code include the MD&D evaluation implementation, i.e., the metrics shown in Table 3 of the paper (True Accept, True Rejection, etc.)?
Thanks.
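For reference, the Table 3 quantities can be counted once the canonical, human-annotated, and recognized phone sequences are aligned. A minimal sketch that assumes the three sequences are already aligned one-to-one (the repo's actual scoring, which must handle insertions and deletions via alignment, may differ):

```python
def mdd_counts(canonical, annotated, predicted):
    """Count TA/TR/FA/FR given already-aligned phone sequences.

    canonical: what the speaker should have said
    annotated: what a human says was actually pronounced
    predicted: what the model recognized
    """
    ta = tr = fa = fr = 0
    for c, a, p in zip(canonical, annotated, predicted):
        if c == a:                  # phone was pronounced correctly
            if p == c:
                ta += 1             # model accepts it  -> True Accept
            else:
                fr += 1             # model flags it    -> False Reject
        else:                       # phone was mispronounced
            if p == c:
                fa += 1             # model misses it   -> False Accept
            else:
                tr += 1             # model detects it  -> True Reject
    return ta, tr, fa, fr

canonical = ["ah", "b", "aw", "t"]
annotated = ["ah", "p", "aw", "t"]   # speaker said "p" instead of "b"
predicted = ["ah", "p", "aw", "d"]
print(mdd_counts(canonical, annotated, predicted))  # (2, 1, 0, 1)
```

Precision/recall-style MDD metrics then follow directly from these four counts.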
Thank you for your work. I have a question I'd like to ask: have you tried replacing the RNN with multi-head attention? I tried swapping the original CNN + RNN + CTC for CNN + multi-head attention + CTC, and the PER was 40%. I'm not sure what went wrong, so I wanted to ask for your advice.
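For reference, self-attention (unlike an RNN) is permutation-invariant, so a missing positional encoding is a frequent cause of a large PER jump when swapping CNN + RNN + CTC for CNN + multi-head attention + CTC. A NumPy sketch of such an attention block with sinusoidal positions (all shapes and random weights here are illustrative assumptions, not the experiment's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sinusoidal_pe(T, d):
    """Sinusoidal positional encoding; without it, self-attention
    has no notion of frame order."""
    pos = np.arange(T)[:, None]
    i = np.arange(d // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d)
    pe = np.zeros((T, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def multi_head_self_attention(x, num_heads, Wq, Wk, Wv, Wo):
    T, d = x.shape
    dh = d // num_heads
    heads_out = []
    for h in range(num_heads):
        q, k, v = x @ Wq[h], x @ Wk[h], x @ Wv[h]   # each (T, dh)
        att = softmax(q @ k.T / np.sqrt(dh))        # (T, T)
        heads_out.append(att @ v)
    return np.concatenate(heads_out, axis=-1) @ Wo  # (T, d)

T, d, heads = 10, 16, 4
x = rng.standard_normal((T, d)) + sinusoidal_pe(T, d)  # CNN features + PE
Wq = rng.standard_normal((heads, d, d // heads)) * 0.1
Wk = rng.standard_normal((heads, d, d // heads)) * 0.1
Wv = rng.standard_normal((heads, d, d // heads)) * 0.1
Wo = rng.standard_normal((d, d)) * 0.1
y = multi_head_self_attention(x, heads, Wq, Wk, Wv, Wo)  # feeds the CTC layer
```

Other common culprits in such swaps are learning-rate schedule (attention stacks usually need warmup) and too little data for the larger model.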
Hi. I have a question about your data augmentation strategy.
Did you use data augmentation in training only? Or, when running inference on the test set, do you also augment the canonical phonemes, or keep the original canonical phonemes?
Hi,
There are some mismatches between your phoneme map and the phoneme map in Kaldi.
The differences are as follows:
https://github.com/kaldi-asr/kaldi/blob/master/egs/timit/s5/conf/phones.60-48-39.map#L4
https://github.com/cageyoko/CTC-Attention-Mispronunciation/blob/master/egs/attention_aug/conf/phones.60-48-39.map#L4
https://github.com/kaldi-asr/kaldi/blob/master/egs/timit/s5/conf/phones.60-48-39.map#L16
https://github.com/cageyoko/CTC-Attention-Mispronunciation/blob/master/egs/attention_aug/conf/phones.60-48-39.map#L16
https://github.com/kaldi-asr/kaldi/blob/master/egs/timit/s5/conf/phones.60-48-39.map#L61
https://github.com/cageyoko/CTC-Attention-Mispronunciation/blob/master/egs/attention_aug/conf/phones.60-48-39.map#L61
Any suggestion?
--2022/5/22--
Oh, I got it.
The reason for this is the phoneme table used in L2-ARCTIC.
Is that correct?
As I read it, steps 2 and 3 are model training, and step 4 tests the CTC decoding. So how can I run the whole model?
I could not find any code showing how you do the augmentation. I also read the paper, but it is still not clear to me. Could you clarify the augmentation strategy?
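For what it's worth, one common reading of sequence-level "easy data augmentation" is randomly substituting, deleting, or inserting phones in the canonical sequence to simulate mispronunciations. A purely hypothetical sketch of that idea, not necessarily the paper's exact strategy:

```python
import random

def augment_phones(phones, vocab, p=0.1, seed=None):
    """Randomly perturb a canonical phone sequence to simulate
    mispronunciations (substitute / delete / insert, each governed
    by probability p). Hypothetical illustration only, not
    necessarily the paper's actual augmentation."""
    rng = random.Random(seed)
    out = []
    for ph in phones:
        r = rng.random()
        if r < p:                       # substitute with a random phone
            out.append(rng.choice(vocab))
        elif r < 2 * p:                 # delete this phone
            continue
        else:
            out.append(ph)
            if rng.random() < p:        # insert a random phone after
                out.append(rng.choice(vocab))
    return out

vocab = ["aa", "ae", "b", "d", "iy", "s", "t"]
print(augment_phones(["s", "iy", "t"], vocab, p=0.2, seed=1))
```

With `p=0.0` the sequence passes through unchanged, which makes the function easy to unit-test.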
Hi, I followed the steps in your Usage section, but I'm not sure what to do next. Could you please give more details? How should I train your model?
When running run.sh, I got this problem:
Traceback (most recent call last):
File "/content/CTC-Attention-Mispronunciation/egs/attention_aug/steps/train_ctc.py", line 278, in <module>
main(conf)
File "/content/CTC-Attention-Mispronunciation/egs/attention_aug/steps/train_ctc.py", line 108, in main
train_dataset = SpeechDataset(vocab, opts.train_scp_path, opts.train_lab_path,opts.train_trans_path, opts, True)
File "/content/CTC-Attention-Mispronunciation/egs/attention_aug/./utils/data_loader.py", line 77, in __init__
self.process_feature_label()
File "/content/CTC-Attention-Mispronunciation/egs/attention_aug/./utils/data_loader.py", line 82, in process_feature_label
with open(self.scp_path, 'r') as rf:
FileNotFoundError: [Errno 2] No such file or directory: 'data/train/fbank.scp'
Where can I find this file?
Hi, I have tried to run both models and got the same error during Acoustic Model (CTC) training:
Start training epoch: 1, learning_rate: 0.00100
Epoch = 1, step = 50, cur_loss = 93.3737, total_loss = 93.3737, total_wer = 0.7058
Epoch 1 Train done, total_loss: 72.1015, total_wer: 0.5574
Traceback (most recent call last):
File "steps/train_ctc.py", line 263, in <module>
main(conf)
File "steps/train_ctc.py", line 192, in main
acc, dev_loss = run_epoch(count, model, dev_loader, loss_fn, device, optimizer=None, print_every=opts.verbose_step, is_training=False)
File "steps/train_ctc.py", line 70, in run_epoch
average_loss = total_loss / (i+1)
UnboundLocalError: local variable 'i' referenced before assignment
The only changes I have made to your work were:
Do you have any ideas as to where I am going wrong?
Thanks
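The traceback above points at a division after a `for i, ... in enumerate(dev_loader)` loop: if the dev loader yields no batches (e.g., the dev scp/label files were not generated or are empty), `i` is never bound. A paraphrased sketch of the failure mode and a defensive rewrite (not the repo's exact code):

```python
# Minimal reproduction of the failure in run_epoch(): if the loader
# yields no batches, the loop variable `i` is never bound, so the
# division after the loop raises UnboundLocalError.
def run_epoch(loader):
    total_loss = 0.0
    for i, loss in enumerate(loader):
        total_loss += loss
    return total_loss / (i + 1)   # crashes when loader was empty

# Defensive rewrite: count the batches explicitly and guard n == 0.
def run_epoch_safe(loader):
    total_loss, n = 0.0, 0
    for loss in loader:
        total_loss += loss
        n += 1
    return total_loss / n if n else 0.0

run_epoch([10.0, 20.0])   # fine: 15.0
run_epoch_safe([])        # fine: 0.0 instead of UnboundLocalError
```

So the first thing to check is whether the dev set files referenced in the config actually exist and are non-empty.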