cageyoko / ctc-attention-mispronunciation
A Full Text-Dependent End to End Mispronunciation Detection and Diagnosis with Easy Data Augment Techniques
Hi, thank you so much for sharing your paper and code; it has been enjoyable to read and experiment with. In the paper/code, the value is the output of the BiLSTM, and the key is the value passed through a linear layer. Was there a specific reason you chose different tensors for the key and value here? Did you experiment with other combinations (for example, using the output of the linear layer as both key and value, or the output of the BiLSTM as both key and value)? Thanks again!
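For concreteness, the three key/value combinations in question can be sketched with plain scaled dot-product attention in NumPy (the dimensions, random weights, and single query vector below are illustrative assumptions, not the repo's actual shapes):

```python
import numpy as np

rng = np.random.default_rng(0)

T, d = 5, 8                      # time steps, hidden size (illustrative)
H = rng.standard_normal((T, d))  # stand-in for the BiLSTM outputs
W = rng.standard_normal((d, d))  # linear layer that produces the key
q = rng.standard_normal(d)       # a single query vector

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, key, value):
    """Scaled dot-product attention over one sequence."""
    scores = key @ query / np.sqrt(key.shape[-1])  # (T,)
    weights = softmax(scores)                      # attention weights
    return weights @ value                         # context vector (d,)

# Paper/code setup: value = BiLSTM output, key = linear(value)
ctx_paper = attend(q, H @ W, H)

# Variant A: BiLSTM output as both key and value
ctx_a = attend(q, H, H)

# Variant B: linear projection as both key and value
ctx_b = attend(q, H @ W, H @ W)
```

Each variant changes only which tensor scores the alignment versus which one is averaged into the context, so they are cheap to compare empirically.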
Hello,
Does this code include the MD&D evaluation implementation, i.e., the metrics shown in Table 3 of the paper (True Accept, True Rejection, etc.)?
Thanks.
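For reference, the Table 3 quantities can be counted once the canonical, human-annotated, and recognized phone sequences are aligned. A minimal sketch that assumes the three sequences are already aligned one-to-one (the repo's actual scoring, which must handle insertions and deletions via alignment, may differ):

```python
def mdd_counts(canonical, annotated, predicted):
    """Count TA/TR/FA/FR given already-aligned phone sequences.

    canonical: what the speaker should have said
    annotated: what a human says was actually pronounced
    predicted: what the model recognized
    """
    ta = tr = fa = fr = 0
    for c, a, p in zip(canonical, annotated, predicted):
        if c == a:                  # phone was pronounced correctly
            if p == c:
                ta += 1             # model accepts it  -> True Accept
            else:
                fr += 1             # model flags it    -> False Reject
        else:                       # phone was mispronounced
            if p == c:
                fa += 1             # model misses it   -> False Accept
            else:
                tr += 1             # model detects it  -> True Reject
    return ta, tr, fa, fr

canonical = ["ah", "b", "aw", "t"]
annotated = ["ah", "p", "aw", "t"]   # speaker said "p" instead of "b"
predicted = ["ah", "p", "aw", "d"]
print(mdd_counts(canonical, annotated, predicted))  # (2, 1, 0, 1)
```

Precision/recall-style MDD metrics then follow directly from these four counts.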
Thank you for your work. I have a question I'd like to ask: have you tried replacing the RNN with multi-head attention? I tried swapping the original CNN + RNN + CTC for CNN + multi-head attention + CTC, and the PER was 40%. I'm not sure what went wrong, so I wanted to ask for your advice.
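For reference, self-attention (unlike an RNN) is permutation-invariant, so a missing positional encoding is a frequent cause of a large PER jump when swapping CNN + RNN + CTC for CNN + multi-head attention + CTC. A NumPy sketch of such an attention block with sinusoidal positions (all shapes and random weights here are illustrative assumptions, not the experiment's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sinusoidal_pe(T, d):
    """Sinusoidal positional encoding; without it, self-attention
    has no notion of frame order."""
    pos = np.arange(T)[:, None]
    i = np.arange(d // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d)
    pe = np.zeros((T, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def multi_head_self_attention(x, num_heads, Wq, Wk, Wv, Wo):
    T, d = x.shape
    dh = d // num_heads
    heads_out = []
    for h in range(num_heads):
        q, k, v = x @ Wq[h], x @ Wk[h], x @ Wv[h]   # each (T, dh)
        att = softmax(q @ k.T / np.sqrt(dh))        # (T, T)
        heads_out.append(att @ v)
    return np.concatenate(heads_out, axis=-1) @ Wo  # (T, d)

T, d, heads = 10, 16, 4
x = rng.standard_normal((T, d)) + sinusoidal_pe(T, d)  # CNN features + PE
Wq = rng.standard_normal((heads, d, d // heads)) * 0.1
Wk = rng.standard_normal((heads, d, d // heads)) * 0.1
Wv = rng.standard_normal((heads, d, d // heads)) * 0.1
Wo = rng.standard_normal((d, d)) * 0.1
y = multi_head_self_attention(x, heads, Wq, Wk, Wv, Wo)  # feeds the CTC layer
```

Other common culprits in such swaps are learning-rate schedule (attention stacks usually need warmup) and too little data for the larger model.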
Hi. I have a question about your data augmentation strategy.
Did you use data augmentation in training only? Or, when running inference on the test set, do you also augment the canonical phonemes, or keep the original canonical phonemes?
Hi,
There are some mismatches between your phoneme map and the phoneme map in Kaldi.
The differences are as follows:
https://github.com/kaldi-asr/kaldi/blob/master/egs/timit/s5/conf/phones.60-48-39.map#L4
https://github.com/cageyoko/CTC-Attention-Mispronunciation/blob/master/egs/attention_aug/conf/phones.60-48-39.map#L4
https://github.com/kaldi-asr/kaldi/blob/master/egs/timit/s5/conf/phones.60-48-39.map#L16
https://github.com/cageyoko/CTC-Attention-Mispronunciation/blob/master/egs/attention_aug/conf/phones.60-48-39.map#L16
https://github.com/kaldi-asr/kaldi/blob/master/egs/timit/s5/conf/phones.60-48-39.map#L61
https://github.com/cageyoko/CTC-Attention-Mispronunciation/blob/master/egs/attention_aug/conf/phones.60-48-39.map#L61
Any suggestion?
--2022/5/22--
Oh, I got it.
The reason for this is the phoneme table used in L2-ARCTIC.
Is that correct?
As I read it, steps 2 and 3 are model training, and step 4 tests the CTC decoding. So how can I run the whole model?
I could not find any code showing how you do the augmentation. I also read the paper, but it is still not clear to me. Could you clarify the augmentation strategy?
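For what it's worth, one common reading of sequence-level "easy data augmentation" is randomly substituting, deleting, or inserting phones in the canonical sequence to simulate mispronunciations. A purely hypothetical sketch of that idea, not necessarily the paper's exact strategy:

```python
import random

def augment_phones(phones, vocab, p=0.1, seed=None):
    """Randomly perturb a canonical phone sequence to simulate
    mispronunciations (substitute / delete / insert, each governed
    by probability p). Hypothetical illustration only, not
    necessarily the paper's actual augmentation."""
    rng = random.Random(seed)
    out = []
    for ph in phones:
        r = rng.random()
        if r < p:                       # substitute with a random phone
            out.append(rng.choice(vocab))
        elif r < 2 * p:                 # delete this phone
            continue
        else:
            out.append(ph)
            if rng.random() < p:        # insert a random phone after
                out.append(rng.choice(vocab))
    return out

vocab = ["aa", "ae", "b", "d", "iy", "s", "t"]
print(augment_phones(["s", "iy", "t"], vocab, p=0.2, seed=1))
```

With `p=0.0` the sequence passes through unchanged, which makes the function easy to unit-test.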
Hi, I followed the steps in your Usage section, but I'm not sure what to do next. Could you please give more details? How should I train your model?
When running run.sh, I got this problem:
Traceback (most recent call last):
File "/content/CTC-Attention-Mispronunciation/egs/attention_aug/steps/train_ctc.py", line 278, in <module>
main(conf)
File "/content/CTC-Attention-Mispronunciation/egs/attention_aug/steps/train_ctc.py", line 108, in main
train_dataset = SpeechDataset(vocab, opts.train_scp_path, opts.train_lab_path,opts.train_trans_path, opts, True)
File "/content/CTC-Attention-Mispronunciation/egs/attention_aug/./utils/data_loader.py", line 77, in __init__
self.process_feature_label()
File "/content/CTC-Attention-Mispronunciation/egs/attention_aug/./utils/data_loader.py", line 82, in process_feature_label
with open(self.scp_path, 'r') as rf:
FileNotFoundError: [Errno 2] No such file or directory: 'data/train/fbank.scp'
Where can I find this file?
Hi, I have tried to run both models and got the same error during Acoustic Model (CTC) training:
Start training epoch: 1, learning_rate: 0.00100
Epoch = 1, step = 50, cur_loss = 93.3737, total_loss = 93.3737, total_wer = 0.7058
Epoch 1 Train done, total_loss: 72.1015, total_wer: 0.5574
Traceback (most recent call last):
File "steps/train_ctc.py", line 263, in <module>
main(conf)
File "steps/train_ctc.py", line 192, in main
acc, dev_loss = run_epoch(count, model, dev_loader, loss_fn, device, optimizer=None, print_every=opts.verbose_step, is_training=False)
File "steps/train_ctc.py", line 70, in run_epoch
average_loss = total_loss / (i+1)
UnboundLocalError: local variable 'i' referenced before assignment
The only changes I have made to your work were:
Do you have any ideas as to where I am going wrong?
Thanks
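The traceback above points at a division after a `for i, ... in enumerate(dev_loader)` loop: if the dev loader yields no batches (e.g., the dev scp/label files were not generated or are empty), `i` is never bound. A paraphrased sketch of the failure mode and a defensive rewrite (not the repo's exact code):

```python
# Minimal reproduction of the failure in run_epoch(): if the loader
# yields no batches, the loop variable `i` is never bound, so the
# division after the loop raises UnboundLocalError.
def run_epoch(loader):
    total_loss = 0.0
    for i, loss in enumerate(loader):
        total_loss += loss
    return total_loss / (i + 1)   # crashes when loader was empty

# Defensive rewrite: count the batches explicitly and guard n == 0.
def run_epoch_safe(loader):
    total_loss, n = 0.0, 0
    for loss in loader:
        total_loss += loss
        n += 1
    return total_loss / n if n else 0.0

run_epoch([10.0, 20.0])   # fine: 15.0
run_epoch_safe([])        # fine: 0.0 instead of UnboundLocalError
```

So the first thing to check is whether the dev set files referenced in the config actually exist and are non-empty.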