
Inconsistent reproduction results about vac_cslr HOT 16 OPEN

wljcode commented on August 23, 2024
Inconsistent reproduction results

from vac_cslr.

Comments (16)

ycmin95 commented on August 23, 2024

@wljcode
Have you successfully reproduced the experimental results? I checked the relevant logs and found that you used load_weights to continue training rather than load_checkpoints; the former loads only the model weights, while the latter restores all training-related state. You are expected to use load_checkpoints to continue training.


wljcode commented on August 23, 2024

The first log file is from the run with the VAC algorithm:
log.txt
The second log file is from the baseline run:
log.txt

ycmin95 commented on August 23, 2024

@wljcode
Thanks for your attention to our work. It seems you used batch size = 1, which may affect the robustness of the model. Besides, the learning rate does not appear to decay during training; there may be a bug in checkpoint loading, and I will check this later.

Relevant logs are uploaded for comparison.

baseline.txt
baseline_bn.txt
baseline_VAC.txt

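A quick way to see what a correctly resumed run should do to the learning rate: with a MultiStepLR-style schedule (the milestones and decay factor below are made-up illustrative values, not the repo's configuration), the rate for any epoch is a pure function of that epoch, so a run resumed at, say, epoch 45 should already show a decayed rate:

```python
import bisect

def multistep_lr(base_lr, milestones, gamma, epoch):
    # Mirrors the behavior of torch.optim.lr_scheduler.MultiStepLR: the base
    # rate is multiplied by `gamma` once per milestone already passed.
    return base_lr * gamma ** bisect.bisect_right(milestones, epoch)
```

A run resumed via load_weights restarts this computation at epoch 0, which would show up in the log as a learning rate that never decays.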

wljcode commented on August 23, 2024

Thank you for your reply. Due to GPU memory limitations, we have not continued the reproduction experiments recently. We will resume reproducing your work once the equipment is ready!



sunke123 commented on August 23, 2024

Hi @ycmin95, thanks for your great work.
I tried to reproduce the work recently. My final result is 0.4% worse than yours.
Here is my training log:
log.txt
After the 70th epoch, the performance stops improving the way yours does.
Besides, I find "label_smoothing = 0.1" in your log, but not in the released code.
Could you provide some advice?


ycmin95 commented on August 23, 2024

Hi @sunke123, thanks for your attention to our work.
We will explain this performance gap in our next update, perhaps in two weeks, which achieves better performance (about 20% WER) with fewer training epochs. You can conduct further experiments on this codebase; the update won't change the network structure or the training process.

The label_smoothing parameter was adopted in an early experiment on iterative training and I forgot to delete it; I will correct this in the next update.
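For reference, label smoothing with s = 0.1 replaces the one-hot target with (1 - s) on the true class plus s/K spread uniformly over the K classes. A minimal plain-Python sketch of the resulting cross-entropy (illustrative, not the repo's implementation):

```python
import math

def smoothed_cross_entropy(logits, target, smoothing=0.1):
    # Cross-entropy against the smoothed target distribution
    # q_i = (1 - s) * [i == target] + s / K.
    k = len(logits)
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    return -sum(
        ((1 - smoothing) * (1.0 if i == target else 0.0) + smoothing / k) * lp
        for i, lp in enumerate(log_probs)
    )
```

With smoothing = 0 this reduces to the ordinary cross-entropy; with smoothing > 0 it penalizes over-confident predictions.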


sunke123 commented on August 23, 2024

@ycmin95
Cooooool!
Thanks for your reply.
Looking forward to that~


ycmin95 commented on August 23, 2024

Hi, @sunke123,
the code has been updated~


herochen7372 commented on August 23, 2024

Hello, I downloaded the code and retrained the model, but after several epochs the DEV WER is still 100%. I set the batch size to 1 and the learning rate to 0.000010. Could you give me some advice? Thanks.


ycmin95 commented on August 23, 2024

@herochen7372
You can first check whether the evaluation script runs as expected with the provided pretrained model, and then check whether the loss decreases as training progresses.
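The second check can be automated with a few lines. The toy gradient-descent run below stands in for the real model; everything here is illustrative:

```python
def loss_is_decreasing(losses, window=5):
    # Compare the mean of the first and last `window` recorded losses;
    # a healthy run shows a clear drop early in training.
    head = sum(losses[:window]) / window
    tail = sum(losses[-window:]) / window
    return tail < head

# Toy stand-in for a training run: gradient descent on f(w) = (w - 3)^2.
w, losses = 0.0, []
for _ in range(50):
    w -= 0.1 * 2 * (w - 3)   # gradient step with learning rate 0.1
    losses.append((w - 3) ** 2)
```

If the same check on the real training log fails, the problem is upstream of evaluation (data pipeline, learning rate, or loss configuration).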


herochen7372 commented on August 23, 2024

@ycmin95
Thanks for your reply.


kido1412y2y commented on August 23, 2024

@wljcode Have you successfully reproduced the experimental results? I checked the relevant logs and found that you used load_weights to continue training rather than load_checkpoints; the former loads only the model weights, while the latter restores all training-related state. You are expected to use load_checkpoints to continue training.
@ycmin95
Hello author, I have encountered the same problem. Could you provide more detail on how to solve it? Sorry, I didn't understand the method here. Thank you very much. I only have a 3060 GPU, so my batch size = 1.

I noticed that your log reports both Dev WER and Test WER for each training epoch, but mine only reports Dev WER.

Looking forward to your help.
dev.txt
log.txt


ycmin95 commented on August 23, 2024

It seems that you only trained the baseline without the proposed VAC or SMKD; please follow the Readme.md to set the configuration file. We removed the test-set evaluation during training for efficiency; you can restore it in main.py.
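For restoring test-set scoring, the per-epoch loop presumably has a shape like the sketch below; every name here is a hypothetical stand-in, not the actual API of main.py:

```python
def run_training(epochs, train_one_epoch, evaluate, loaders):
    # Hypothetical per-epoch loop; `evaluate` is assumed to return a WER.
    history = []
    for epoch in range(epochs):
        train_one_epoch(loaders["train"])
        record = {"epoch": epoch, "dev_wer": evaluate(loaders["dev"])}
        # The step removed for efficiency: also score the test split.
        record["test_wer"] = evaluate(loaders["test"])
        history.append(record)
    return history
```

In the real code the change amounts to calling the existing evaluation routine on the test loader in the same place it is already called on the dev loader.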


kido1412y2y commented on August 23, 2024

Hello, @ycmin95.
After configuring the settings according to the readme, I ran 40 epochs, but the best result achieved was only 32.3%. Could you please let me know if I might have missed any settings?
log_SMKD_no_ConvCTC.txt
dev_SMKD_no_ConvCTC.txt

I've noticed "# ConvCTC: 1.0" in the baseline.yaml file. I added ConvCTC: 1.0 and retrained for 80 epochs, but the results were even worse.
log_SMKD_ConvCTC.txt
dev_SMKD_ConvCTC.txt


ycmin95 commented on August 23, 2024

Hi @kido1412y2y,
Can you report the evaluation results with a batch size larger than 1? I have never run experiments with a batch size of 1 and am not sure of its influence.


