Hello, I have a problem when I stop then restart training ... When I stop trainin

Problem with xe_costs_en_fr_en loss when reloading checkpoint (unsupervisedmt, closed issue)

facebookresearch commented on July 19, 2024

Problem with xe_costs_en_fr_en loss when reloading checkpoint

from unsupervisedmt.

Comments (4)

pbordes commented on July 19, 2024

OK, I got it! Thank you very much.

from unsupervisedmt.

glample commented on July 19, 2024

Hi,

How are you doing the reload? Are you reloading a pretrained model, or a checkpoint.pth file? In the first case, what you observe is possible (even though the difference seems surprisingly large), since the optimizer weights will be reset. In the case of a checkpoint, the loss should be the same before and after the training interruption, since the optimizer is also reloaded.

Could you try printing the weights before and after reloading (both for the model and the optimizer) to see if they are identical?
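
For what it's worth, here is a minimal sketch of that kind of check in plain PyTorch. It is not code from this repository: the toy model, the "model" and "optimizer" key names, and the states_equal helper are all assumptions made for illustration. It saves both state dicts, reloads them, and compares them recursively.

```python
import io

import torch


def states_equal(a, b):
    """Recursively compare two (possibly nested) state dicts.

    Hypothetical helper for illustration, not part of the repository.
    """
    if isinstance(a, torch.Tensor) or isinstance(b, torch.Tensor):
        # Tensors are only equal to tensors with the same shape and values.
        return (
            isinstance(a, torch.Tensor)
            and isinstance(b, torch.Tensor)
            and torch.equal(a.cpu(), b.cpu())
        )
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(states_equal(a[k], b[k]) for k in a)
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        return len(a) == len(b) and all(states_equal(x, y) for x, y in zip(a, b))
    return a == b


# Toy stand-ins for the real model and optimizer (assumptions).
torch.manual_seed(0)
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Take one training step so the optimizer holds non-empty state.
model(torch.randn(8, 4)).sum().backward()
optimizer.step()

# Save to an in-memory buffer, then reload, as a file would be.
saved = {"model": model.state_dict(), "optimizer": optimizer.state_dict()}
buffer = io.BytesIO()
torch.save(saved, buffer)
buffer.seek(0)
reloaded = torch.load(buffer)

print("model identical:", states_equal(saved["model"], reloaded["model"]))
print("optimizer identical:", states_equal(saved["optimizer"], reloaded["optimizer"]))
```

If the model comparison passes but the optimizer one fails (or the reloaded file has no optimizer entry at all), that would point to the optimizer being reset on reload.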

from unsupervisedmt.

pbordes commented on July 19, 2024

Hello,
Thank you for your quick answer! I trained a model to completion (until the stopping criterion) and now I am trying to reload it. Do you mean that when a model is trained until the end, the optimizer weights are not saved? However, the penultimate function called in main.py is end_epoch(), which calls save_checkpoint(), and this function saves the optimizer weights ...

from unsupervisedmt.

glample commented on July 19, 2024

What is the name of the reloaded file? If it's something like best-bleu_en_fr.pth it will only contain the model. The checkpoint.pth file is to reload experiments that have crashed or have been interrupted for various reasons, and for which you want to resume the training.
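
A quick way to tell the two kinds of file apart is to look at the top-level keys of whatever was loaded. The sketch below is only a heuristic under assumptions: the summarize_payload helper is hypothetical, the example key names are made up, and matching on "optim" in a key name is a guess rather than this repository's actual file layout. The payload itself would come from something like torch.load(path, map_location="cpu").

```python
def summarize_payload(payload):
    """Report the top-level keys of a loaded file and whether any of them
    looks like optimizer state.

    Hypothetical helper; the "optim" substring match is a heuristic.
    """
    if not isinstance(payload, dict):
        return {"keys": [], "has_optimizer_state": False}
    keys = sorted(str(k) for k in payload)
    has_optimizer = any("optim" in k.lower() for k in keys)
    return {"keys": keys, "has_optimizer_state": has_optimizer}


# Made-up payloads standing in for a model-only file and a full checkpoint.
model_only = {"model": {}}
full_checkpoint = {"model": {}, "optimizer": {}, "epoch": 3}

print(summarize_payload(model_only))
print(summarize_payload(full_checkpoint))
```

A file that reports no optimizer-like key can still initialize the model, but resuming from it restarts the optimizer from scratch.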

from unsupervisedmt.
