pytorchpipeline's People

Contributors

ghnreigns, ian-datature, jansky

pytorchpipeline's Issues

Weights are not saved on best epoch

From now on I will open issues here. Earlier issues, raised on 9 and 10 Jan, remain in the comments on the corresponding commits.

11 Jan issue: the weights are not saved at the best epoch as judged by monitored_metric. I have read through the source of results.py and cannot find the error. As far as I can tell, the results are updated whenever new_result > old_result, so I do not understand why the weights from the best epoch are not the ones saved and loaded. To reproduce the issue, just drop me a message, @jansky.

A snippet of the training log follows:

Training on Fold 1 and using tf_efficientnet_b2_ns

2021-01-10 18-12-48
LR: 0.001
[RESULT]: Training Epoch: 1 | Avg Validation Summary Loss: 0.086217 | Validation Accuracy: 0.981698 | Time Elapsed: 00:03:51
[RESULT]: Validation Epoch: 1 | Avg Validation Summary Loss: 0.084663 | Validation Accuracy: 0.982342 | Validation ROC: 0.726715 | MultiClass ROC: {0: 0.27328498476140206, 1: 0.7267150152385979} | Time Elapsed: 00:00:17
Adjusting learning rate of group 0 to 1.0000e-03.

2021-01-10 18-16-58
LR: 0.001
[RESULT]: Training Epoch: 2 | Avg Validation Summary Loss: 0.083393 | Validation Accuracy: 0.982377 | Time Elapsed: 00:03:50
[RESULT]: Validation Epoch: 2 | Avg Validation Summary Loss: 0.074522 | Validation Accuracy: 0.982342 | Validation ROC: 0.853385 | MultiClass ROC: {0: 0.14661487775637413, 1: 0.8533851222436258} | Time Elapsed: 00:00:17
Adjusting learning rate of group 0 to 3.0000e-04.

2021-01-10 18-21-07
LR: 0.0003
[RESULT]: Training Epoch: 3 | Avg Validation Summary Loss: 0.075850 | Validation Accuracy: 0.982377 | Time Elapsed: 00:03:52
[RESULT]: Validation Epoch: 3 | Avg Validation Summary Loss: 0.072389 | Validation Accuracy: 0.982342 | Validation ROC: 0.853426 | MultiClass ROC: {0: 0.14657548456903197, 1: 0.8534245154309681} | Time Elapsed: 00:00:17
Adjusting learning rate of group 0 to 3.0000e-04.

2021-01-10 18-25-17
LR: 0.0003
[RESULT]: Training Epoch: 4 | Avg Validation Summary Loss: 0.075004 | Validation Accuracy: 0.982377 | Time Elapsed: 00:03:53
[RESULT]: Validation Epoch: 4 | Avg Validation Summary Loss: 0.072246 | Validation Accuracy: 0.982342 | Validation ROC: 0.865661 | MultiClass ROC: {0: 0.1343386474743058, 1: 0.8656613525256942} | Time Elapsed: 00:00:17
Adjusting learning rate of group 0 to 9.0000e-05.

2021-01-10 18-29-29
LR: 8.999999999999999e-05
[RESULT]: Training Epoch: 5 | Avg Validation Summary Loss: 0.070796 | Validation Accuracy: 0.982377 | Time Elapsed: 00:03:52
[RESULT]: Validation Epoch: 5 | Avg Validation Summary Loss: 0.071158 | Validation Accuracy: 0.982342 | Validation ROC: 0.873397 | MultiClass ROC: {0: 0.12660379513966855, 1: 0.8733962048603314} | Time Elapsed: 00:00:17
Adjusting learning rate of group 0 to 9.0000e-05.

2021-01-10 18-33-39
LR: 8.999999999999999e-05
[RESULT]: Training Epoch: 6 | Avg Validation Summary Loss: 0.067821 | Validation Accuracy: 0.982340 | Time Elapsed: 00:03:51
[RESULT]: Validation Epoch: 6 | Avg Validation Summary Loss: 0.069738 | Validation Accuracy: 0.982342 | Validation ROC: 0.877959 | MultiClass ROC: {0: 0.1220394378329545, 1: 0.8779605621670455} | Time Elapsed: 00:00:17
Adjusting learning rate of group 0 to 2.7000e-05.

2021-01-10 18-37-49
LR: 2.6999999999999996e-05
[RESULT]: Training Epoch: 7 | Avg Validation Summary Loss: 0.067316 | Validation Accuracy: 0.982377 | Time Elapsed: 00:03:49
[RESULT]: Validation Epoch: 7 | Avg Validation Summary Loss: 0.068827 | Validation Accuracy: 0.982342 | Validation ROC: 0.881831 | MultiClass ROC: {0: 0.11816905717658521, 1: 0.8818309428234148} | Time Elapsed: 00:00:17
Adjusting learning rate of group 0 to 2.7000e-05.

2021-01-10 18-41-57
LR: 2.6999999999999996e-05
[RESULT]: Training Epoch: 8 | Avg Validation Summary Loss: 0.064954 | Validation Accuracy: 0.982340 | Time Elapsed: 00:03:52
[RESULT]: Validation Epoch: 8 | Avg Validation Summary Loss: 0.069614 | Validation Accuracy: 0.982342 | Validation ROC: 0.879680 | MultiClass ROC: {0: 0.12031926865234593, 1: 0.879680731347654} | Time Elapsed: 00:00:18
Adjusting learning rate of group 0 to 8.1000e-06.



OOF Score for Fold 1: 0.879680731347654

The OOF score for each fold should simply be the highest value of the monitored metric, which here occurs at epoch 7 (Validation ROC 0.881831). Instead, the reported OOF score of 0.879680731347654 matches epoch 8, the last epoch. I have confirmed that this only happens in Folds 1 and 4, where the last epoch is not the best one; in Folds 2, 3 and 5 the last epoch happens to be the best epoch, so the mismatch does not show. My only hypothesis so far is that the weights are being saved at the last epoch rather than the best one.
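For reference, here is a minimal sketch of the behaviour I would expect, written in plain Python so it runs without the pipeline. It is not the code in results.py: the BestCheckpoint class and the dictionary standing in for the model weights are hypothetical. It also illustrates one possible cause that I have not confirmed: if the "best" state is stored by reference rather than copied (in PyTorch, model.state_dict() returns references to the live tensors), it will silently follow the weights to the last epoch.

```python
import copy


class BestCheckpoint:
    """Keeps a copy of the state from the epoch with the best monitored metric.

    Hypothetical helper for illustration; not part of the pipeline.
    """

    def __init__(self, mode="max"):
        if mode not in ("max", "min"):
            raise ValueError("mode must be 'max' or 'min'")
        self.mode = mode
        self.best_score = None
        self.best_epoch = None
        self.best_state = None

    def _improved(self, score):
        if self.best_score is None:
            return True
        if self.mode == "max":
            return score > self.best_score
        return score < self.best_score

    def update(self, epoch, score, state):
        """Store a deep copy of `state` if `score` beats the best so far."""
        if not self._improved(score):
            return False
        self.best_score = score
        self.best_epoch = epoch
        # Deep copy: otherwise later training steps mutate the "saved" state.
        self.best_state = copy.deepcopy(state)
        return True


# Validation ROC per epoch, taken from the Fold 1 log above.
val_roc = [0.726715, 0.853385, 0.853426, 0.865661,
           0.873397, 0.877959, 0.881831, 0.879680]

checkpoint = BestCheckpoint(mode="max")
state = {"weight": 0.0}  # stand-in for the model weights
for epoch, score in enumerate(val_roc, start=1):
    state["weight"] = float(epoch)  # stand-in for one epoch of training
    checkpoint.update(epoch, score, state)

# Restore the best state before computing the OOF predictions.
state = copy.deepcopy(checkpoint.best_state)
print(checkpoint.best_epoch, checkpoint.best_score)
```

With the Fold 1 values this selects epoch 7, and the restored state is the one from epoch 7 rather than epoch 8. In the real pipeline the same idea would apply to a deep copy of model.state_dict() (or a file written with torch.save) that is loaded back with model.load_state_dict() before the OOF predictions are made.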

updated codes

@jansky 16 Jan update: the code now lets you choose whether or not to use AMP (automatic mixed precision) in PyTorch. The main changes are detailed in today's update remarks.
