Comments (10)
Hello,
To clarify: are you loading the model at step 67? Is the performance of the model when you load the checkpoint 53, and is the performance of that checkpoint in the log 58?
from t-few.
Hi @dptam, the step is actually 75. As you can see from the log here, in line 20 (epoch 19), the logged accuracy is 0.5812.
But when I run this code, the accuracy is 0.5848.
BTW, step 79 gives 0.5631.
The same thing happens on the COPA dataset: line 221 is 0.62.
But when I tried to rerun steps 883 and 887, the result is 0.54, and step 879 is 0.55.
I'm not sure what the issue is. If you don't mind, could you rerun and add self.global_step to the metrics dictionary here? This should output the global step in the log that matches the global step used to save the model, just to make sure the line number corresponds to the correct checkpoint.
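The suggestion above can be sketched without any t-few specifics. A minimal standalone example (no Lightning dependency; the class and method names here are made up for illustration, not from the repo) of recording the trainer's global step alongside the validation metrics, so each logged line can be matched to a saved checkpoint:

```python
# Minimal sketch: record the current global step in the metrics dict so
# each logged validation line maps unambiguously to a saved checkpoint.
# In a LightningModule this would be the built-in self.global_step.
class ValidationLogger:
    def __init__(self):
        self.global_step = 0
        self.history = []

    def log_validation(self, metrics: dict) -> dict:
        entry = dict(metrics)
        entry["global_step"] = self.global_step  # tie log line to checkpoint
        self.history.append(entry)
        return entry

logger = ValidationLogger()
logger.global_step = 75
print(logger.log_validation({"accuracy": 0.5812}))
# → {'accuracy': 0.5812, 'global_step': 75}
```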
Hi @dptam, actually when I tried to run finish.pt, it does not match the last accuracy in the log.
Is there something wrong with the code? @muqeeth @jmohta @HaokunLiu
@dptam I have added the global step as you suggested, but it still does not match.
What is in pl_test.py? Do you mind sharing what you have there?
Hello,
Thanks for rerunning the code. I'm still not sure why loading and rerunning the model doesn't match the log performance - could you share the command used to train the model?
Regarding the issue of finish.pt not matching the last accuracy in the log, see #11 for details on why.
Hi @HaokunLiu @dptam, actually pl_test is just a copy of train, except for the loading method. I used both your save-model method and the checkpoint method of PyTorch Lightning. See:
I also changed encoderdecoder.py a little.
But here is the thing: the train command is as below.
And the test code is as below; actually pl_train and pl_test produce the same result.
And here is the log, using not finish.pt but 51, as @dptam suggested.
Hi,
I tried to look into it a bit and couldn't figure out the cause, but I found one issue for me at least (not sure if it will be the same for you). Sorry I don't have more time to look into it currently, but maybe you can.
When using t5-small and printing out the norm of self.model.lm_head.weight, the norm is 94070 in the train_step function but 94072 in the predict function. This is due to a precision issue when moving from CPU to GPU, and one remedy was adding
self.weight = torch.clone(self.model.lm_head.weight).double().cuda().float()
at the end of the init function in EncoderDecoder.py, and adding
self.model.lm_head.weight = torch.nn.Parameter(self.weight)
at the beginning of the training_step function.
This causes self.model.lm_head.weight to consistently have norm 94070 in both train_step and predict, but the accuracy from the log and from loading a validation checkpoint still do not match. I'm not sure why, but one potential further analysis is to look at the other weights of the model.
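The precision effect described above can be reproduced without a GPU or torch at all. A small sketch (plain Python, using struct to emulate a float32 cast; this only illustrates the rounding behavior, not the t-few code itself): a value loses precision on the first down-cast to float32, but a second round-trip is stable, which is why pinning one canonical copy of the weight, as in the clone().double().cuda().float() remedy, can make train_step and predict see identical values.

```python
import struct

def to_float32(x: float) -> float:
    # Round a Python float (float64) to float32 precision, emulating the
    # casts that happen when tensors change dtype or move between devices.
    return struct.unpack("f", struct.pack("f", x))[0]

w = 0.1                        # a weight value stored in float64
w32 = to_float32(w)            # first cast: precision is lost...
assert w32 != w                # ...so the two copies no longer agree
assert to_float32(w32) == w32  # but re-casting is stable (idempotent)
```

This is why two code paths that cast the same weight a different number of times can report slightly different norms, while paths that both start from one already-cast copy agree exactly.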