When I finetune llama-7b on gsm-8k with different finetuning methods, I compared the t

Smaller test/val loss but lower evaluation accuracy (federatedscope, open issue)

shuangyichen commented on July 19, 2024

Smaller test/val loss but lower evaluation accuracy

from federatedscope.

Comments (3)

qbc2016 commented on July 19, 2024

Hello! It may be related to the size of your dataset partition. If the test/val split is too small, the loss estimate will be unstable.
On the other hand, the evaluation accuracy depends on only one exact value, which is parsed from the generated text, whereas the val/test loss is averaged over all the tokens the model predicts.
We have also found that the validation loss may not be a reliable indicator of generalization performance. For more details, please refer to our paper.
Best regards,
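To make the difference between the two metrics concrete, here is a minimal Python sketch. It assumes (our assumption, not necessarily how FederatedScope implements it) that the loss is the usual teacher-forced cross-entropy averaged over reference tokens, and that accuracy is an exact match on the last number parsed from the generated text. The helpers `extract_answer`, `exact_match_accuracy`, `mean_token_nll`, and `standard_error` are hypothetical names, and the per-token probabilities are made-up toy numbers.

```python
import math
import re
import statistics


def extract_answer(text):
    """Return the last number in `text` as a string, or None if there is none."""
    # Hypothetical parsing rule: the final number in the generation is the answer.
    matches = re.findall(r"-?\d[\d,]*\.?\d*", text)
    if not matches:
        return None
    return matches[-1].replace(",", "").rstrip(".")


def exact_match_accuracy(generations, references):
    """Fraction of examples whose parsed answer equals the reference answer."""
    correct = sum(extract_answer(g) == r for g, r in zip(generations, references))
    return correct / len(references)


def mean_token_nll(target_token_probs):
    """Mean negative log-likelihood over every reference token of every example."""
    nlls = [-math.log(p) for probs in target_token_probs for p in probs]
    return sum(nlls) / len(nlls)


def standard_error(values):
    """Standard error of the mean; large when the eval split is small or noisy."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Toy, made-up numbers: probability each model assigns to each reference token
# of one example. The last token is the final answer.
probs_a = [[0.9, 0.9, 0.9, 0.9, 0.2]]
probs_b = [[0.6, 0.6, 0.6, 0.6, 0.9]]
reference = ["18"]
gen_a = ["... so the answer is 17."]
gen_b = ["... so the answer is 18."]

print(round(mean_token_nll(probs_a), 3), exact_match_accuracy(gen_a, reference))  # → 0.406 0.0
print(round(mean_token_nll(probs_b), 3), exact_match_accuracy(gen_b, reference))  # → 0.43 1.0
```

With these toy numbers, model A has the lower mean loss because it is confident on the many reasoning tokens, yet it is unsure about the single answer token and gets the answer wrong. `standard_error` gives a rough sense of how noisy a mean loss from a small split is; it shrinks roughly as 1/√n.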


shuangyichen commented on July 19, 2024

I wonder whether the phenomenon discussed in your paper occurs only in the low-fidelity scenario, or in general FL as well?


qbc2016 commented on July 19, 2024

What we observe in the paper is in a low-fidelity scenario. For finetuning LLMs in general FL, it would be interesting to investigate the relationship between val/test loss and the final evaluation accuracy. I'm not sure there has been a study on this.
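One way to start probing that relationship, sketched here under our own assumptions (this is not a method from the paper or from FederatedScope), is to record the val loss and the evaluation accuracy of several checkpoints and compute a Spearman rank correlation between them. The function names and the checkpoint numbers below are hypothetical toy values.

```python
import math


def rank(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


# Hypothetical per-checkpoint measurements (toy numbers).
val_losses = [0.9, 0.8, 0.7, 0.6]
accuracies = [0.30, 0.35, 0.33, 0.31]
rho = spearman(val_losses, accuracies)
print(round(rho, 3))  # → -0.2
```

A strongly negative rho would mean lower val loss reliably goes with higher accuracy; a value near zero, as in this toy case, would suggest the loss is a weak proxy for the final evaluation accuracy. With only a handful of checkpoints the estimate itself is noisy, so more checkpoints (and larger eval splits) make the comparison more trustworthy.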
