🐛 Bug I'm not quite sure this is a bug, but after calibrating my

[BUG] Possible bug in validation perplexity metric for DPO? about h2o-llmstudio HOT 2 CLOSED

tmostak commented on June 12, 2024

Comments (2)

maxjeblick commented on June 12, 2024

Thanks a lot for posting, interesting findings!

I've checked the Perplexity implementation; code-wise, I haven't found any issues.
For DPO, validation loss and perplexity may behave differently, because the DPO loss is computed from
policy_chosen_logps - policy_rejected_logps - (reference_chosen_logps - reference_rejected_logps)
whereas Perplexity uses only chosen_logits.
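
To make the difference concrete, here is a minimal sketch. It assumes the standard sigmoid DPO loss, -log(sigmoid(beta * logits)), which may differ from what H2O LLM Studio actually computes. The log-probabilities, beta = 0.1, and the token count of 10 are hypothetical values chosen for illustration only.

```python
import math

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             reference_chosen_logps, reference_rejected_logps, beta=0.1):
    """Sigmoid DPO loss for one sample (assumed form, beta is hypothetical)."""
    logits = (policy_chosen_logps - policy_rejected_logps
              - (reference_chosen_logps - reference_rejected_logps))
    # -log(sigmoid(x)) == log(1 + exp(-x))
    return math.log1p(math.exp(-beta * logits))

def chosen_perplexity(policy_chosen_logps, num_tokens):
    """Perplexity of the chosen answer only: exp of mean negative log-prob."""
    return math.exp(-policy_chosen_logps / num_tokens)

# Hypothetical sequence log-probabilities for a single sample.
ref_chosen, ref_rejected = -20.0, -22.0
num_tokens = 10  # hypothetical length of the chosen answer

# Before training: policy equals the reference model.
loss_before = dpo_loss(-20.0, -22.0, ref_chosen, ref_rejected)
ppl_before = chosen_perplexity(-20.0, num_tokens)

# After training: both answers became less likely, the rejected one much more so.
loss_after = dpo_loss(-25.0, -40.0, ref_chosen, ref_rejected)
ppl_after = chosen_perplexity(-25.0, num_tokens)

print(f"DPO loss:          {loss_before:.3f} -> {loss_after:.3f}")
print(f"chosen perplexity: {ppl_before:.2f} -> {ppl_after:.2f}")
```

In this made-up case the DPO loss goes down while the perplexity of the chosen answer goes up, since the loss only rewards the margin between chosen and rejected answers relative to the reference model.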

It is probably a good idea to add more train/validation metrics, such as CE loss (logged to Neptune), to better track the experiment.
I've created this branch where SampleAveragedCrossEntropyLoss is used as a validation loss and ran some experiments on it. So far, val loss is in sync with Perplexity.

maxjeblick commented on June 12, 2024

I dug a bit deeper into the loss curves in this branch; cross entropy is logged for both accepted and rejected samples, along with the corresponding perplexity.

Regarding the different behavior of loss vs. perplexity, my explanation is that evaluation samples the model struggles to predict can disproportionately affect the overall perplexity.

As an example, suppose we have a dataset with 4 samples where the third sample is out-of-distribution for the SFT model.

After epoch 1, suppose the model has the following cross-entropy loss for each sample: (1, 1, 9, 1) (the third answer is hard to predict).
Mean cross-entropy (sum of per-sample cross-entropy / num_samples) will be 12/4 = 3
Mean perplexity (sum of per-sample perplexity / num_samples) will be mean([exp(1), exp(1), exp(9), exp(1)]) ~ 2027.8
After epoch 2, suppose the model has the following cross-entropy loss for each sample: (4, 4, 4, 4) (the model adapts to the third sample at the cost of predicting the other samples worse).
Mean cross-entropy will be 16/4 = 4
Mean perplexity will be mean([exp(4), exp(4), exp(4), exp(4)]) = exp(4) ~ 54.6
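
The arithmetic can be checked with a short script. This is only a sketch of the toy example above, with hypothetical helper names, not the metric code from H2O LLM Studio. Averaging four identical values of exp(4) gives exp(4) itself.

```python
import math

def mean_cross_entropy(losses):
    """Average of the per-sample cross-entropy losses."""
    return sum(losses) / len(losses)

def mean_perplexity(losses):
    """Average of the per-sample perplexities, exp(loss) for each sample."""
    return sum(math.exp(loss) for loss in losses) / len(losses)

# Per-sample cross-entropy values from the toy example.
epoch_1 = [1, 1, 9, 1]  # third sample is hard to predict
epoch_2 = [4, 4, 4, 4]  # model trades accuracy elsewhere for the third sample

for name, losses in [("epoch 1", epoch_1), ("epoch 2", epoch_2)]:
    ce = mean_cross_entropy(losses)
    ppl = mean_perplexity(losses)
    print(f"{name}: mean CE = {ce:.1f}, mean perplexity = {ppl:.1f}")
# epoch 1: mean CE = 3.0, mean perplexity = 2027.8
# epoch 2: mean CE = 4.0, mean perplexity = 54.6
```

Mean cross-entropy rises from 3 to 4 while mean perplexity falls sharply, because the exponential lets the single hard sample dominate the epoch 1 average.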

My assumption is that this explains the behavior observed (and it is not caused by a bug).
