Hello, I followed the configuration in your paper and used GPT-2 (705M, eval_loss: 3.75)

Unable to reproduce??? about babyllama (OPEN)

haiduo commented on August 18, 2024

Unable to reproduce???

from babyllama.

Comments (5)

shawnricecake commented on August 18, 2024

Sorry for the late reply.

The BabyLM dataset can be downloaded from the webpage of the challenge: https://babylm.github.io/

We haven’t evaluated the model on other datasets, but since we rely on the HuggingFace Transformers library, I would expect it to be widely compatible with eval frameworks.

Thanks for your reply! That truly helps!


timinar commented on August 18, 2024

Hello! This is expected. In the distillation pretraining, we use a loss that combines the usual cross-entropy with a KL divergence between the student's and the teachers' probability distributions: $L = \alpha L_{CE} + (1-\alpha) L_{KL}$. This second term is pretty large, so the resulting loss is not comparable to a pure cross-entropy loss. Yet, the model trains. Ideally, we should have computed and logged the cross-entropy on eval. Alternatively, one can load the trained model and evaluate it on the same evaluation dataset.
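To make the point about the combined loss concrete, here is a minimal pure-Python sketch for a single token position. It is not the repository's training code: the function names, the default `alpha=0.5`, the KL direction (teacher relative to student), and the averaging over teachers are all assumptions made for illustration. It only shows that the logged value mixes two terms, so it cannot be read as a cross-entropy.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(student_logits, target):
    """L_CE: negative log-probability the student assigns to the true token."""
    return -log_softmax(student_logits)[target]


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student); the direction is an assumption of this sketch."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def distillation_loss(student_logits, teacher_logits_list, target, alpha=0.5):
    """Return (L, L_CE, L_KL) with L = alpha * L_CE + (1 - alpha) * L_KL.

    L_KL is averaged over the teachers here, which is an assumption.
    """
    ce = cross_entropy(student_logits, target)
    kl = sum(kl_divergence(t, student_logits) for t in teacher_logits_list)
    kl /= len(teacher_logits_list)
    return alpha * ce + (1 - alpha) * kl, ce, kl


# Hypothetical logits over a 3-token vocabulary, for illustration only.
student = [2.0, 0.5, -1.0]
teachers = [[0.1, 3.0, -2.0], [0.0, 2.5, -1.5]]
total, ce, kl = distillation_loss(student, teachers, target=0)
print(f"L = {total:.3f}, L_CE = {ce:.3f}, L_KL = {kl:.3f}")
```

Many distillation setups also soften the logits with a temperature and rescale the KL term, which changes its magnitude further; that is left out here because the formula above does not include it.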


haiduo commented on August 18, 2024

Hello! This is expected. In the distillation pretraining, we use a loss that combines the usual cross-entropy with a KL divergence between the student's and the teachers' probability distributions: $L = \alpha L_{CE} + (1-\alpha) L_{KL}$. This second term is pretty large, so the resulting loss is not comparable to a pure cross-entropy loss. Yet, the model trains. Ideally, we should have computed and logged the cross-entropy on eval. Alternatively, one can load the trained model and evaluate it on the same evaluation dataset.

Okay, thank you for your reply. I will load the trained model and measure its actual cross-entropy on the evaluation dataset later.
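For anyone wanting to run the same check, a rough sketch with HuggingFace Transformers follows. It assumes the trained student was saved in a format that `AutoModelForCausalLM.from_pretrained` can load. The names `eval_cross_entropy` and `token_weighted_mean`, the `model_dir` argument, and `max_length=128` are hypothetical choices for this sketch and are not taken from the repository.

```python
import math


def token_weighted_mean(losses, token_counts):
    """Combine per-text mean losses into one mean over all predicted tokens."""
    total = sum(token_counts)
    if total == 0:
        raise ValueError("no tokens to average over")
    return sum(l * n for l, n in zip(losses, token_counts)) / total


def eval_cross_entropy(model_dir, texts, max_length=128):
    """Return (cross-entropy, perplexity) of a saved causal LM on `texts`.

    `model_dir` is a hypothetical path to the saved student model.
    """
    # Heavy imports are kept local so the helper above works without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir)
    model.eval()

    losses, counts = [], []
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt",
                            truncation=True, max_length=max_length)
            input_ids = enc["input_ids"]
            n_pred = input_ids.shape[1] - 1  # labels are shifted by one
            if n_pred < 1:
                continue  # nothing to predict for a single-token text
            out = model(input_ids=input_ids,
                        attention_mask=enc["attention_mask"],
                        labels=input_ids)
            losses.append(out.loss.item())  # mean CE over this text
            counts.append(n_pred)

    ce = token_weighted_mean(losses, counts)
    return ce, math.exp(ce)
```

Processing one text at a time avoids padding, so each `out.loss` is a plain mean over that text's predicted tokens; weighting by token count then gives a per-token average over the whole evaluation set, which is the number to compare against a cross-entropy `eval_loss` such as the 3.75 reported for GPT-2.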


shawnricecake commented on August 18, 2024

Hi,

How can I get the babyllama dataset?

Is there any code for evaluation on other datasets as reported in the paper?

Thanks


JLTastet commented on August 18, 2024

Sorry for the late reply.

The BabyLM dataset can be downloaded from the webpage of the challenge: https://babylm.github.io/

We haven’t evaluated the model on other datasets, but since we rely on the HuggingFace Transformers library, I would expect it to be widely compatible with eval frameworks.
