Unable to reproduce??? about babyllama (OPEN)

haiduo commented on August 18, 2024
Unable to reproduce???

from babyllama.

Comments (5)

shawnricecake commented on August 18, 2024

Thanks for your reply! That truly helps!

timinar commented on August 18, 2024

Hello! This is expected. In the distillation pretraining, we use a loss that combines the usual cross-entropy with a KL divergence between the student's and teachers' probability distributions: $L = \alpha L_{CE} + (1-\alpha) L_{KL}$. The second term is pretty large, so the resulting loss is completely off compared to the plain cross-entropy. Yet, the model trains. Ideally, we should have computed and logged the cross-entropy on eval. Alternatively, one can load the trained model and evaluate it on the same evaluation dataset.
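To make the point about the two loss terms concrete, here is a minimal sketch in plain Python of a combined loss of the form $L = \alpha L_{CE} + (1-\alpha) L_{KL}$ for a single token. It is not the repository's training code. The function names, the single-teacher setup, the KL direction (teacher relative to student), the temperature softening, and the `temperature ** 2` scaling are all assumptions borrowed from common distillation practice. Returning the two terms separately is one way to log the cross-entropy on its own.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, target_index):
    """Negative log-probability the student assigns to the true token."""
    return -math.log(softmax(student_logits)[target_index])


def kl_divergence(teacher_probs, student_probs):
    """KL(teacher || student); the direction is an assumption."""
    return sum(
        p * math.log(p / q)
        for p, q in zip(teacher_probs, student_probs)
        if p > 0
    )


def distillation_loss(student_logits, teacher_logits, target_index,
                      alpha=0.5, temperature=2.0):
    """Return (combined loss, cross-entropy term, KL term).

    alpha and temperature are hypothetical defaults, not values
    taken from the thread or the repository.
    """
    l_ce = cross_entropy(student_logits, target_index)
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Scaling by temperature**2 is a common convention (assumed here).
    l_kl = kl_divergence(p_teacher, p_student) * temperature ** 2
    return alpha * l_ce + (1 - alpha) * l_kl, l_ce, l_kl


# Toy logits over a 3-token vocabulary; the true token is index 0.
total, l_ce, l_kl = distillation_loss([2.0, 0.5, -1.0], [0.1, 3.0, -0.5], 0)
print(f"combined={total:.4f}  cross_entropy={l_ce:.4f}  kl={l_kl:.4f}")
```

The combined value is only comparable to a plain cross-entropy loss when the KL term is zero, which is why the logged training loss can look far off even though the model is learning.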

haiduo commented on August 18, 2024

Okay, thank you for your reply. I will evaluate the trained model on the evaluation dataset later to check its actual performance.

shawnricecake commented on August 18, 2024

Hi,

How can I get the babyllama dataset?

Is there any code for evaluation on other datasets as reported in the paper?

Thanks

JLTastet commented on August 18, 2024

Sorry for the late reply.

The BabyLM dataset can be downloaded from the webpage of the challenge: https://babylm.github.io/

We haven’t evaluated the model on other datasets, but since we rely on the HuggingFace Transformers library, I would expect it to be widely compatible with evaluation frameworks.
