Comments (3)

eustlb commented on July 3, 2024

Hey @Killshot667! Thanks for raising this interesting point. Indeed, distillation has, for the moment, been targeted at single languages.

For distillation, the approach was initially to shrink the model as much as possible while maximizing its performance by training a smaller decoder on a targeted language. The idea is to trade the multilingual capacity of the 32 decoder layers for the size and speed improvements brought by a smaller decoder (and therefore one with less learning capacity). In this setting, two decoder layers appeared to be Pareto optimal. Were we to train on a multilingual dataset, more decoder layers might be needed to provide enough learning capacity. Adapting the number of decoder layers in the student model is easily done by changing --decoder_layers when initializing it (see the sketch below).
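
As a rough sketch of such an initialization, following the distil-whisper training README: the checkpoint, layer counts, and output directory below are illustrative assumptions, not a prescribed command.

```bash
# Hypothetical example: initialize a student with 4 decoder layers instead of
# the default 2 used for the single-language distil checkpoints.
python create_student_model.py \
  --teacher_checkpoint "openai/whisper-large-v3" \
  --encoder_layers 32 \
  --decoder_layers 4 \
  --save_dir "./distil-large-v3-init-4layers"
```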

Secondly, note that there is nothing preventing a distilled model from having multilingual transcription capabilities. For one, the encoder is identical to Whisper's, so its robustness in building representations of speech across different languages is unchanged. For another, when initializing the student model, we keep Whisper's vocabulary and start from Whisper's input embeddings, which already include the multilingual tokens. To this extent, the only thing restraining distil-large-v3 from being multilingual is the dataset it has been distilled on. You could perfectly well train, for example, a 4-decoder-layer distilled model on European languages (easily done by pseudo-labelling each language subset with the correct --language flag, as explained in language-mixing; see the sketch below). In fact, the language-mixing experiments showed that mixing closely related languages can improve the model's performance.
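
A rough sketch of the per-language pseudo-labelling step: run the labelling script once per language subset, setting --language to match that subset. The dataset and the arguments other than --language are assumptions for illustration; the training README documents the full command.

```bash
# Hypothetical example: pseudo-label a French subset, then repeat with
# --dataset_config_name "es" / --language "es" for Spanish, and so on.
python run_pseudo_labelling.py \
  --model_name_or_path "openai/whisper-large-v3" \
  --dataset_name "mozilla-foundation/common_voice_16_1" \
  --dataset_config_name "fr" \
  --language "fr" \
  --task "transcribe" \
  --output_dir "./common_voice_fr_pseudo_labelled"
```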

bil-ash commented on July 3, 2024

I have the same question.
@sanchit-gandhi Please try distilling whisper-small on the kathbath dataset and share the results.

sanchit-gandhi commented on July 3, 2024

Hey @Killshot667 - that's a great question, and super sorry for the late reply here! I'll defer to @eustlb, who has been running some preliminary experiments on distilling Whisper jointly for French and Spanish. You can read about the initial results and how to reproduce them on the README here: https://github.com/huggingface/distil-whisper/tree/main/training#3-language-mixing
