Comments (3)
Hey @Killshot667! Thanks for raising this interesting point. Indeed, distillation has so far been targeted at single languages.
For distillation, the initial approach was to shrink the model as much as possible while preserving performance, by training a smaller decoder on a single target language. The idea is to trade the multilingual capacity of the teacher's 32 decoder layers for the size and speed gains of a much smaller decoder (which therefore has lower learning capacity). In this setting, two decoder layers turned out to be Pareto-optimal. Were we to train on a multilingual dataset, more decoder layers might be needed to provide enough learning capacity. Adapting the student model's decoder depth is easily done by changing --decoder_layers when initializing.
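To make the initialization step concrete, here is a rough sketch of how a smaller student decoder can be seeded from teacher layers. distil-whisper initializes the student from maximally spaced teacher layers; the helper below is an illustrative re-implementation of that index selection, not the project's actual code.

```python
def pick_decoder_layers(teacher_layers: int, student_layers: int) -> list[int]:
    """Return maximally spaced teacher decoder layer indices to copy into the
    student, always including the first and last teacher layers."""
    if student_layers == 1:
        return [0]
    # Integer arithmetic spreads the indices evenly from 0 to teacher_layers - 1.
    return [
        i * (teacher_layers - 1) // (student_layers - 1)
        for i in range(student_layers)
    ]

# Whisper large-v3 has 32 decoder layers; a 2-layer student copies the
# first and last teacher layers, a 4-layer student spreads them out.
print(pick_decoder_layers(32, 2))  # [0, 31]
print(pick_decoder_layers(32, 4))  # [0, 10, 20, 31]
```

With more decoder layers (e.g. for a multilingual student), the same selection simply copies more intermediate teacher layers as starting points.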
Secondly, note that nothing prevents a distilled model from having multilingual transcription capabilities. First, the encoder is identical to Whisper's, so its robustness in building speech representations across languages is unchanged. Second, when initializing the student model, we keep Whisper's vocabulary and start from Whisper's input embeddings, which come with the inherent multilingual tokens. In that sense, the only thing preventing distil-large-v3 from being multilingual is the dataset it was distilled on. You could perfectly well train, for example, a 4-decoder-layer distilled model on European languages (easily done by pseudo-labelling each set with the correct --language flag, as explained in the language-mixing section). In fact, the language-mixing experiments showed that mixing closely related languages can improve the model's performance.
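As a sketch of the per-language pseudo-labelling step described above: the --language flag and the run_pseudo_labelling.py script name come from this thread, but the language set, model name, and argument list below are illustrative placeholders, not a tested distil-whisper invocation.

```python
# Hypothetical set of closely related European languages to pseudo-label
# before a joint multilingual distillation run.
LANGUAGES = ["fr", "es", "it", "pt"]

def pseudo_label_command(language: str) -> list[str]:
    """Assemble one illustrative run_pseudo_labelling.py invocation; the
    model name and flag set besides --language are assumptions."""
    return [
        "python", "run_pseudo_labelling.py",
        "--model_name_or_path", "openai/whisper-large-v3",
        "--language", language,  # force Whisper to transcribe in this language
    ]

for lang in LANGUAGES:
    print(" ".join(pseudo_label_command(lang)))
```

Each subset is pseudo-labelled with its own --language flag, and the resulting labelled sets are then mixed for the distillation run.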
from distil-whisper.
I have the same question. @sanchit-gandhi, please try distilling whisper-small on the Kathbath dataset and share the results.
Hey @Killshot667 - that's a great question, and super sorry for the late reply here! I'll defer to @eustlb, who has been running some preliminary experiments on distilling Whisper jointly for French and Spanish. You can read about the initial results and how to reproduce them on the README here: https://github.com/huggingface/distil-whisper/tree/main/training#3-language-mixing