Question about 2018-asr-librispeech dev = get_dataset("dev", subset=3000) (returnn-experiments, closed)

rwth-i6 commented on May 25, 2024

Question about 2018-asr-librispeech dev = get_dataset("dev", subset=3000)

from returnn-experiments.

Comments (2)

albertz commented on May 25, 2024

I have encountered this problem: when I add more audio corpora to the training and keep subset=3000, the training loss goes to 'nan' and never converges. But when I enlarge the subset to 10000, the problem disappears. The small dev subset makes the model break.

The dev set has no influence on the training except on learning rate scheduling. But if this happens early (I assume so, you did not tell), then the learning rates are fixed (e.g. warmup), so it has no effect.
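To illustrate why the dev score cannot matter during warmup, here is a minimal sketch of a learning rate schedule with a fixed warmup phase. This is not the scheduler from the config; the function name `next_learning_rate`, its parameters, and the decay rule are all assumptions made for illustration.

```python
def next_learning_rate(epoch, current_lr, dev_scores, warmup_lrs, decay=0.7):
    """Hypothetical scheduler (illustration only, not the RETURNN implementation).

    epoch is 1-based. dev_scores holds the dev-set loss per finished epoch
    (lower is better). warmup_lrs holds the fixed rates for the first epochs.
    """
    if epoch <= len(warmup_lrs):
        # During warmup the rate comes from a fixed table; dev_scores is never read.
        return warmup_lrs[epoch - 1]
    if len(dev_scores) >= 2 and dev_scores[-1] >= dev_scores[-2]:
        # After warmup: shrink the rate when the dev loss stopped improving.
        return current_lr * decay
    return current_lr


warmup = [0.0003, 0.0005, 0.0008]

# Same epoch inside warmup, very different dev scores: the rate is identical.
lr_good_dev = next_learning_rate(2, 0.0003, [5.0, 4.0], warmup)
lr_bad_dev = next_learning_rate(2, 0.0003, [5.0, 9.0], warmup)

# After warmup the dev score does matter.
lr_decayed = next_learning_rate(4, 0.0008, [3.0, 3.5], warmup)
lr_kept = next_learning_rate(4, 0.0008, [3.5, 3.0], warmup)
```

In this sketch, a noisy dev score from a small subset can only change the learning rate once the warmup table is exhausted, which matches the reasoning above.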

I assume this is more a noise effect, due to non-deterministic training in general.

Note that we have a newer, more stable baseline config here. It might occur less or not at all with this.

We also have a newer simpler data preparation pipeline here, although that's maybe not too important.

Which TF version do you use? We observed that some combinations of TF + CUDA versions are more unstable than others. In general, I would recommend TF 2.3.

And when I set the subset to a fixed number, the toolkit randomly selects some audio to form the dev set. Then the dev set will be the same for all epochs during the training? Is that right?

If you have the same random seed for all epochs (via fixed_random_seed), then yes, it's exactly the same subset for all epochs.
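The effect of a fixed seed on subset selection can be sketched in plain Python. This is an illustration of the idea only, not the toolkit's code: the helper `pick_dev_subset`, the sequence names, and the fallback of seeding by epoch number are assumptions.

```python
import random


def pick_dev_subset(seq_tags, subset_size, epoch, fixed_random_seed=None):
    """Hypothetical helper: choose subset_size sequences for one epoch.

    With fixed_random_seed set, the seed does not depend on the epoch,
    so every epoch draws the same subset. Without it, this sketch
    assumes the epoch number is used as the seed.
    """
    seed = fixed_random_seed if fixed_random_seed is not None else epoch
    rng = random.Random(seed)
    return sorted(rng.sample(seq_tags, subset_size))


# Made-up sequence names standing in for the full dev set.
all_seqs = [f"dev-seq-{i}" for i in range(10000)]

# Fixed seed: identical subset in every epoch.
fixed_epoch1 = pick_dev_subset(all_seqs, 3000, epoch=1, fixed_random_seed=1)
fixed_epoch2 = pick_dev_subset(all_seqs, 3000, epoch=2, fixed_random_seed=1)

# Epoch-dependent seed: the subset changes between epochs.
varying_epoch1 = pick_dev_subset(all_seqs, 3000, epoch=1)
varying_epoch2 = pick_dev_subset(all_seqs, 3000, epoch=2)
```

With the fixed seed, dev scores from different epochs are computed on the same sequences and are therefore directly comparable.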


christophmluscher commented on May 25, 2024

This seems resolved.

