
Comments (5)

albertz commented on May 24, 2024

It seems a very related question was also asked in #42.
I have not tried unidirectional LSTMs in the encoder yet, so I don't know. You probably should play around with all the available hyperparameters, e.g.:

  • Learning rate (initial at warmup, and highest learning rate after warmup)
  • Learning rate warmup length (num epochs)
  • Pretraining. E.g. the starting number of layers (try 2). The initial time reduction (try increasing it, e.g. 6, 8, 16, or even 32). Try making the pretraining longer (more repetitions). Etc.
  • No or less SpecAugment in the beginning.
  • Higher batch size in the beginning. Or gradient accumulation in the beginning.
  • Curriculum learning, i.e. the epoch_wise_filter option.
  • ...

Let this smallest network, with the highest time reduction, a high batch size, and little or no SpecAugment, train like that for as long as needed before increasing anything. This small network should first reach some halfway decent score. Only when you see that should the pretraining increase the depth and other things, and only carefully and slowly (such that the network does not totally break again).
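
To make this concrete, here is a rough sketch of how some of these knobs could look together in a RETURNN-style config. Only learning_rate / learning_rates / pretrain follow the exact pattern used in the configs below; accum_grad_multiple_step, the epoch_wise_filter thresholds, and the stub construction algo are illustrative assumptions and need to be adapted to your own setup:

    import numpy


    def custom_construction_algo(idx, net_dict):
        # Placeholder: the real construction algo (as in the returnn-experiments
        # configs) would shrink the network for early pretrain epochs (fewer layers,
        # higher time reduction) and stop pretraining once the full network is built.
        return net_dict


    # Learning rate warmup: linear ramp over the first 15 (sub)epochs, then the peak rate.
    learning_rate = 0.0008
    learning_rates = list(numpy.linspace(0.0003, learning_rate, num=15))

    # Pretraining: more repetitions keep each pretrain construction step alive for longer.
    pretrain = {"repetitions": 5, "construction_algo": custom_construction_algo}

    # Gradient accumulation (assumed option) to emulate a larger effective batch size
    # in the beginning.
    accum_grad_multiple_step = 2

    # Curriculum learning: epoch_wise_filter is an option of the train dataset and
    # restricts which sequences are used in early epochs (the threshold here is a
    # made-up placeholder).
    epoch_wise_filter = {(1, 5): {"max_mean_len": 50}}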


manish-kumar-garg commented on May 24, 2024

Thanks @albertz for suggesting these.
I trained the following models up to the end of pretraining (45 epochs), with the following loss observations:

  1. Base model - asr_2018_attention - with the following hyperparameters:
     pretrain = {"repetitions": 5, "construction_algo": custom_construction_algo}
     learning_rate = 0.0008
     learning_rates = list(numpy.linspace(0.0003, learning_rate, num=15))  # warmup
     (loss curve plot)

  2. Uni LSTM, size 1024, with all hyperparameters the same as 1:
     (loss curve plot)

  3. Uni LSTM, size 1024, with all hyperparameters the same as 1 except:
     pretrain = {"repetitions": 7, "construction_algo": custom_construction_algo}
     (loss curve plot)

  4. Uni LSTM, size 1024, with all hyperparameters the same as 1 except:
     learning_rate = 0.0005
     (loss curve plot)

  5. Uni LSTM, size 1024, with all hyperparameters the same as 1 except a warmup length of 10 epochs:
     learning_rates = list(numpy.linspace(0.0003, learning_rate, num=10))  # warmup
     (loss curve plot)

  6. Uni LSTM, size 1024, with all hyperparameters the same as 1 except a warmup length of 20 epochs:
     learning_rates = list(numpy.linspace(0.0003, learning_rate, num=20))  # warmup
     In this case the loss becomes NaN after 10 epochs.

  7. Uni LSTM, size 1536, with all hyperparameters the same as 1:
     (loss curve plot)

All models use global attention.
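
(As a small aside, only restating the configs above: the warmup is a linear ramp over the first num epochs, so experiments 5 and 6 differ only in how quickly the learning rate reaches its peak.)

    import numpy

    learning_rate = 0.0008
    warmup_10 = list(numpy.linspace(0.0003, learning_rate, num=10))  # experiment 5
    warmup_20 = list(numpy.linspace(0.0003, learning_rate, num=20))  # experiment 6
    # warmup_10 increases by ~5.6e-5 per epoch, warmup_20 by ~2.6e-5 per epoch.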


manish-kumar-garg commented on May 24, 2024

Seems like decreasing the learning rate helps.
Also, increasing the LSTM cell size to 1536 helps a bit, but not much.

What other combinations do you suggest trying next?


albertz commented on May 24, 2024

All of the things I already wrote (here), but basically everything else as well.


manish-kumar-garg commented on May 24, 2024

Lowering the learning rate to lr=0.0005 with lr_init=0.0002 worked for me.
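
For reference, a sketch of the working settings in the same style as the configs above (the warmup length of 15 is carried over from the base config as an assumption):

    import numpy

    learning_rate = 0.0005
    learning_rates = list(numpy.linspace(0.0002, learning_rate, num=15))  # warmup from lr_init=0.0002
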
Thanks!
