
EXCEPTION NetworkConstructionDependencyLoopException: Error: There is a dependency loop on layer 'accum_att_weights'. (returnn-experiments issue, closed)

rwth-i6 commented on May 24, 2024
EXCEPTION NetworkConstructionDependencyLoopException: Error: There is a dependency loop on layer 'accum_att_weights'.

from returnn-experiments.

Comments (8)

albertz commented on May 24, 2024

Unfortunately, the configs are slightly buggy and do not work correctly with more recent RETURNN versions.
It should be easy to fix, though. Change this:

"p_t_in": {"class": "eval", "from": "prev:att_weights", "eval": "tf.squeeze(tf.argmax(source(0), axis=1, output_type=tf.int32), axis=1)",
  "out_type": {"shape": (), "batch_dim_axis": 0, "dtype": "float32"}},

To:

"p_t_in": {"class": "reduce", "from": "prev:att_weights", "mode": "argmax", "axis": "t"},

Also make sure that you use the latest RETURNN version.
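
For readers unfamiliar with the reduce layer: the new "p_t_in" computes, for each sequence, the encoder frame that received the highest attention weight in the previous decoder step. The minimal NumPy sketch below (not RETURNN code) mirrors that. It assumes prev:att_weights is batch-major with shape (batch, time, 1), which is the layout implied by the old tf.argmax(..., axis=1) followed by tf.squeeze(..., axis=1). The helper name argmax_over_time is hypothetical. Note that the result is an integer index, whereas the old eval layer declared "dtype": "float32" in out_type even though tf.argmax(..., output_type=tf.int32) returns int32.

```python
import numpy as np


def argmax_over_time(att_weights: np.ndarray) -> np.ndarray:
    """Return the most-attended encoder frame for each sequence.

    att_weights: hypothetical batch-major array of shape (batch, time, 1),
    mirroring the layout the old eval layer assumed.
    """
    # Argmax over the time axis (axis=1) -> shape (batch, 1), integer dtype.
    positions = np.argmax(att_weights, axis=1)
    # Drop the trailing size-1 axis -> shape (batch,), like tf.squeeze(..., axis=1).
    return np.squeeze(positions, axis=1)


# Two sequences with five encoder frames each; add the trailing size-1 axis.
weights = np.array([
    [0.1, 0.6, 0.2, 0.05, 0.05],
    [0.0, 0.1, 0.1, 0.3, 0.5],
])[:, :, None]

p_t = argmax_over_time(weights)
print(p_t)  # [1 4]
```

The reduce layer with "axis": "t" lets RETURNN resolve the time axis by name instead of relying on a hard-coded axis index and a hand-written out_type, which is why it is less fragile than the eval version.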

manish-kumar-garg commented on May 24, 2024

After making this change and using the latest RETURNN version, I am getting incorrect shapes:

TensorFlow exception: Incompatible shapes: [14,1,45] vs. [45,1,45]
	 [[node output/rec/att_weights/LogicalAnd_1 (defined at /home/ubuntu/rwth-i6/returnn-experiments/2018-asr-attention/librispeech/full-setup-attention/returnn/TFNetworkLayer.py:3159) ]]
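
This message comes from TensorFlow's broadcasting check: the two operands of LogicalAnd have leading dimensions 14 and 45, neither of which is 1, so they cannot be combined elementwise. The NumPy sketch below reproduces the same rule with the shapes copied from the error. Reading 14 as the batch size and 45 as the number of encoder frames is an assumption, and the helper broadcastable is hypothetical, written only to spell out the rule.

```python
import numpy as np


def broadcastable(shape_a, shape_b):
    """Apply NumPy/TensorFlow broadcasting rules: compare dims from the right;
    each pair must be equal or contain a 1 (missing leading dims count as 1)."""
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a != b and a != 1 and b != 1:
            return False
    return True


# Shapes taken from the TensorFlow error message.
lhs = np.ones((14, 1, 45), dtype=bool)
rhs = np.ones((45, 1, 45), dtype=bool)
print(broadcastable(lhs.shape, rhs.shape))  # False

try:
    np.logical_and(lhs, rhs)
except ValueError as err:
    print("ValueError:", err)

# An operand with a leading 1 (e.g. a per-frame mask shared across the batch)
# broadcasts against the batch dimension without error.
shared = np.ones((1, 1, 45), dtype=bool)
combined = np.logical_and(lhs, shared)
print(combined.shape)  # (14, 1, 45)
```

In other words, the error points at a layout mismatch between two tensors feeding the attention-weight mask, not at the config values themselves, which is consistent with it being fixed in RETURNN rather than in the config.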

albertz commented on May 24, 2024

Thank you. There was actually a real bug, which I have now fixed (commit 7999f7430cb968).
The test test_rec_layer_local_att_train_and_search should cover this now.
Can you try again with the latest RETURNN?

manish-kumar-garg commented on May 24, 2024

It works now. Thanks!

manish-kumar-garg commented on May 24, 2024

@albertz,
After making this change, I can see that the model is not converging as expected.

Logs here

The loss stays at around 6k-7k.

albertz commented on May 24, 2024

We did three kinds of experiments in the paper:

  • Training with global soft attention, and then just importing it into this local soft attention config.
  • Training with global soft attention, then importing it as local soft attention, and further training a bit.
  • Training with local soft attention from scratch.

@Spotlight0xff are these the configs you used for importing the model, or from scratch training? If only for importing, can you also add the configs for the from-scratch training? (Or maybe just one reasonable/representative config for that.)

I remember that the from-scratch training was quite unstable and needed more tuning.
Importing (and optionally continuing to train a bit) should work in any case. Did you try that?

manish-kumar-garg commented on May 24, 2024

I have not tried that yet. Will try now.

Spotlight0xff commented on May 24, 2024

Hi,

The configs in librispeech/ were all for pretrained (global attention) models (so cases 1 and 2, but not 3).
I just added a config for from-scratch training here.
The problem with your model was probably the learning rate, which was too small: it is the one we used when retraining the initialized models.
