Comments (3)

AlekzNet commented on July 22, 2024

Found a similar issue: torch/nn#707

from torch-rnn.

AlekzNet commented on July 22, 2024

Looks like the problem is the learning rate. The more neurons per layer, the lower the learning rate should be. Here's an extreme example with a 1x2000 net:

Learning rate = 0.03
Epoch 1.00 / 50, i = 1 / 10050, loss = 3.743563, time = 98.504015
Epoch 1.01 / 50, i = 2 / 10050, loss = 12.668491, time = 1124.716852
Epoch 1.01 / 50, i = 3 / 10050, loss = 21.633509, time = 1344.370028
Epoch 1.02 / 50, i = 4 / 10050, loss = 33.516720, time = 1350.377343
Epoch 1.02 / 50, i = 5 / 10050, loss = 37.644268, time = 784.779067
Epoch 1.03 / 50, i = 6 / 10050, loss = 44.947342, time = 575.244270
Epoch 1.03 / 50, i = 7 / 10050, loss = 49.168823, time = 518.666032
Epoch 1.04 / 50, i = 8 / 10050, loss = 60.799461, time = 518.558913
Epoch 1.04 / 50, i = 9 / 10050, loss = 74.971878, time = 526.355436
Epoch 1.05 / 50, i = 10 / 10050, loss = 88.805473, time = 486.437183
Epoch 1.05 / 50, i = 11 / 10050, loss = 102.066391, time = 457.234153
Epoch 1.06 / 50, i = 12 / 10050, loss = 110.850304, time = 407.849959
Epoch 1.06 / 50, i = 13 / 10050, loss = 113.192497, time = 350.164559
Epoch 1.07 / 50, i = 14 / 10050, loss = 118.508965, time = 313.510016
Epoch 1.07 / 50, i = 15 / 10050, loss = 125.327110, time = 293.842783
Epoch 1.08 / 50, i = 16 / 10050, loss = 127.223450, time = 279.000811
Epoch 1.08 / 50, i = 17 / 10050, loss = 126.456177, time = 261.885371
Epoch 1.09 / 50, i = 18 / 10050, loss = 125.458763, time = 250.193203
Epoch 1.09 / 50, i = 19 / 10050, loss = 113.837715, time = 241.484337
Epoch 1.10 / 50, i = 20 / 10050, loss = 116.855011, time = 235.918599

Learning rate = 0.008
Epoch 1.00 / 50, i = 1 / 10050, loss = 3.732060, time = 96.875242
Epoch 1.01 / 50, i = 2 / 10050, loss = 6.206121, time = 95.081780
Epoch 1.01 / 50, i = 3 / 10050, loss = 6.808724, time = 94.878459
Epoch 1.02 / 50, i = 4 / 10050, loss = 7.313009, time = 95.267581
Epoch 1.02 / 50, i = 5 / 10050, loss = 6.815035, time = 94.705649
Epoch 1.03 / 50, i = 6 / 10050, loss = 7.193333, time = 95.231878
Epoch 1.03 / 50, i = 7 / 10050, loss = 6.133412, time = 94.604229
Epoch 1.04 / 50, i = 8 / 10050, loss = 5.197161, time = 94.892955
Epoch 1.04 / 50, i = 9 / 10050, loss = 4.643122, time = 94.759841
Epoch 1.05 / 50, i = 10 / 10050, loss = 4.195079, time = 95.131519
Epoch 1.05 / 50, i = 11 / 10050, loss = 3.916904, time = 94.101075
Epoch 1.06 / 50, i = 12 / 10050, loss = 3.568570, time = 94.284384
Epoch 1.06 / 50, i = 13 / 10050, loss = 3.203988, time = 94.155637
Epoch 1.07 / 50, i = 14 / 10050, loss = 3.318531, time = 94.185162
Epoch 1.07 / 50, i = 15 / 10050, loss = 3.553847, time = 94.575277
Epoch 1.08 / 50, i = 16 / 10050, loss = 4.324246, time = 97.595417
Epoch 1.08 / 50, i = 17 / 10050, loss = nan, time = 72.681167
Epoch 1.09 / 50, i = 18 / 10050, loss = nan, time = 73.568664
Epoch 1.09 / 50, i = 19 / 10050, loss = nan, time = 72.930155
Epoch 1.10 / 50, i = 20 / 10050, loss = nan, time = 73.618026

Learning rate = 0.004
Epoch 1.00 / 50, i = 1 / 10050, loss = 3.744624, time = 95.754618
Epoch 1.01 / 50, i = 2 / 10050, loss = 4.204688, time = 98.018919
Epoch 1.01 / 50, i = 3 / 10050, loss = 3.852427, time = 98.501101
Epoch 1.02 / 50, i = 4 / 10050, loss = 4.428941, time = 98.728853
Epoch 1.02 / 50, i = 5 / 10050, loss = 3.003022, time = 99.991098
Epoch 1.03 / 50, i = 6 / 10050, loss = 3.446127, time = 101.905007
Epoch 1.03 / 50, i = 7 / 10050, loss = 3.367308, time = 99.737719
Epoch 1.04 / 50, i = 8 / 10050, loss = 3.028597, time = 100.117175
Epoch 1.04 / 50, i = 9 / 10050, loss = 3.011692, time = 98.692176
Epoch 1.05 / 50, i = 10 / 10050, loss = 2.939172, time = 97.518067
. . .
Epoch 1.15 / 50, i = 31 / 10050, loss = 2.391058, time = 97.899679
Epoch 1.16 / 50, i = 32 / 10050, loss = 2.381518, time = 97.903584
Epoch 1.16 / 50, i = 33 / 10050, loss = 2.368582, time = 97.843571
Epoch 1.17 / 50, i = 34 / 10050, loss = 2.418341, time = 97.629543
Epoch 1.17 / 50, i = 35 / 10050, loss = 2.338644, time = 97.844184
. . .
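
For anyone comparing runs like the three above, here is a rough Python sketch (not part of torch-rnn) that scans log lines in this format and reports the first iteration where the loss goes nan or blows up. The function name and the `blowup_factor` threshold are my own assumptions, picked only for illustration:

```python
import math
import re

# Matches lines such as:
# "Epoch 1.08 / 50, i = 17 / 10050, loss = nan, time = 72.681167"
LOG_LINE = re.compile(r"i = (\d+) / \d+, loss = ([^,\s]+)")


def first_bad_iteration(lines, blowup_factor=3.0):
    """Return (iteration, reason) for the first sign of divergence, or None.

    blowup_factor is a hypothetical threshold: a loss more than this many
    times the first logged loss is treated as diverging.
    """
    first_loss = None
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip separators such as ". . ."
        iteration = int(match.group(1))
        loss = float(match.group(2))  # float("nan") parses fine
        if math.isnan(loss) or math.isinf(loss):
            return iteration, "nan"
        if first_loss is None:
            first_loss = loss
        elif loss > blowup_factor * first_loss:
            return iteration, "blowup"
    return None
```

Fed the logs above, this would flag the 0.03 run at i = 2 (loss already more than 3x the starting value) and the 0.008 run at i = 17 (first nan), and return None for the 0.004 lines shown.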

AlekzNet commented on July 22, 2024

The first model (3x1024) with learning rate = 0.005 (0.008 is too high):

Epoch 1.01 / 50, i = 2 / 10050, loss = 3.907162, time = 163.995822
Epoch 1.01 / 50, i = 3 / 10050, loss = 3.715210, time = 164.403803
Epoch 1.02 / 50, i = 4 / 10050, loss = 3.570075, time = 163.936679
Epoch 1.02 / 50, i = 5 / 10050, loss = 3.473840, time = 164.524207
Epoch 1.03 / 50, i = 6 / 10050, loss = 3.154179, time = 164.751064
Epoch 1.03 / 50, i = 7 / 10050, loss = 2.997317, time = 164.154949
Epoch 1.04 / 50, i = 8 / 10050, loss = 2.845170, time = 164.053583
Epoch 1.04 / 50, i = 9 / 10050, loss = 2.657294, time = 163.970862
Epoch 1.05 / 50, i = 10 / 10050, loss = 2.524375, time = 163.097223
Epoch 1.05 / 50, i = 11 / 10050, loss = 2.442510, time = 167.253639
Epoch 1.06 / 50, i = 12 / 10050, loss = 2.365662, time = 164.417616
Epoch 1.06 / 50, i = 13 / 10050, loss = 2.286323, time = 164.347997
Epoch 1.07 / 50, i = 14 / 10050, loss = 2.241984, time = 163.535754
Epoch 1.07 / 50, i = 15 / 10050, loss = 2.195409, time = 163.264860

Looks MUCH better!!! ;)
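
The manual search above (try a rate, watch the loss, lower the rate) could be automated along these lines. This is only a sketch under stated assumptions: `run_trial` is a hypothetical callback that trains briefly and returns the logged losses, and `toy_trial` is a stand-in quadratic problem, not a real torch-rnn run. On the toy problem, plain gradient descent is stable only when the rate is below 2 / curvature, so a steeper problem needs a smaller rate.

```python
import math


def find_stable_learning_rate(run_trial, start_lr, shrink=0.5, max_tries=10):
    """Shrink the learning rate until a short trial run stays finite and improves.

    run_trial is a hypothetical callback: it trains briefly at the given rate
    and returns the list of losses it logged.
    """
    lr = start_lr
    for _ in range(max_tries):
        losses = run_trial(lr)
        finite = all(math.isfinite(x) for x in losses)
        if finite and losses[-1] < losses[0]:
            return lr
        lr *= shrink  # too high: loss went nan/inf or grew, so back off
    return None


def toy_trial(lr, curvature=1000.0, steps=20):
    """Gradient descent on f(w) = 0.5 * curvature * w**2, standing in for a real run."""
    w = 1.0
    losses = []
    for _ in range(steps):
        losses.append(0.5 * curvature * w * w)
        w -= lr * curvature * w  # gradient of f is curvature * w
    return losses
```

Calling `find_stable_learning_rate(toy_trial, 0.03)` halves the rate until it drops below the toy problem's stability limit of 0.002. For a real model the trial would have to be a short training run, and "loss ended lower than it started" is a crude acceptance test, so treat the result as a starting point rather than a tuned value.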
