Comments (3)
Found a similar issue: torch/nn#707
from torch-rnn.
Looks like the problem is the learning rate. The more neurons per layer, the lower the learning rate should be. Here's an extreme example with a 1x2000 net:
Learning rate = 0.03
Epoch 1.00 / 50, i = 1 / 10050, loss = 3.743563, time = 98.504015
Epoch 1.01 / 50, i = 2 / 10050, loss = 12.668491, time = 1124.716852
Epoch 1.01 / 50, i = 3 / 10050, loss = 21.633509, time = 1344.370028
Epoch 1.02 / 50, i = 4 / 10050, loss = 33.516720, time = 1350.377343
Epoch 1.02 / 50, i = 5 / 10050, loss = 37.644268, time = 784.779067
Epoch 1.03 / 50, i = 6 / 10050, loss = 44.947342, time = 575.244270
Epoch 1.03 / 50, i = 7 / 10050, loss = 49.168823, time = 518.666032
Epoch 1.04 / 50, i = 8 / 10050, loss = 60.799461, time = 518.558913
Epoch 1.04 / 50, i = 9 / 10050, loss = 74.971878, time = 526.355436
Epoch 1.05 / 50, i = 10 / 10050, loss = 88.805473, time = 486.437183
Epoch 1.05 / 50, i = 11 / 10050, loss = 102.066391, time = 457.234153
Epoch 1.06 / 50, i = 12 / 10050, loss = 110.850304, time = 407.849959
Epoch 1.06 / 50, i = 13 / 10050, loss = 113.192497, time = 350.164559
Epoch 1.07 / 50, i = 14 / 10050, loss = 118.508965, time = 313.510016
Epoch 1.07 / 50, i = 15 / 10050, loss = 125.327110, time = 293.842783
Epoch 1.08 / 50, i = 16 / 10050, loss = 127.223450, time = 279.000811
Epoch 1.08 / 50, i = 17 / 10050, loss = 126.456177, time = 261.885371
Epoch 1.09 / 50, i = 18 / 10050, loss = 125.458763, time = 250.193203
Epoch 1.09 / 50, i = 19 / 10050, loss = 113.837715, time = 241.484337
Epoch 1.10 / 50, i = 20 / 10050, loss = 116.855011, time = 235.918599
Learning rate = 0.008
Epoch 1.00 / 50, i = 1 / 10050, loss = 3.732060, time = 96.875242
Epoch 1.01 / 50, i = 2 / 10050, loss = 6.206121, time = 95.081780
Epoch 1.01 / 50, i = 3 / 10050, loss = 6.808724, time = 94.878459
Epoch 1.02 / 50, i = 4 / 10050, loss = 7.313009, time = 95.267581
Epoch 1.02 / 50, i = 5 / 10050, loss = 6.815035, time = 94.705649
Epoch 1.03 / 50, i = 6 / 10050, loss = 7.193333, time = 95.231878
Epoch 1.03 / 50, i = 7 / 10050, loss = 6.133412, time = 94.604229
Epoch 1.04 / 50, i = 8 / 10050, loss = 5.197161, time = 94.892955
Epoch 1.04 / 50, i = 9 / 10050, loss = 4.643122, time = 94.759841
Epoch 1.05 / 50, i = 10 / 10050, loss = 4.195079, time = 95.131519
Epoch 1.05 / 50, i = 11 / 10050, loss = 3.916904, time = 94.101075
Epoch 1.06 / 50, i = 12 / 10050, loss = 3.568570, time = 94.284384
Epoch 1.06 / 50, i = 13 / 10050, loss = 3.203988, time = 94.155637
Epoch 1.07 / 50, i = 14 / 10050, loss = 3.318531, time = 94.185162
Epoch 1.07 / 50, i = 15 / 10050, loss = 3.553847, time = 94.575277
Epoch 1.08 / 50, i = 16 / 10050, loss = 4.324246, time = 97.595417
Epoch 1.08 / 50, i = 17 / 10050, loss = nan, time = 72.681167
Epoch 1.09 / 50, i = 18 / 10050, loss = nan, time = 73.568664
Epoch 1.09 / 50, i = 19 / 10050, loss = nan, time = 72.930155
Epoch 1.10 / 50, i = 20 / 10050, loss = nan, time = 73.618026
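Once the loss goes NaN like this, the run is unrecoverable and has to be restarted with a smaller learning rate. A generic watchdog for that (not part of torch-rnn itself; `safe_step`, `backoff`, and `min_lr` are hypothetical names) might look like:

```python
import math

def safe_step(loss, lr, backoff=0.5, min_lr=1e-5):
    """Return an adjusted learning rate after observing a loss value.

    If the loss has gone NaN/inf (as in the lr=0.008 run above),
    restore weights from the last checkpoint and continue with the
    reduced rate this function returns. Sketch only; the backoff
    factor of 0.5 is an assumption, not something from this thread.
    """
    if math.isnan(loss) or math.isinf(loss):
        return max(lr * backoff, min_lr)  # halve the LR, but keep a floor
    return lr  # loss is finite: keep the current LR

print(safe_step(float('nan'), 0.008))  # 0.004
print(safe_step(2.5, 0.008))          # 0.008 (unchanged)
```

In practice gradient clipping also helps keep wide nets from blowing up at the start of training, but lowering the rate, as done below, is the direct fix.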
Learning rate = 0.004
Epoch 1.00 / 50, i = 1 / 10050, loss = 3.744624, time = 95.754618
Epoch 1.01 / 50, i = 2 / 10050, loss = 4.204688, time = 98.018919
Epoch 1.01 / 50, i = 3 / 10050, loss = 3.852427, time = 98.501101
Epoch 1.02 / 50, i = 4 / 10050, loss = 4.428941, time = 98.728853
Epoch 1.02 / 50, i = 5 / 10050, loss = 3.003022, time = 99.991098
Epoch 1.03 / 50, i = 6 / 10050, loss = 3.446127, time = 101.905007
Epoch 1.03 / 50, i = 7 / 10050, loss = 3.367308, time = 99.737719
Epoch 1.04 / 50, i = 8 / 10050, loss = 3.028597, time = 100.117175
Epoch 1.04 / 50, i = 9 / 10050, loss = 3.011692, time = 98.692176
Epoch 1.05 / 50, i = 10 / 10050, loss = 2.939172, time = 97.518067
. . .
Epoch 1.15 / 50, i = 31 / 10050, loss = 2.391058, time = 97.899679
Epoch 1.16 / 50, i = 32 / 10050, loss = 2.381518, time = 97.903584
Epoch 1.16 / 50, i = 33 / 10050, loss = 2.368582, time = 97.843571
Epoch 1.17 / 50, i = 34 / 10050, loss = 2.418341, time = 97.629543
Epoch 1.17 / 50, i = 35 / 10050, loss = 2.338644, time = 97.844184
. . .
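The pattern across these runs (wider layer, smaller stable learning rate) can be turned into a rough starting-point heuristic. A 1/width rule is one common assumption; the thread only shows that wider nets need smaller rates, not this exact scaling, and `scaled_lr` is a hypothetical helper:

```python
def scaled_lr(base_lr, base_width, width):
    """Scale a learning rate down in proportion to layer width.

    base_lr    -- LR known to be stable at base_width neurons per layer
    base_width -- reference width the base_lr was tuned for
    width      -- width of the new net

    Assumes lr ~ 1/width; treat the result as a first guess to tune
    from, not a derived value.
    """
    return base_lr * base_width / width

# Starting from 0.005 stable at width 1024, guess an LR for width 2000:
print(scaled_lr(0.005, 1024, 2000))  # 0.00256
```

This would suggest starting a 1x2000 net a bit below the 0.004 that worked above, which errs on the safe side.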
The first model (3x1024) with a learning rate of 0.005 (0.008 is too big):
Epoch 1.01 / 50, i = 2 / 10050, loss = 3.907162, time = 163.995822
Epoch 1.01 / 50, i = 3 / 10050, loss = 3.715210, time = 164.403803
Epoch 1.02 / 50, i = 4 / 10050, loss = 3.570075, time = 163.936679
Epoch 1.02 / 50, i = 5 / 10050, loss = 3.473840, time = 164.524207
Epoch 1.03 / 50, i = 6 / 10050, loss = 3.154179, time = 164.751064
Epoch 1.03 / 50, i = 7 / 10050, loss = 2.997317, time = 164.154949
Epoch 1.04 / 50, i = 8 / 10050, loss = 2.845170, time = 164.053583
Epoch 1.04 / 50, i = 9 / 10050, loss = 2.657294, time = 163.970862
Epoch 1.05 / 50, i = 10 / 10050, loss = 2.524375, time = 163.097223
Epoch 1.05 / 50, i = 11 / 10050, loss = 2.442510, time = 167.253639
Epoch 1.06 / 50, i = 12 / 10050, loss = 2.365662, time = 164.417616
Epoch 1.06 / 50, i = 13 / 10050, loss = 2.286323, time = 164.347997
Epoch 1.07 / 50, i = 14 / 10050, loss = 2.241984, time = 163.535754
Epoch 1.07 / 50, i = 15 / 10050, loss = 2.195409, time = 163.264860
Looks MUCH better!!! ;)