Momentum needs to keep track of the accumulated previous gradients. That is what the `mparam_i` variables are for. They need to be updated at each iteration as well, hence the addition of the tuple `(mparam_i, v)` to the updates list.
The `mparam_i` tensors are initialized to zero, but during training they will be updated with the current accumulated gradient.
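The same bookkeeping can be written as a minimal pure-Python sketch. It assumes scalar parameters and the classical momentum rule `v = momentum * m - learning_rate * grad`. The exact formula in the code under discussion is not quoted here, so treat that rule as an assumption. The buffer `m` plays the role of `mparam_i`: it starts at zero, and each step produces two updates, one for the buffer (the `(mparam_i, v)` pair) and one for the parameter. The helper name `momentum_step` is hypothetical.

```python
def momentum_step(params, grads, buffers, learning_rate=0.1, momentum=0.9):
    """Return (new_params, new_buffers) after one classical momentum step.

    `buffers` holds the accumulated previous updates (the role of mparam_i);
    they start at zero and are overwritten with v on every call.
    """
    new_params, new_buffers = [], []
    for p, g, m in zip(params, grads, buffers):
        v = momentum * m - learning_rate * g  # accumulated update (assumed rule)
        new_buffers.append(v)                 # analogous to the (mparam_i, v) update
        new_params.append(p + v)              # the parameter itself moves by v
    return new_params, new_buffers


# Usage: minimize f(w) = w**2 (gradient 2*w) starting from w = 1.0.
params = [1.0]
buffers = [0.0 for _ in params]  # zero-initialized, like the np.zeros(...) shared variable
for _ in range(3):
    grads = [2.0 * p for p in params]
    params, buffers = momentum_step(params, grads, buffers)
```

Keeping the buffer as explicit state that is returned and fed back in mirrors how a Theano shared variable persists between calls of a compiled function.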
You could indeed scale the second term in v with (1 - momentum). This would allow you to experiment with different values of momentum and different learning rates more easily, in fact. But in the literature and in practice, most people currently do not do this, so we opted not to do it either.
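The variant described here, where the gradient term of v is scaled by (1 - momentum), can be sketched in the same hedged way. Scalar parameters, the function name `dampened_momentum_step`, and the exact form of v are assumptions for illustration. The sketch shows why the scaling makes experimentation easier. With a constant gradient g, the velocity settles at `-learning_rate * g` whatever the momentum value. The unscaled rule settles at `-learning_rate * g / (1 - momentum)` instead, so changing momentum also changes the effective step size.

```python
def dampened_momentum_step(param, grad, buf, learning_rate=0.1, momentum=0.9):
    """One momentum step where the gradient term is scaled by (1 - momentum)."""
    v = momentum * buf - (1.0 - momentum) * learning_rate * grad
    return param + v, v


# With a constant gradient of 1.0, record the velocity reached for several momentum values.
final_velocity = {}
for mu in (0.5, 0.9, 0.99):
    param, buf = 0.0, 0.0
    for _ in range(5000):
        param, buf = dampened_momentum_step(param, 1.0, buf, momentum=mu)
    final_velocity[mu] = buf  # close to -0.1 (= -learning_rate * grad) for every mu
```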
Ok. I'm new to the Theano graph world :)
`mparam_i = theano.shared(np.zeros(param_i.get_value().shape, dtype=theano.config.floatX))`
So this initializes a shared variable at zero, which is later updated?
Exactly :)
> You could indeed scale the second term in v with (1 - momentum). This would allow you to experiment with different values of momentum and different learning rates more easily, in fact. But in the literature and in practice, most people currently do not do this, so we opted not to do it either.

It would be nice to have this as an option, though. Some people do use the less traditional formula. I'll make an issue for that so we don't forget.
We should also consider if it's worth complicating the API for this. We had a very similar discussion in #10 :)
> We should also consider if it's worth complicating the API for this.

Hmm, you're right. It's enough if the user just scales the learning rate by (1-momentum), isn't it?
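That equivalence is easy to check numerically. The following is a sketch under stated assumptions: a scalar parameter, the classical rule `v = momentum * m - learning_rate * grad`, a velocity buffer starting at zero, and hypothetical function names. Running the dampened rule with learning rate `lr` gives the same trajectory as the classical rule with learning rate `lr * (1 - momentum)`, as long as both hyperparameters stay constant during training.

```python
def classical_step(param, grad, buf, learning_rate, momentum):
    v = momentum * buf - learning_rate * grad
    return param + v, v


def dampened_step(param, grad, buf, learning_rate, momentum):
    # Gradient term scaled by (1 - momentum).
    v = momentum * buf - (1.0 - momentum) * learning_rate * grad
    return param + v, v


def run(step, learning_rate, momentum, steps=50):
    """Minimize f(w) = w**2 (gradient 2*w) from w = 1.0; return the trajectory."""
    w, buf, trajectory = 1.0, 0.0, []
    for _ in range(steps):
        w, buf = step(w, 2.0 * w, buf, learning_rate, momentum)
        trajectory.append(w)
    return trajectory


lr, mu = 0.5, 0.9
a = run(dampened_step, lr, mu)
b = run(classical_step, lr * (1.0 - mu), mu)  # learning rate pre-scaled by (1 - momentum)
max_gap = max(abs(x - y) for x, y in zip(a, b))  # only floating-point rounding differences
```

If the momentum or learning rate is changed on a schedule, the two formulations are no longer interchangeable by a single constant rescaling.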
Edit: By the way, what is the weight decay doing in there? Shouldn't that just be part of the cost and then become part of the gradient automatically?
> By the way, what is the weight decay doing in there? Shouldn't that just be part of the cost and then become part of the gradient automatically?

Yep. I copied most of this over from my galaxy challenge code; it's probably a remnant of that. Having it as part of the cost is much cleaner.
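Folding weight decay into the cost works because an L2 penalty `(weight_decay / 2) * w**2` contributes exactly `weight_decay * w` to the gradient. The optimizer therefore needs no separate decay term. The sketch below is hypothetical: it uses a scalar weight and a made-up data term, assumes the penalty uses the 1/2 factor (conventions differ), and takes a central finite difference in place of Theano's symbolic gradient.

```python
def cost(w):
    """Hypothetical data term: (w - 3)**2."""
    return (w - 3.0) ** 2


def cost_with_decay(w, weight_decay):
    # L2 penalty folded into the cost; the 1/2 factor is a convention.
    return cost(w) + 0.5 * weight_decay * w ** 2


def numerical_grad(f, w, eps=1e-6):
    """Central finite-difference derivative of f at w."""
    return (f(w + eps) - f(w - eps)) / (2.0 * eps)


w, weight_decay = 2.0, 0.01
g_total = numerical_grad(lambda x: cost_with_decay(x, weight_decay), w)
g_explicit = numerical_grad(cost, w) + weight_decay * w  # decay added by hand in the update
```

Both values are approximately -1.98 (that is, 2 * (2 - 3) + 0.01 * 2), so the explicit decay term in the update rule duplicates what the gradient of the penalized cost already provides.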
Made a new issue for the weight decay thing; closing this one.