luolc / AdaBound
An optimizer that trains as fast as Adam and as good as SGD.
Home Page: https://www.luolc.com/publications/adabound/
License: Apache License 2.0
Hi authors,
I intended to use this optimizer on complex numbers, and it failed with an error message like:
File "optimizer.py", line 701, in step step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_( RuntimeError: "clamp_scalar_cpu" not implemented for 'ComplexFloat'
I'm wondering if it's possible to improve this for complex numbers? Thanks.
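A possible workaround until there is optimizer-side complex support (my own sketch, assuming PyTorch >= 1.6 for torch.view_as_complex; the ComplexLinear module and its names are hypothetical): store the parameter as a real tensor with a trailing real/imag dimension, so AdaBound only ever clamps real values, and view it as complex inside forward:

import torch
import torch.nn as nn

class ComplexLinear(nn.Module):
    """Hypothetical complex layer whose weight is stored as a real tensor."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        # real-valued storage with a trailing (real, imag) dimension;
        # any real-valued optimizer (AdaBound included) only updates this tensor
        self.weight = nn.Parameter(0.01 * torch.randn(out_dim, in_dim, 2))

    def forward(self, x):
        # x: complex tensor of shape (..., in_dim)
        w = torch.view_as_complex(self.weight)  # reinterpret storage as complex
        return x @ w.t()

layer = ComplexLinear(4, 3)
out = layer(torch.randn(2, 4, dtype=torch.cfloat))  # complex output, real parameters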
Hi,
https://github.com/wayne391/Image-Super-Resolution/blob/master/src/models/RCAN.py
Just change
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, amsgrad=False)
to
optimizer = adabound.AdaBound(model.parameters(), lr=1e-4, final_lr=0.1)
NaN loss in the RCAN model, but Adam works fine.
I set the initial lr=0.0001 and final_lr=0.1,
but I still don't know when the optimizer becomes SGD.
Do I need to increase the learning rate to the final learning rate manually?
thanks!
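From my reading of the released adabound.py (a sketch of my own, not official documentation), the switch is not a discrete event: at every step the Adam-style step size is clamped into [lower_bound, upper_bound], and both bounds converge to final_lr as training goes on, so no manual annealing toward final_lr should be needed.

# Roughly the bound schedule in adabound.py (gamma defaults to 1e-3);
# the exact constants come from my reading of the code, so treat as a sketch.
def bounds(step, final_lr=0.1, gamma=1e-3):
    lower = final_lr * (1 - 1 / (gamma * step + 1))
    upper = final_lr * (1 + 1 / (gamma * step))
    return lower, upper

for t in (1, 100, 1000, 10000, 100000):
    print(t, bounds(t))
# Both bounds squeeze toward final_lr, so the update gradually behaves like
# SGD with learning rate final_lr rather than switching at a fixed epoch.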
I'm wondering what is happening at epoch 150 in all visualizations? I would like to introduce that into all my models ;-)
https://github.com/Luolc/AdaBound/blob/master/demos/cifar10/visualization.ipynb
Hi, thanks a lot for sharing your excellent work.
I wonder, if I want to change the learning rate as the epochs increase, how do I set the parameters lr
and final_lr in AdaBound? Or is there any need to change the learning rate as the epochs increase at all?
Looking forward to your reply, thanks a lot.
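One way to do it (my own sketch, not from the repo's README; the tiny model and data below are stand-ins just to make it runnable): keep lr and final_lr as usual and attach a standard torch.optim.lr_scheduler, which only rescales each param group's lr and lets AdaBound derive its bounds from that.

import torch
import torch.nn as nn
import adabound

# minimal stand-in model and data for the sketch
model = nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)

optimizer = adabound.AdaBound(model.parameters(), lr=1e-3, final_lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=150, gamma=0.1)

for epoch in range(300):
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # multiplies every group['lr'] by 0.1 at epoch 150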
I don't see any reason why this code would not run in a lower version of python.
Could you explain why there is such a requirement?
Hello, can you please tell me what these two parameters in α / √Vt mean, especially Vt?
Thank you
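For reference (standard Adam notation, not specific to this repo): α is the base step size (lr), and V_t is the exponential moving average of the squared gradients, V_t = β₂ V_{t-1} + (1 - β₂) g_t², so α / √V_t is the effective per-coordinate learning rate that AdaBound then clips between its lower and upper bounds.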
IIRC, because group['lr'] will never be changed, final_lr will always be the same as group['final_lr'].
Is this intended?
(Line 110 in 6fa8260)
I will post the concrete results here, if I don't forget.
Thank you very much for sharing this impressive work. I am somehow receiving the following error:
for group, base_lr in zip(self.param_groups, self.base_lrs):
AttributeError: 'AdaBound' object has no attribute 'base_lrs'
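In the adabound.py on GitHub, base_lrs is recorded in AdaBound.__init__ as each param group's initial lr, so a common cause is an AdaBound instance that did not go through the current constructor (for example an older installed version of the package, or an optimizer object unpickled from one). Upgrading the package should fix it; as a stopgap (my assumption, not an official fix) the attribute can be reconstructed by hand on the existing optimizer:

# mirrors what the constructor does: remember each group's initial lr
optimizer.base_lrs = [group['lr'] for group in optimizer.param_groups]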
I'm new to deep learning. I found the project works well with SGD, but something goes wrong with AdaBound.
When I start with lr=1e-3, it shows the following and breaks down:
invalid argument 2: non-empty 3D or 4D (batch mode) tensor expected for input, but got: [1 x 64 x 0 x 27] at /pytorch/aten/src/THCUNN/generic/SpatialAdaptiveMaxPooling.cu:24
But it seems to work fine if I set lr to 1e-4 or lower. This confuses me a lot.
Any ideas?
python=3.6
pytorch=1.0.1 / 0.4
Thanks for your nice work, but I find that the package doesn't include AdaBoundW. Could you please update it?
https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer
I strongly believe that AdaBound would be better if it used RAdam instead of Adam.
It could be combined with Lookahead and LAMB too.
Then we would have the best of both worlds and a beautiful example of scientific collaboration.
Can't wait to try it with TensorFlow. I'm just curious about the release date of the TensorFlow version.
I tested the three methods on a very simple problem and got the result above.
The code is printed here:
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import adabound

class Net(nn.Module):
    def __init__(self, dim):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(dim, 2 * dim)
        self.relu = nn.ReLU(inplace=True)
        self.fc2 = nn.Linear(2 * dim, dim)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

DIM = 30
epochs = 1000
xini = torch.ones(1, DIM) * 100   # fixed input
opti = torch.zeros(1, DIM) * 100  # target (all zeros)

# AdaBound
lr = 0.01
net = Net(DIM)
objfun = nn.MSELoss()
loss_adab = []
loss_adam = []
loss_sgd = []
for epoch in range(epochs):
    if epoch % 100 == 0:
        # decay lr every 100 epochs and re-create the optimizer with the new lr
        lr /= 10
        optimizer = adabound.AdaBound(net.parameters(), lr)
    out = net(xini)
    los = objfun(out, opti)
    loss_adab.append(los.detach().numpy())
    optimizer.zero_grad()
    los.backward()
    optimizer.step()

# Adam
lr = 0.01
net = Net(DIM)
objfun = nn.MSELoss()
for epoch in range(epochs):
    if epoch % 100 == 0:
        lr /= 10
        optimizer = torch.optim.Adam(net.parameters(), lr)
    out = net(xini)
    los = objfun(out, opti)
    loss_adam.append(los.detach().numpy())
    optimizer.zero_grad()
    los.backward()
    optimizer.step()

# SGD with momentum
lr = 0.001
net = Net(DIM)
objfun = nn.MSELoss()
for epoch in range(epochs):
    if epoch % 100 == 0:
        lr /= 10
        optimizer = torch.optim.SGD(net.parameters(), lr, momentum=0.9)
    out = net(xini)
    los = objfun(out, opti)
    loss_sgd.append(los.detach().numpy())
    optimizer.zero_grad()
    los.backward()
    optimizer.step()

plt.figure()
plt.plot(loss_adab, label='adabound')
plt.plot(loss_adam, label='adam')
plt.plot(loss_sgd, label='SGD')
plt.yscale('log')
plt.xlabel('epochs')
plt.ylabel('Log(loss)')
plt.legend()
plt.savefig('camp.png', dpi=600)
plt.show()
# Applies bounds on actual learning rate
# lr_scheduler cannot affect final_lr, this is a workaround to apply lr decay
final_lr = group['final_lr'] * group['lr'] / base_lr
However, an lr_scheduler may change param_group['lr'] during training, so final_lr, lower_bound, and upper_bound will also be affected.
Should I avoid using an lr_scheduler and let AdaBound adapt the parameters to transform from Adam to SGD on its own?
Thank you very much!
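For concreteness (my own arithmetic from the line quoted above, not from the repo's docs): with base_lr = 1e-3, final_lr = 0.1, and a scheduler that drops param_group['lr'] to 1e-4, the quoted line gives final_lr = 0.1 * 1e-4 / 1e-3 = 0.01, so the lower and upper bounds shrink by the same factor of 10. In other words, the scheduler decays the SGD-like end behaviour consistently rather than breaking the transition.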
Could you please implement a Keras version?
Greetings,
Thanks for your great paper. I am wondering about the hyperparameters you used for language modeling experiments. Could you provide information about that?
Thank you!
The correct grammar would be "as well as SGD".
not sure if you care
/home/xxxx/.local/lib/python3.7/site-packages/adabound/adabound.py:94: UserWarning: This overload of add_ is deprecated:
add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add_(Tensor other, *, Number alpha) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
exp_avg.mul_(beta1).add_(1 - beta1, grad)
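The warning only concerns the old positional-alpha signature used in adabound.py; on recent PyTorch the equivalent, non-deprecated call is exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1). A tiny self-contained check (my own snippet) that the keyword form does the same update:

import torch

beta1 = 0.9
exp_avg, grad = torch.zeros(3), torch.ones(3)
# keyword-alpha form of the line shown in the warning
exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
print(exp_avg)  # tensor([0.1000, 0.1000, 0.1000])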