
adabound's Issues

Can this deal with complex numbers?

Hi authors,

I intended to use this method on complex-valued parameters, and it failed with an error message like:

File "optimizer.py", line 701, in step step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_( RuntimeError: "clamp_scalar_cpu" not implemented for 'ComplexFloat'

I'm wondering if it would be possible to support complex numbers? Thanks.

Ni
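
For reference, a minimal reproduction, plus one possible workaround (the real-view trick is my own assumption, not something AdaBound provides):

    import torch

    # Minimal reproduction: in-place clamp is not implemented for complex tensors.
    t = torch.ones(3, dtype=torch.cfloat)
    try:
        t.clamp_(0.1, 1.0)
    except RuntimeError as e:
        print(e)  # clamp not implemented for 'ComplexFloat'

    # Possible workaround (an assumption on my part): store the parameter as an
    # ordinary real tensor of shape (..., 2) so any optimizer can clamp it, and
    # view it as complex only where the model needs complex arithmetic.
    w = torch.zeros(3, 2, requires_grad=True)
    w_complex = torch.view_as_complex(w)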

When did the optimizer switch to SGD?

I set the initial lr=0.0001 and final_lr=0.1,
but I still don't know when the optimizer becomes SGD.
Do I need to raise my learning rate to the final learning rate manually?
thanks!
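
For what it's worth, there appears to be no discrete switch: the clipping bounds tighten around final_lr as training proceeds. A small sketch of the schedule, assuming the bound formulas in adabound.py (gamma defaults to 1e-3):

    # Bound schedule at step t, as in adabound.py (assumption: default gamma=1e-3).
    # Both bounds converge to final_lr, so the move toward SGD-like updates
    # is gradual rather than a one-shot switch.
    final_lr, gamma = 0.1, 1e-3
    for t in [1, 100, 1000, 10000]:
        lower = final_lr * (1 - 1 / (gamma * t + 1))
        upper = final_lr * (1 + 1 / (gamma * t))
        print(f"step {t}: bounds [{lower:.6f}, {upper:.6f}]")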

Learning rate changing

Hi, thanks a lot for sharing your excellent work.

I wonder, if I want to change the learning rate as epochs increase, how do I set the parameters lr
and final_lr in AdaBound? Or is there any need to change the learning rate as epochs increase at all?

Looking for your reply, thanks a lot.
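
A minimal sketch of what I mean, pairing AdaBound with a standard PyTorch scheduler (the model and the schedule values here are placeholders, not recommendations):

    import torch
    import adabound

    model = torch.nn.Linear(10, 1)  # placeholder model
    optimizer = adabound.AdaBound(model.parameters(), lr=1e-3, final_lr=0.1)
    # Any torch.optim.lr_scheduler drives group['lr'] as usual; AdaBound
    # rescales final_lr by group['lr'] / base_lr internally.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

    for epoch in range(90):
        # ... one epoch of forward/backward/optimizer.step() goes here ...
        scheduler.step()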

Why the Python 3.6 requirement?

I don't see any reason why this code would not run on a lower version of Python.
Could you explain why there is such a requirement?

AttributeError: no attribute 'base_lrs'

Thank you very much for sharing this impressive work. I am somehow receiving the following error:

    for group, base_lr in zip(self.param_groups, self.base_lrs):
AttributeError: 'AdaBound' object has no attribute 'base_lrs'
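
A hypothetical stopgap, assuming the attribute was simply lost (for example, an optimizer object restored from a checkpoint): rebuild it from the current param groups, the same way AdaBound.__init__ does.

    import torch
    import adabound

    params = [torch.nn.Parameter(torch.zeros(1))]
    optimizer = adabound.AdaBound(params, lr=1e-3, final_lr=0.1)

    # Hypothetical stopgap (an assumption about the cause, not a confirmed fix):
    # recreate base_lrs from the current param groups if it is missing.
    if not hasattr(optimizer, 'base_lrs'):
        optimizer.base_lrs = [group['lr'] for group in optimizer.param_groups]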

Doesn't work properly with a higher lr

I'm new to deep learning, and I found that the project works well with SGD but something goes wrong with AdaBound.

When I start with lr=1e-3, it prints the following and crashes:
invalid argument 2: non-empty 3D or 4D (batch mode) tensor expected for input, but got: [1 x 64 x 0 x 27] at /pytorch/aten/src/THCUNN/generic/SpatialAdaptiveMaxPooling.cu:24

But it seems to work fine if I set lr to 1e-4 or lower, which confuses me a lot.
Any ideas?

python=3.6
pytorch=1.0.1 / 0.4
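
The zero-sized dimension in the pooling input suggests training diverged at the larger lr (this is an assumption; a detection head, for instance, could end up with no valid regions). One common thing to try is gradient clipping before each step, roughly like this:

    import torch

    model = torch.nn.Linear(10, 1)  # placeholder for the actual network
    loss = model(torch.randn(4, 10)).pow(2).mean()
    loss.backward()
    # Clip the global gradient norm before optimizer.step();
    # max_norm=1.0 is only an illustrative value.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)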

Please update the pip package

Thanks for your nice work, but I find that the pip package doesn't include AdaBoundW. Could you please update it?

Be careful when using adaptive gradient methods

[figure camp.png: loss curves for AdaBound, Adam, and SGD on the toy problem below]

I tested three methods on a very simple problem and got the result shown above.

The code is below:

import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import adabound

class Net(nn.Module):

    def __init__(self, dim):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(dim, 2 * dim)
        self.relu = nn.ReLU(inplace=True)
        self.fc2 = nn.Linear(2 * dim, dim)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

DIM = 30
epochs = 1000
xini = torch.ones(1, DIM) * 100  # fixed input
opti = torch.zeros(1, DIM)       # target: all zeros

lr = 0.01
net = Net(DIM)
objfun = nn.MSELoss()

loss_adab = []
loss_adam = []
loss_sgd = []
for epoch in range(epochs):
    if epoch % 100 == 0:
        lr /= 10  # decay lr by 10x every 100 epochs (including epoch 0)

    # Note: re-creating the optimizer each epoch resets its internal state
    # (moment estimates); only the learning rate carries over.
    optimizer = adabound.AdaBound(net.parameters(), lr)
    out = net(xini)
    los = objfun(out, opti)
    loss_adab.append(los.detach().numpy())

    optimizer.zero_grad()
    los.backward()
    optimizer.step()

lr = 0.01
net = Net(DIM)
objfun = nn.MSELoss()

for epoch in range(epochs):
    if epoch % 100 == 0:
        lr /= 10

    optimizer = torch.optim.Adam(net.parameters(), lr)
    out = net(xini)
    los = objfun(out, opti)
    loss_adam.append(los.detach().numpy())

    optimizer.zero_grad()
    los.backward()
    optimizer.step()

lr = 0.001
net = Net(DIM)
objfun = nn.MSELoss()

for epoch in range(epochs):
    if epoch % 100 == 0:
        lr /= 10

    optimizer = torch.optim.SGD(net.parameters(), lr, momentum=0.9)
    out = net(xini)
    los = objfun(out, opti)
    loss_sgd.append(los.detach().numpy())

    optimizer.zero_grad()
    los.backward()
    optimizer.step()

plt.figure()
plt.plot(loss_adab, label='adabound')
plt.plot(loss_adam, label='adam')
plt.plot(loss_sgd, label='SGD')
plt.yscale('log')
plt.xlabel('epochs')
plt.ylabel('loss (log scale)')
plt.legend()
plt.savefig('camp.png', dpi=600)
plt.show()

lr_scheduler affects the actual learning rate

# Applies bounds on actual learning rate
# lr_scheduler cannot affect final_lr, this is a workaround to apply lr decay
final_lr = group['final_lr'] * group['lr'] / base_lr

However, lr_scheduler may change param_group['lr'] during training, so final_lr, lower_bound, and upper_bound will also be affected.

Should I avoid using lr_scheduler and let AdaBound adapt the parameters on its own to transition from Adam to SGD?

Thank you very much!
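
A small check of that scaling, assuming the pip adabound package (the printed effective final_lr shrinks together with the scheduled lr):

    import torch
    import adabound

    param = torch.nn.Parameter(torch.zeros(1))
    opt = adabound.AdaBound([param], lr=1e-3, final_lr=0.1)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=1, gamma=0.1)

    for _ in range(3):
        param.grad = torch.ones(1)
        opt.step()
        sched.step()
        g = opt.param_groups[0]
        # Effective bound center used inside step(): final_lr * lr / base_lr
        print(g['lr'], g['final_lr'] * g['lr'] / opt.base_lrs[0])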

LSTM hyperparameters for language modeling

Greetings,

Thanks for your great paper. I am wondering about the hyperparameters you used for the language modeling experiments. Could you provide that information?

Thank you!

PyTorch 1.6 warning

/home/xxxx/.local/lib/python3.7/site-packages/adabound/adabound.py:94: UserWarning: This overload of add_ is deprecated:
        add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
        add_(Tensor other, *, Number alpha) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
  exp_avg.mul_(beta1).add_(1 - beta1, grad)
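
The warning itself points at the fix; a minimal sketch of the updated call (the result should be identical):

    import torch

    beta1 = 0.9
    exp_avg, grad = torch.zeros(3), torch.ones(3)

    # Deprecated overload flagged by PyTorch 1.6:
    #   exp_avg.mul_(beta1).add_(1 - beta1, grad)
    # Current signature, same arithmetic:
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)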
