
pytorch-cosine-annealing-with-warmup's Introduction

Cosine Annealing with Warmup for PyTorch

News

  • 2020/12/22 : An update is coming soon...
  • 2020/12/24 : Merry Christmas! Released new version 2.0. The previous version is available here (branch: 1.0).
  • 2021/06/04 : This package can now be installed with pip.

Installation

pip install 'git+https://github.com/katsura-jp/pytorch-cosine-annealing-with-warmup'

Args

  • optimizer (Optimizer): Wrapped optimizer.
  • first_cycle_steps (int): Number of steps in the first cycle.
  • cycle_mult (float): Multiplier applied to the cycle length after each restart. Default: 1.
  • max_lr (float): First cycle's max learning rate. Default: 0.1.
  • min_lr (float): Minimum learning rate. Default: 0.001.
  • warmup_steps (int): Number of linear warmup steps. Default: 0.
  • gamma (float): Factor by which the max learning rate is decreased at each cycle. Default: 1.
  • last_epoch (int): The index of the last epoch. Default: -1.
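
To make these arguments concrete, here is a minimal sketch (not the library's code) of how they combine within each cycle, assuming a linear warmup into a cosine decay and, for simplicity, cycle_mult=1.0:

import math

def lr_at(step, first_cycle_steps=200, max_lr=0.1, min_lr=0.001,
          warmup_steps=50, gamma=1.0):
    # Which cycle we are in, and the step within it (cycle_mult=1.0 case).
    cycle, cur = divmod(step, first_cycle_steps)
    # The cycle's peak decays by gamma at every restart.
    peak = max_lr * gamma ** cycle
    if cur < warmup_steps:
        # Linear warmup from min_lr up to the cycle's peak.
        return min_lr + (peak - min_lr) * cur / warmup_steps
    # Cosine annealing from the peak back down to min_lr.
    progress = (cur - warmup_steps) / (first_cycle_steps - warmup_steps)
    return min_lr + (peak - min_lr) * (1 + math.cos(math.pi * progress)) / 2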

Example

import torch.optim as optim
from cosine_annealing_warmup import CosineAnnealingWarmupRestarts

model = ...
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-5) # lr is min lr
scheduler = CosineAnnealingWarmupRestarts(optimizer,
                                          first_cycle_steps=200,
                                          cycle_mult=1.0,
                                          max_lr=0.1,
                                          min_lr=0.001,
                                          warmup_steps=50,
                                          gamma=1.0)

for epoch in range(n_epoch):
    train()
    valid()
    scheduler.step()
  • case1 : CosineAnnealingWarmupRestarts(optimizer, first_cycle_steps=500, cycle_mult=1.0, max_lr=0.1, min_lr=0.001, warmup_steps=100, gamma=1.0) [example plot 1]
  • case2 : CosineAnnealingWarmupRestarts(optimizer, first_cycle_steps=200, cycle_mult=1.0, max_lr=0.1, min_lr=0.001, warmup_steps=50, gamma=0.5) [example plot 2]

pytorch-cosine-annealing-with-warmup's People

Contributors

baldassarrefe, katsura-jp


pytorch-cosine-annealing-with-warmup's Issues

Additional Features

Hi, I have some suggestions for features:

  • Can you add an initial-warmup variant, so that the warmup steps apply only on the first cycle, like the one shown here?
  • Can you infer the max_lr value from the optimizer's base learning rate for each group? Currently, if you have an optimizer with multiple groups with different learning rates, all of their learning rates get overridden by the single max_lr (see the sketch below).

BTW very awesome implementation!
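
A possible shape for the per-group idea (purely a hypothetical sketch, not part of this package): read each group's base lr from the optimizer and treat it as that group's peak, then scale every group by the same schedule factor instead of overwriting it.

def per_group_peaks(optimizer):
    # Hypothetical: each group's base lr becomes that group's max lr.
    return [group['lr'] for group in optimizer.param_groups]

def apply_factor(optimizer, peaks, min_lr, factor):
    # factor in [0, 1]: 0 at the shared floor, 1 at each group's own peak.
    for group, peak in zip(optimizer.param_groups, peaks):
        group['lr'] = min_lr + (peak - min_lr) * factor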

License?

Hi,

Can you please add a license to this repo? It would be really helpful.

Thank you,
Best,
Shreyas

base_lr relies on the lr of optimizer

Hi, I read through your blog and it is really nice work!
I found that the base_lr attribute in your class is inherited from the base scheduler, which takes its value from the optimizer's lr. Could you clarify whether I should set base_lr via the optimizer and max_lr via the scheduler?

Learning rate goes lower than specified min_lr

I found that the learning rate can go lower than the specified min_lr for small learning rates. Even though at rates that small it doesn't make a huge difference, it concerns me.

Code to reproduce:

import torch.optim as optim
from cosine_annealing_warmup import CosineAnnealingWarmupRestarts

model = ...
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-5) # lr is min lr
scheduler = CosineAnnealingWarmupRestarts(optimizer,
                                          first_cycle_steps=30000,
                                          cycle_mult=1.0,
                                          max_lr=5e-4,
                                          min_lr=5e-6,
                                          warmup_steps=6000,
                                          gamma=0.5)
for iteration in range(300000):
    scheduler.step()

This outputs a learning-rate curve like the following:
[plot of the learning rate over iterations]

Maybe my understanding of whether the learning rate should change within an epoch is wrong; however, I expect the lr not to fall below 5e-6.
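
One plausible explanation (a guess from the arguments above, not confirmed against the implementation): gamma=0.5 halves each cycle's peak, and once the decayed peak drops below min_lr the schedule can interpolate below the floor. A quick check of the peaks over the 10 cycles covered by 300000 steps:

max_lr, min_lr, gamma = 5e-4, 5e-6, 0.5
for cycle in range(10):  # 300000 steps / 30000 steps per cycle
    peak = max_lr * gamma ** cycle
    print(cycle, peak, peak < min_lr)
# From cycle 7 onward, peak = 5e-4 * 0.5**7 ≈ 3.9e-6, which is below min_lr=5e-6.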

Allow `max_lr` to be set per group

From #11.

Issue: if you have an optimizer with multiple groups with different learning rates, all of their learning rate values will get overridden by the max_lr.

Weird gamma behavior

Hello!

[screenshot of the produced LR curve]

CosineAnnealingWarmupRestarts(optimizer, first_cycle_steps=10000, warmup_steps=2000,
                              max_lr=0.002, min_lr=1e-4, gamma=0.5)

produces an LR curve like the graph above. Is this some kind of bug?
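
Possibly the same mechanism as the min_lr issue above (a guess, not verified): with gamma=0.5 the decayed cycle peak falls below min_lr after a few cycles, which would distort the curve.

max_lr, min_lr, gamma = 0.002, 1e-4, 0.5
print([max_lr * gamma ** k for k in range(6)])
# [0.002, 0.001, 0.0005, 0.00025, 0.000125, 6.25e-05]  <- cycle 5's peak < min_lr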
