
pytorch-cosine-annealing-with-warmup's Introduction

Cosine Annealing with Warmup for PyTorch

News

  • 2020/12/22 : An update is coming soon...
  • 2020/12/24 : Merry Christmas! Released new version 2.0. The previous version is available here (branch: 1.0).
  • 2021/06/04 : This package can now be installed with pip.

Installation

pip install 'git+https://github.com/katsura-jp/pytorch-cosine-annealing-with-warmup'

Args

  • optimizer (Optimizer): Wrapped optimizer.
  • first_cycle_steps (int): Number of steps in the first cycle.
  • cycle_mult (float): Multiplier applied to the cycle length after each restart. Default: 1.
  • max_lr (float): First cycle's max learning rate. Default: 0.1.
  • min_lr (float): Minimum learning rate. Default: 0.001.
  • warmup_steps (int): Number of linear warmup steps. Default: 0.
  • gamma (float): Factor by which the max learning rate is decreased at each cycle. Default: 1.
  • last_epoch (int): The index of the last epoch. Default: -1.
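
To make these arguments concrete, here is a minimal sketch (not the library's code) of how they combine within each cycle, assuming a linear warmup into a cosine decay and, for simplicity, cycle_mult=1.0:

import math

def lr_at(step, first_cycle_steps=200, max_lr=0.1, min_lr=0.001,
          warmup_steps=50, gamma=1.0):
    # Which cycle we are in, and the step within it (cycle_mult=1.0 case).
    cycle, cur = divmod(step, first_cycle_steps)
    # The cycle's peak decays by gamma at every restart.
    peak = max_lr * gamma ** cycle
    if cur < warmup_steps:
        # Linear warmup from min_lr up to the cycle's peak.
        return min_lr + (peak - min_lr) * cur / warmup_steps
    # Cosine annealing from the peak back down to min_lr.
    progress = (cur - warmup_steps) / (first_cycle_steps - warmup_steps)
    return min_lr + (peak - min_lr) * (1 + math.cos(math.pi * progress)) / 2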

Example

import torch.optim as optim
from cosine_annealing_warmup import CosineAnnealingWarmupRestarts

model = ...
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-5) # lr is min lr
scheduler = CosineAnnealingWarmupRestarts(optimizer,
                                          first_cycle_steps=200,
                                          cycle_mult=1.0,
                                          max_lr=0.1,
                                          min_lr=0.001,
                                          warmup_steps=50,
                                          gamma=1.0)

for epoch in range(n_epoch):
    train()
    valid()
    scheduler.step()
  • case1 : CosineAnnealingWarmupRestarts(optimizer, first_cycle_steps=500, cycle_mult=1.0, max_lr=0.1, min_lr=0.001, warmup_steps=100, gamma=1.0) [example plot 1]
  • case2 : CosineAnnealingWarmupRestarts(optimizer, first_cycle_steps=200, cycle_mult=1.0, max_lr=0.1, min_lr=0.001, warmup_steps=50, gamma=0.5) [example plot 2]

pytorch-cosine-annealing-with-warmup's People

Contributors

baldassarrefe, katsura-jp


pytorch-cosine-annealing-with-warmup's Issues

Additional Features

Hi, I have some suggestions for features:

  • Can you add an initial-warmup variant, so that the warmup steps apply only on the first cycle, like the one shown here?
  • Can you infer the max_lr value from the optimizer's base learning rate for each group? Currently, if you have an optimizer with multiple groups with different learning rates, all of their learning rates get overridden by the single max_lr (see the sketch below).

BTW very awesome implementation!
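
A possible shape for the per-group idea (purely a hypothetical sketch, not part of this package): read each group's base lr from the optimizer and treat it as that group's peak, then scale every group by the same schedule factor instead of overwriting it.

def per_group_peaks(optimizer):
    # Hypothetical: each group's base lr becomes that group's max lr.
    return [group['lr'] for group in optimizer.param_groups]

def apply_factor(optimizer, peaks, min_lr, factor):
    # factor in [0, 1]: 0 at the shared floor, 1 at each group's own peak.
    for group, peak in zip(optimizer.param_groups, peaks):
        group['lr'] = min_lr + (peak - min_lr) * factor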

License?

Hi,

Can you please add a license to this repo? It would be really helpful.

Thank you,
Best,
Shreyas

base_lr relies on the lr of optimizer

Hi, I read through your blog and it is really nice work!
I found that the base_lr attribute in your class is inherited from the base scheduler, which takes its value from the optimizer's lr. Could you clarify whether I should set base_lr via the optimizer and max_lr via the scheduler?

Learning rate goes lower than specified min_lr

I found that the learning rate can go lower than the specified min_lr for small learning rates. Even though at rates that small it doesn't make a huge difference, it concerns me.

Code to reproduce:

import torch.optim as optim
from cosine_annealing_warmup import CosineAnnealingWarmupRestarts

model = ...
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-5) # lr is min lr
scheduler = CosineAnnealingWarmupRestarts(optimizer,
                                          first_cycle_steps=30000,
                                          cycle_mult=1.0,
                                          max_lr=5e-4,
                                          min_lr=5e-6,
                                          warmup_steps=6000,
                                          gamma=0.5)
for iteration in range(300000):
    scheduler.step()

This outputs a learning-rate curve like the following:
[plot of the learning rate over iterations]

Maybe my understanding of whether the learning rate should change within an epoch is wrong; however, I expect the lr not to fall below 5e-6.
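
One plausible explanation (a guess from the arguments above, not confirmed against the implementation): gamma=0.5 halves each cycle's peak, and once the decayed peak drops below min_lr the schedule can interpolate below the floor. A quick check of the peaks over the 10 cycles covered by 300000 steps:

max_lr, min_lr, gamma = 5e-4, 5e-6, 0.5
for cycle in range(10):  # 300000 steps / 30000 steps per cycle
    peak = max_lr * gamma ** cycle
    print(cycle, peak, peak < min_lr)
# From cycle 7 onward, peak = 5e-4 * 0.5**7 ≈ 3.9e-6, which is below min_lr=5e-6.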

Allow `max_lr` to be set per group

From #11.

Issue: if you have an optimizer with multiple groups with different learning rates, all of their learning rate values will get overridden by the max_lr.

Weird gamma behavior

Hello!

[screenshot of the produced LR curve]

CosineAnnealingWarmupRestarts(optimizer, first_cycle_steps=10000, warmup_steps=2000,
                              max_lr=0.002, min_lr=1e-4, gamma=0.5)

produces an LR curve like the graph above. Is this some kind of bug?
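
Possibly the same mechanism as the min_lr issue above (a guess, not verified): with gamma=0.5 the decayed cycle peak falls below min_lr after a few cycles, which would distort the curve.

max_lr, min_lr, gamma = 0.002, 1e-4, 0.5
print([max_lr * gamma ** k for k in range(6)])
# [0.002, 0.001, 0.0005, 0.00025, 0.000125, 6.25e-05]  <- cycle 5's peak < min_lr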
