
Cyclical Learning Rate (CLR)

[Image: cyclical learning rate schedule]

This repository includes a Keras callback to be used in training that allows implementation of cyclical learning rate policies, as detailed in Leslie Smith's paper Cyclical Learning Rates for Training Neural Networks arXiv:1506.01186v4.

A cyclical learning rate is a learning rate schedule that varies the learning rate cyclically between a base value and a maximum value. Typically the frequency of the cycle is constant, but the amplitude is often scaled dynamically, either per cycle or per mini-batch iteration.

Why CLR

The author demonstrates how CLR policies can provide quicker convergence for some neural network tasks and architectures. One example from the paper compares validation accuracy for classification on the CIFAR-10 dataset. In this specific example, the author used a triangular2 CLR policy (detailed below). With CLR, their model reached 81.4% validation accuracy in only 25,000 iterations, compared to 70,000 iterations with standard hyperparameter settings.

One reason this approach may work well is because increasing the learning rate is an effective way of escaping saddle points. By cycling the learning rate, we're guaranteeing that such an increase will take place if we end up in a saddle point.

CyclicLR()

The purpose of this class is to not only provide an easy implementation of CLR for Keras, but to enable easy experimentation with policies not explored in the original paper.

clr_callback.py contains the callback class CyclicLR().

This class includes 3 built-in CLR policies, 'triangular', 'triangular2', and 'exp_range', as detailed in the original paper. It also allows for custom amplitude scaling functions, enabling easy experimentation.

Arguments for this class include:

  • base_lr: initial learning rate, which is the lower boundary in the cycle. This overrides optimizer lr. Default 0.001.
  • max_lr: upper boundary in the cycle. Functionally, it defines the cycle amplitude (max_lr - base_lr). The lr at any cycle is the sum of base_lr and some scaling of the amplitude; therefore max_lr may not actually be reached depending on scaling function. Default 0.006.
  • step_size: number of training iterations per half cycle. Authors suggest setting step_size = (2-8) x (training iterations in epoch). Default 2000.
  • mode: one of {'triangular', 'triangular2', 'exp_range'}. Values correspond to policies detailed below. If scale_fn is not None, this argument is ignored. Default 'triangular'.
  • gamma: constant in 'exp_range' scaling function, gamma^(cycle iterations). Default 1.
  • scale_fn: Custom scaling policy defined by a single argument lambda function, where 0 <= scale_fn(x) <= 1 for all x >= 0. mode parameter is ignored when this argument is used. Default None.
  • scale_mode: {'cycle', 'iterations'}. Defines whether scale_fn is evaluated on cycle number or cycle iterations (training iterations since start of cycle). Default is 'cycle'.

NOTE: base_lr overrides optimizer.lr
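The step_size suggestion above can be made concrete. A minimal sketch, using hypothetical dataset and batch-size numbers (substitute your own):

```python
import math

# Hypothetical dataset/batch settings -- substitute your own.
n_samples = 60000
batch_size = 128

# One epoch = number of mini-batch iterations needed to see the data once.
iterations_per_epoch = math.ceil(n_samples / batch_size)

# The paper suggests a half cycle of 2-8 epochs' worth of iterations; 4 here.
step_size = 4 * iterations_per_epoch

print(iterations_per_epoch, step_size)
```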

The general structure of the policy algorithm is:

cycle = np.floor(1+iterations/(2*step_size))
x = np.abs(iterations/step_size - 2*cycle + 1)
lr = base_lr + (max_lr-base_lr)*np.maximum(0, (1-x))*scale_fn(x)

where the argument passed to scale_fn is either the cycle number or the cycle iterations (training iterations since start of cycle), depending on scale_mode.
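For inspection outside of training, the policy can be written as a small standalone function. This is a sketch, not the callback itself; the default `lambda x: 1.0` reproduces the triangular policy:

```python
import numpy as np

def clr(iterations, base_lr=0.001, max_lr=0.006, step_size=2000.,
        scale_fn=lambda x: 1.0, scale_mode='cycle'):
    """Sketch of the CLR policy value at a given running iteration count."""
    cycle = np.floor(1 + iterations / (2 * step_size))
    x = np.abs(iterations / step_size - 2 * cycle + 1)
    # The scaling function sees either the cycle number or the iteration count.
    scale_arg = cycle if scale_mode == 'cycle' else iterations
    return base_lr + (max_lr - base_lr) * np.maximum(0, 1 - x) * scale_fn(scale_arg)

print(clr(0), clr(2000), clr(4000))  # base_lr, max_lr, back to base_lr
```

With the defaults, the learning rate ramps linearly from base_lr at iteration 0 to max_lr at iteration step_size, then back down.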

CyclicLR() can be used with any optimizer in Keras.

Syncing cycle and training iterations

The author points out that the best accuracies are typically attained by ending with the base learning rate. Therefore it's recommended to make sure your training finishes at the end of the cycle.
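For example (with hypothetical dataset numbers), you can pick the number of epochs so that training ends exactly on a cycle boundary:

```python
import math

# Hypothetical settings -- substitute your own.
n_samples, batch_size = 50000, 100
steps_per_epoch = math.ceil(n_samples / batch_size)   # iterations per epoch

step_size = 4 * steps_per_epoch                       # half cycle = 4 epochs
cycle_epochs = 2 * step_size // steps_per_epoch       # full cycle = 8 epochs

# Train for a whole number of cycles so the run ends at base_lr.
epochs = 3 * cycle_epochs
assert (epochs * steps_per_epoch) % (2 * step_size) == 0
```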

Policies

triangular

[Image: triangular policy schedule]

This method is a simple triangular cycle.

Basic algorithm:

cycle = np.floor(1+iterations/(2*step_size))
x = np.abs(iterations/step_size - 2*cycle + 1)
lr = base_lr + (max_lr-base_lr)*np.maximum(0, (1-x))

Default triangular CLR policy example:

    clr = CyclicLR(base_lr=0.001, max_lr=0.006,
                        step_size=2000.)
    model.fit(X_train, Y_train, callbacks=[clr])

Results:

[Image: triangular policy results]

triangular2

[Image: triangular2 policy schedule]

This method is a triangular cycle that decreases the cycle amplitude by half after each period, while keeping the base lr constant. This is an example of scaling on cycle number.

Basic algorithm:

cycle = np.floor(1+iterations/(2*step_size))
x = np.abs(iterations/step_size - 2*cycle + 1)
lr = base_lr + (max_lr-base_lr)*np.maximum(0, (1-x))/float(2**(cycle-1))

triangular2 CLR policy example:

    clr = CyclicLR(base_lr=0.001, max_lr=0.006,
                        step_size=2000., mode='triangular2')
    model.fit(X_train, Y_train, callbacks=[clr])

Results:

[Image: triangular2 policy results]

exp_range

[Image: exp_range policy schedule]

This method is a triangular cycle that scales the cycle amplitude by a factor gamma**(iterations), while keeping the base lr constant. This is an example of scaling on iteration.

Basic algorithm:

cycle = np.floor(1+iterations/(2*step_size))
x = np.abs(iterations/step_size - 2*cycle + 1)
lr = base_lr + (max_lr-base_lr)*np.maximum(0, (1-x))*gamma**(iterations)

exp_range CLR policy example:

    clr = CyclicLR(base_lr=0.001, max_lr=0.006,
                        step_size=2000., mode='exp_range',
                        gamma=0.99994)
    model.fit(X_train, Y_train, callbacks=[clr])

Results:

[Image: exp_range policy results]

Custom Cycle-Policy

This method is a triangular cycle that scales the cycle amplitude sinusoidally. This is an example of scaling on cycle.

Basic algorithm:

cycle = np.floor(1+iterations/(2*step_size))
x = np.abs(iterations/step_size - 2*cycle + 1)
lr = base_lr + (max_lr-base_lr)*np.maximum(0, (1-x))*0.5*(1+np.sin(cycle*np.pi/2.))

Default custom cycle-policy example:

    clr_fn = lambda x: 0.5*(1+np.sin(x*np.pi/2.))
    clr = CyclicLR(base_lr=0.001, max_lr=0.006,
                        step_size=2000., scale_fn=clr_fn,
                        scale_mode='cycle')
    model.fit(X_train, Y_train, callbacks=[clr])

Results:

[Image: custom cycle-policy results]

Custom Iteration-Policy

This method is a triangular cycle that scales the cycle amplitude as a function of the cycle iterations. This is an example of scaling on iteration.

Basic algorithm:

cycle = np.floor(1+iterations/(2*step_size))
x = np.abs(iterations/step_size - 2*cycle + 1)
lr = base_lr + (max_lr-base_lr)*np.maximum(0, (1-x))/(5**(iterations*0.0001))

Default custom iteration-policy example:

    clr_fn = lambda x: 1/(5**(x*0.0001))
    clr = CyclicLR(base_lr=0.001, max_lr=0.006,
                        step_size=2000., scale_fn=clr_fn,
                        scale_mode='iterations')
    model.fit(X_train, Y_train, callbacks=[clr])

Results:

[Image: custom iteration-policy results]

This result highlights one of the key differences between scaling on cycle vs scaling on iteration. When you scale on cycle, the absolute change in learning rate from one iteration to the next is always constant in a cycle. Scaling on iteration alters the absolute change at every iteration; in this particular case, the absolute change is monotonically decreasing. This results in the curvature between peaks.
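This difference can be checked numerically. A sketch comparing the first half cycle of a cycle-scaled policy (triangular2) against an iteration-scaled policy (exp_range with the gamma value used above): the former changes the lr by a constant amount per iteration, the latter by an amount that shrinks as training proceeds.

```python
import numpy as np

base_lr, max_lr, step_size, gamma = 0.001, 0.006, 2000., 0.99994
it = np.arange(int(step_size))              # first half cycle (lr rising)
cycle = np.floor(1 + it / (2 * step_size))  # == 1 throughout this span
x = np.abs(it / step_size - 2 * cycle + 1)

amp = (max_lr - base_lr) * np.maximum(0, 1 - x)
lr_cycle_scaled = base_lr + amp / 2.0 ** (cycle - 1)  # triangular2
lr_iter_scaled = base_lr + amp * gamma ** it          # exp_range

d_cycle = np.diff(lr_cycle_scaled)  # constant step within the cycle
d_iter = np.diff(lr_iter_scaled)    # shrinking step -> curvature between peaks
```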

Additional Information

Changing/resetting Cycle

During training, you may wish to adjust your cycle parameters:

clr._reset(new_base_lr,
           new_max_lr,
           new_step_size)

Calling _reset() allows you to start a new cycle with new parameters.

_reset() also sets the cycle iteration count to zero. If you are using a policy with dynamic amplitude scaling, this ensures the scaling function is reset.

If an argument is not included in the function call, then the corresponding parameter is unchanged in the new cycle. As a consequence, calling

clr._reset()

simply resets the original cycle.

History

CyclicLR() keeps track of learning rates, loss, metrics and more in the history attribute dict. This history was used to generate many of the plots above.

Note: iterations in the history is the running training iterations; it is distinct from the cycle iterations and does not reset. This allows you to plot your learning rates over training iterations, even after you change/reset the cycle.

Example:

[Image: learning rate history example]

Choosing a suitable base_lr/max_lr (LR Range Test)

The author offers a simple approach to determining the boundaries of your cycle by increasing the learning rate over a number of epochs and observing the results. They refer to this as an "LR range test."

An LR range test can be done using the triangular policy; simply set base_lr and max_lr to define the entire range you wish to test over, and set step_size to be the total number of iterations in the number of epochs you wish to test on. This linearly increases the learning rate at each iteration over the range desired.

The author suggests choosing base_lr and max_lr by plotting accuracy vs. learning rate. Choose base_lr to be the learning rate where accuracy starts to increase, and choose max_lr to be the learning rate where accuracy starts to slow, oscillate, or fall (the elbow). In the example above, Smith chose 0.001 and 0.006 as base_lr and max_lr respectively.
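Concretely, when the entire test run is a single rising half cycle, the triangular policy reduces to a linear ramp. A pure-NumPy sketch of that schedule, with hypothetical boundary values and run length:

```python
import numpy as np

# Hypothetical test range and length -- substitute your own.
base_lr, max_lr = 1e-5, 1e-1
epochs, steps_per_epoch = 5, 500
num_iterations = epochs * steps_per_epoch

# Setting step_size to the total iteration count makes the whole run one
# rising half cycle, i.e. a linear sweep from base_lr to max_lr.
it = np.arange(num_iterations + 1)
lr = base_lr + (max_lr - base_lr) * it / num_iterations
```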

Plotting Accuracy vs. Learning Rate

In order to plot accuracy vs learning rate, you can use the .history attribute to get the learning rates and accuracy at each iteration.

    model.fit(X, Y, callbacks=[clr])
    h = clr.history
    lr = h['lr']
    acc = h['acc']

Order of learning rate augmentation

Note that the CLR callback updates the learning rate before any further learning rate adjustments performed internally by a given optimizer.

Functionality Test

clr_callback_tests.ipynb contains tests demonstrating the desired behavior of the callback with various optimizers.

clr's People

Contributors

bckenstler, carlthome, jeremyjordan


clr's Issues

Clarification for step_size?

From readme, "step_size : number of training iterations per half cycle. Authors suggest setting step_size = (2-8) x (training iterations in epoch) . Default 2000."
Does it mean step_size should be "np.ceil(x_train.shape[0]/batch_size/2)" or "2*np.ceil(x_train.shape[0]/batch_size)"?

CLR callback for R's keras

After using CLR for a bit in models written in Python, I must say CLR makes a huge difference in my work.

Now that R is well served by the keras package, I wonder if you could also write a CLR callback for the R Keras (see its API here)? That would work wonders for people who, for one reason or another, already have some models prepared in R.

Thanks!

Order of learning rate augmentation

Note that the clr callback updates the learning rate prior to any further learning rate adjustments as called for in a given optimizer.

Hi, @bckenstler, excellent work! I am still confused about what you said about the "order of learning rate augmentation". If the clr callback is added and sets the learning rate after each batch of training ends, will a given optimizer (e.g. Adam) still adjust the learning rate that the clr just set for updating weights? Thanks!

Plotting range of Learning Rate

Hi,
Thank you so much for your work.
I want to plot the range of learning rate base_lr= 0.01, max_lr=0.1 for my method

schedulers = torch.optim.lr_scheduler.CyclicLR(optim_backbone, base_lr= 0.01, max_lr=0.1, step_size_up=2000, step_size_down=None, mode='triangular')

Like Figure 2(a) of this paper https://arxiv.org/pdf/1708.07120.pdf

FYI - Trapezoid schedule implementation is ready

Thanks to your CLR implementation, I forked and tailored another version for trapezoid schedule which is introduced in this paper:

This is just for your information, you can find it here: https://github.com/daisukelab/TrapezoidalLR

I was thinking I could ask for a merge, but I just kept it as a separate version. I guess the trapezoid schedule might be a temporary solution, though.

Strange Error

Hello,
I try to perform CLR as described and it works very well with VGG16. But when training other networks like DenseNet I get the following error (TypeError: integer argument expected, got float):


    Epoch 1/10
    2/91 [..............................] - ETA: 37:11 - loss: 0.3528 - acc: 0.8500

    TypeError Traceback (most recent call last)
    in ()
    26 validation_steps = len(val_list) // batch_size + 1,
    27 callbacks=[clr],
    ---> 28 verbose=1)

    /opt/conda/lib/python3.6/site-packages/Keras-2.2.4-py3.6.egg/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
    89 warnings.warn('Update your ' + object_name + ' call to the ' +
    90 'Keras 2 API: ' + signature, stacklevel=2)
    ---> 91 return func(*args, **kwargs)
    92 wrapper._original_function = func
    93 return wrapper

    /opt/conda/lib/python3.6/site-packages/Keras-2.2.4-py3.6.egg/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    1434 use_multiprocessing=use_multiprocessing,
    1435 shuffle=shuffle,
    -> 1436 initial_epoch=initial_epoch)
    1437
    1438 @interfaces.legacy_generator_methods_support

    /opt/conda/lib/python3.6/site-packages/Keras-2.2.4-py3.6.egg/keras/engine/training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    217 batch_logs[l] = o
    218
    --> 219 callbacks._call_batch_hook('train', 'end', batch_index, batch_logs)
    220
    221 batch_index += 1

    /opt/conda/lib/python3.6/site-packages/Keras-2.2.4-py3.6.egg/keras/callbacks.py in _call_batch_hook(self, mode, hook, batch, logs)
    93 'Method (%s) is slow compared '
    94 'to the batch update (%f). Check your callbacks.', hook_name,
    ---> 95 delta_t_median)
    96 if hook == 'begin':
    97 self._t_enter_batch = time.time()

    TypeError: integer argument expected, got float

Has anybody an idea what is the reason for this error? Thank you!

Dima S.

Linking R implementation of CLR

Hi @bckenstler. This is great. Found your repo and figured the R implementation of keras could also greatly benefit from this. I literally translated your code into R and put it into a new package I plan to develop. Would you consider linking my repo somewhere at the top of your README so people who are looking for the R implementation could find it easily?

How to reset the lr cycle

Dear @bckenstler ,

recently I stumbled across an issue with CLR calling from keras 2.1.5:
I ran using

CyclicLR(base_lr=1e-5, max_lr=8e-4, mode='triangular2', step_size=trn_steps//10, scale_mode='iterations')

where trn_steps is equal to steps_per_epoch in model.fit_generator.

Now, my observation is that during the first epoch CLR goes through 10 cycles (as planned), but then lr stays constant throughout the remaining epochs. How do I properly reset the lr cycle? I tried scale_mode='cycle' as well, but no luck. What am I doing wrong?

AttributeError: 'CyclicLR' object has no attribute 'on_train_batch_begin'

Hi, I've tried to use your class within my training code, but I got the following error: AttributeError: 'CyclicLR' object has no attribute 'on_train_batch_begin'.

My code is the following:

    import tensorflow as tf
    from tensorflow.keras.applications.resnet50 import ResNet50

    base_model = ResNet50(weights='imagenet', include_top=False)
    base_model.trainable = False

    model = tf.keras.Sequential([
        base_model,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(units=200, activation='relu'),
        tf.keras.layers.Dense(units=5, activation="softmax")
    ])

    model.compile(optimizer=tf.keras.optimizers.Adam(0.001), metrics=["accuracy"],
                  loss=tf.keras.losses.sparse_categorical_crossentropy)
    clr = CyclicLR(base_lr=0.001, max_lr=0.006, step_size=240)
    model.fit(train_dataset, epochs=30, steps_per_epoch=60, validation_data=val_dataset,
              validation_steps=1, callbacks=[clr])

By the way train_dataset and val_dataset are tf.data.Datasets.

Versions: Python3 and TF 1.10.

Any idea why the issue?

On learning rate range test

Hi @bckenstler ,
Thanks a lot for sharing your implementation. I just read the paper on cyclical learning rate. I'd like to know how you dealt with the following

  1. How do you choose the number of epochs to run the model?
  2. When I run the LR range test and estimate accuracy on the validation set, I get an accuracy of 0.5 for almost all learning rates. Since the model is hardly trained, it behaves as a random classifier, so a validation accuracy around 0.5 looks justified. But this is not what the accuracy-vs-learning-rate curve looks like in the paper. How did you deal with this?

Thanks

PR to keras

Hi, this callback seems quite interesting. Do you plan to PR to the keras repo?

May I use CLR for Adam optimizer?

From the paper and your implementation, your examples only use the SGD optimizer. I am wondering if I can use CLR with Adam or other optimizers. Many thanks.

Have you considered submitting this to Pypi?

It's cool that you've implemented a cyclical learning rate for Keras, but have you considered adding this to PyPI? That way, it would be a lot easier for others to incorporate CLR in their own repos.

LR vs Accuracy

Hi,

I am trying to plot LR vs. accuracy. However, it is not showing a stable graph like the one on the page.

As shown in the paper:

[Image]

It's showing something like this:

[Image]

Any suggestions ?
