Giter Site home page Giter Site logo

luziya / best-deep-learning-optimizers Goto Github PK

View Code? Open in Web Editor NEW

This project forked from lessw2020/best-deep-learning-optimizers

0.0 1.0 0.0 850 KB

Collection of the latest, greatest, deep learning optimizers (for Pytorch) - CNN, NLP suitable

License: Apache License 2.0

Python 12.75% Jupyter Notebook 87.25%

best-deep-learning-optimizers's Introduction

Best-Deep-Learning-Optimizers

Collection of the latest, greatest, deep learning optimizers (for Pytorch) - CNN, NLP suitable

Current top performers = Ranger with Gradient Centralization is the leader (April 11/2020) this is only on initial testing.

Updates - AdaHessian, the first 'it really works and works really well' second order optimizer added:

August 2020 - I tested AdaHessian last month on work datasets and it performed extremely well. It's like training with a guided missile compared to most other optimizers. The big caveat is you will need about 2x the normal GPU memory to run it vs running with a 'first order' optimizer. I am trying to get a Titan GPU with 24GB GPU memory just for this purpose atm.

new version of Ranger with highest accuracy to date for all optimizers tested: April 11 - New version of Ranger released (20.4.11), highest score for accuracy to date.

Ranger has been upgraded to use Gradient Centralization. See: https://arxiv.org/abs/2004.01461 and github: https://github.com/Yonghongwei/Gradient-Centralization

It will now use GC by default, and run it for both conv layers and fc layers. You can turn it on or off with "use_gc" at init to test out the difference on your datasets. (image from gc github).

The summary of gradient centralization: "GC can be viewed as a projected gradient descent method with a constrained loss function. The Lipschitzness of the constrained loss function and its gradient is better so that the training process becomes more efficient and stable."

Note - for optimal accuracy, make sure you use run with a flat lr for some time and then cosine descent the lr (72% - 28% descent), or if you don't have an lr framework... very comparable results by running at one rate for 75%, then stop and decrease lr, and run remaining 28%.

Usage - GC on by default but you can control all aspects at init:


Ranger will print settings at first init so you can confirm optimization is set the way you want it:


Future work: MARTHE, HyperAdam and other optimizers will be tested and posted if they look good.


12/27 - added DiffGrad, and unofficial version 1 support (coded from the paper).
12/28 - added Diff_RGrad = diffGrad + Rectified Adam to start off....seems to work quite well.

Medium article (summary and FastAI example usage): https://medium.com/@lessw/meet-diffgrad-new-deep-learning-optimizer-that-solves-adams-overshoot-issue-ec63e28e01b2

Official diffGrad paper: https://arxiv.org/abs/1909.11015v2

12/31 - AdaMod and DiffMod added. Initial SLS files added (but more work needed).

In Progress:

A - Parabolic Approximation Line Search: https://arxiv.org/abs/1903.11991v2

B - Stochastic Line Search (SLS): pending (needs param group support)

c - AvaGrad

General papers of relevance:

Does Adam stick close to the optimal point? https://arxiv.org/abs/1911.00289v1

Probabalistic line searches for stochastic optimization (2017, matlab only but good theory work): https://arxiv.org/abs/1703.10034v2

best-deep-learning-optimizers's People

Contributors

lessw2020 avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.