
parameterschedulers.jl's Introduction

ParameterSchedulers


ParameterSchedulers.jl provides common machine learning (ML) schedulers for hyper-parameters. Though this package is framework agnostic, a convenient interface for pairing schedules with Flux.jl optimizers is available. Using this package with Flux is as simple as:

using Flux, ParameterSchedulers
using ParameterSchedulers: Scheduler

opt = Scheduler(Momentum, Exp(start = 1e-2, decay = 0.8))
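
A minimal sketch of how such a scheduler might be used in a training loop, assuming Scheduler acts as a regular Flux/Optimisers.jl optimisation rule (the model, loss, and data below are placeholders):

using Flux, ParameterSchedulers
using ParameterSchedulers: Scheduler

model = Dense(10 => 1)                                    # placeholder model
opt = Scheduler(Momentum, Exp(start = 1e-2, decay = 0.8))
opt_state = Flux.setup(opt, model)                        # assumes Scheduler is a valid rule

for epoch in 1:5
    x, y = rand(Float32, 10, 32), rand(Float32, 1, 32)    # placeholder data
    grads = Flux.gradient(m -> Flux.mse(m(x), y), model)
    Flux.update!(opt_state, model, grads[1])              # step size follows the schedule
end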

Available Schedules

This table lists the common schedules implemented, but ParameterSchedulers also provides utilities for creating more exotic schedules. The higher-order schedules should mean you rarely need to write a schedule from scratch.

You can read this paper for more information on the schedules below.

| Schedule | Description | Type | Example |
| --- | --- | --- | --- |
| Step(; start, decay, step_sizes) | Exponential decay by decay at each step in step_sizes | Decay | Step(start = 1.0, decay = 0.8, step_sizes = [2, 3, 2]) |
| Exp(; start, decay) | Exponential decay by decay every iteration | Decay | Exp(start = 1.0, decay = 0.5) |
| CosAnneal(; l0, l1, period) | Cosine annealing | Cyclic | CosAnneal(l0 = 0.0, l1 = 1.0, period = 4) |
| OneCycle(nsteps, maxval) | One-cycle cosine | Complex | OneCycle(10, 1.0) |
| Triangle(; l0, l1, period) | Triangle wave | Cyclic | Triangle(l0 = 0.0, l1 = 1.0, period = 2) |
| TriangleDecay2(; l0, l1, period) | Triangle wave with the amplitude halved every period | Cyclic | TriangleDecay2(l0 = 0.0, l1 = 1.0, period = 2) |
| TriangleExp(; l0, l1, period, decay) | Triangle wave with exponential amplitude decay at rate decay | Cyclic | TriangleExp(l0 = 0.0, l1 = 1.0, period = 2, decay = 0.8) |
| Poly(; start, degree, max_iter) | Polynomial decay of degree degree | Decay | Poly(start = 1.0, degree = 2, max_iter = 10) |
| Inv(; start, decay, degree) | Inverse decay proportional to 1 / (1 + t * decay)^degree | Decay | Inv(start = 1.0, degree = 2, decay = 0.8) |
| Sin(; l0, l1, period) | Sine wave | Cyclic | Sin(l0 = 0.0, l1 = 1.0, period = 2) |
| SinDecay2(; l0, l1, period) | Sine wave with the amplitude halved every period | Cyclic | SinDecay2(l0 = 0.0, l1 = 1.0, period = 2) |
| SinExp(; l0, l1, period, decay) | Sine wave with exponential amplitude decay at rate decay | Cyclic | SinExp(l0 = 0.0, l1 = 1.0, period = 2, decay = 0.8) |

(In the rendered documentation, each example is accompanied by a small plot of the schedule over t = 1:10.)
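
A quick sketch of how any of these schedules can be evaluated or iterated (the Exp parameters below are placeholders):

using ParameterSchedulers

s = Exp(start = 1e-2, decay = 0.8)
s(1)        # schedule value at iteration 1
s.(1:10)    # broadcast over the first ten iterations

# Schedules are also iterable, so they can be zipped with an epoch counter.
for (eta, epoch) in zip(s, 1:3)
    @show epoch, eta
end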

parameterschedulers.jl's People

Contributors

adarshpalaskar1, adinhobl, carlolucibello, darsnack, github-actions[bot], lfenzo, maximilian-gelbrecht, touchesir, vnegi10


parameterschedulers.jl's Issues

Initialising optimisers with constant parameters.

I cannot find a way to set constant optimiser parameters together with a Scheduler.
For example, I would like to set up an AdamW optimiser with an exponentially decaying learning rate but also prescribe a constant decay of 1e-2.

I would expect this to work, but I get an error:

julia> Optimisers.setup(Scheduler(AdamW, η=Exp(1e-3, 0.99), λ=1e-3), model)

ERROR: MethodError: objects of type Float64 are not callable
Maybe you forgot to use an operator such as *, ^, %, / etc. ?
Stacktrace:
  [1] (::ParameterSchedulers.var"#40#41"{Int64})(s::Float64)
    @ ParameterSchedulers ./none:0
  [2] iterate
    @ ./generator.jl:47 [inlined]
  [3] collect_to!
    @ ./array.jl:892 [inlined]
  [4] collect_to_with_first!
    @ ./array.jl:870 [inlined]
  [5] collect(itr::Base.Generator{@NamedTuple{η::Exp{Float64}, λ::Float64}, ParameterSchedulers.var"#40#41"{Int64}})
    @ Base ./array.jl:844
  [6] _totuple
    @ ./tuple.jl:425 [inlined]
  [7] Tuple
    @ ./tuple.jl:391 [inlined]
  [8] NamedTuple
    @ ./namedtuple.jl:149 [inlined]
  [9] _get_opt(scheduler::Scheduler{@NamedTuple{η::Exp{Float64}, λ::Float64}, typeof(AdamW)}, t::Int64)
    @ ParameterSchedulers ~/.julia/packages/ParameterSchedulers/ebjgq/src/scheduler.jl:46
 [10] init(o::Scheduler{@NamedTuple{η::Exp{Float64}, λ::Float64}, typeof(AdamW)}, x::Matrix{Float32})
    @ ParameterSchedulers ~/.julia/packages/ParameterSchedulers/ebjgq/src/scheduler.jl:51
 [11] #_setup#3
    @ ~/.julia/packages/Optimisers/ywGX8/src/interface.jl:40 [inlined]

I also tried wrapping the constant as λ = ParameterSchedulers.Constant(1e-3), which results in another error:

ERROR: MethodError: Cannot `convert` an object of type Float64 to an object of type Tuple{Float64, Float64}

Closest candidates are:
  convert(::Type{T}, ::T) where T<:Tuple
   @ Base essentials.jl:456
  convert(::Type{T}, ::PyCall.PyObject) where T<:Tuple
   @ PyCall ~/.julia/packages/PyCall/1gn3u/src/conversions.jl:218
  convert(::Type{T}, ::T) where T
   @ Base Base.jl:84
  ...

Stacktrace:
  [1] Adam(eta::Float64, beta::Float64, epsilon::Float64)
    @ Optimisers ~/.julia/packages/Optimisers/ywGX8/src/interface.jl:268

Naming the keyword decay instead of λ, or leaving out the argument names completely, also did not help.

Compose schedules

It would be good to implement a compose function that allows combining different scheduling policies into one. The default rule could be multiplicative composition, but new policies could extend the function to define other forms of composition. Schedules like SinExp could then be defined by composing Exp and Sin.
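
As a rough illustration, multiplicative composition can already be written by hand; the compose helper below is hypothetical and not part of the package:

using ParameterSchedulers

# Hypothetical helper: multiply the values of several schedules at iteration t.
compose(schedules...) = t -> prod(s(t) for s in schedules)

# A sine cycle whose amplitude decays exponentially, similar in spirit to SinExp.
s = compose(Sin(l0 = 0.0, l1 = 1.0, period = 10), Exp(start = 1.0, decay = 0.9))
s(5)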

Docstring of `CosAnneal` incorrect?

Hi! Thank you for developing and maintaining this library!

I just noticed that the docstring of CosAnneal may be incorrect. Shouldn't it be

t̂ = restart ? mod(t - 1, period) : (t - 1)

instead of

t̂ = restart ? (t - 1) : mod(t - 1, period)

since the latter does not conform to the code whereas the former does?
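
For concreteness, a tiny sketch contrasting the two expressions (period = 4 is an arbitrary choice):

period = 4
# left: t - 1 (no restart); right: mod(t - 1, period) (restarts every period)
[(t - 1, mod(t - 1, period)) for t in 1:8]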

UndefVarError: OADAM not defined

This is the result of trying using ParameterSchedulers

[ Info: Precompiling ParameterSchedulers [d7d3b36b-41b8-4d0d-a2bf-768c6151755e]
ERROR: LoadError: LoadError: UndefVarError: OADAM not defined
Stacktrace:
 [1] top-level scope at /Users/powers/.julia/packages/ParameterSchedulers/I8u0E/src/optimizers.jl:52
 [2] include(::Function, ::Module, ::String) at ./Base.jl:380
 [3] include at ./Base.jl:368 [inlined]
 [4] include(::String) at /Users/powers/.julia/packages/ParameterSchedulers/I8u0E/src/ParameterSchedulers.jl:1
 [5] top-level scope at /Users/powers/.julia/packages/ParameterSchedulers/I8u0E/src/ParameterSchedulers.jl:15
 [6] include(::Function, ::Module, ::String) at ./Base.jl:380
 [7] include(::Module, ::String) at ./Base.jl:368
 [8] top-level scope at none:2
 [9] eval at ./boot.jl:331 [inlined]
 [10] eval(::Expr) at ./client.jl:467
 [11] top-level scope at ./none:3
in expression starting at /Users/powers/.julia/packages/ParameterSchedulers/I8u0E/src/optimizers.jl:52
in expression starting at /Users/powers/.julia/packages/ParameterSchedulers/I8u0E/src/ParameterSchedulers.jl:15
ERROR: Failed to precompile ParameterSchedulers [d7d3b36b-41b8-4d0d-a2bf-768c6151755e] to /Users/powers/.julia/compiled/v1.5/ParameterSchedulers/MTUIq_8IwJI.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1305
 [3] _require(::Base.PkgId) at ./loading.jl:1030
 [4] require(::Base.PkgId) at ./loading.jl:928
 [5] require(::Module, ::Symbol) at ./loading.jl:923

It looks as though OADAM is not included with Flux anymore...

My Flux version is v0.11.1, which I believe is the latest.

Please take any useful ideas from my CyclicOptimisers

In working on my bachelor project, I have implemented a type CyclicOptimiser. It acts as a drop-in replacement for regular optimisers, by adding a new method to Flux.update!.

I can see that I have reproduced a lot of the work in here. So I simply wanted to offer up my implementation, and let you decide if I have had any ideas that might add value to this project.

using Flux
using Flux.Optimise: AbstractOptimiser
using Base.Iterators: Stateful, Cycle
import UnicodePlots
import Flux.update!  # Learning rate is updated each time Flux.update! is called, allowing seamless drop-in replacement of normal optimisers with cyclic optimisers.
import Base: show
round3(x) = round(x, sigdigits=3)

function optimiser_to_string(opt::AbstractOptimiser)
    fldnms = fieldnames(typeof(opt))
    fields = getfield.([opt], fldnms)
    fieldtypes = typeof.(fields)
    output = string(typeof(opt)) * "("
    for i in eachindex(fields)
        if fieldtypes[i] <: IdDict
            output *= "..., "
        else
            fldnms[i] == :eta ? (output *= "$(fields[i]|>round3), ") : (output *= "$(fields[i]), ")
        end
    end
    output = output[begin:end-2] * ")"
    return output
end

"""
struct CycleMagnitude
    len::Int
    magfac::Float64
end

A type to be used as a functor with the purpose of 
calculating a magnitude that is changed discretely 
by a factor `magfac` each time `len` cycles are completed.
"""
struct CycleMagnitude
    len::Int
    magfac::Float64
end

"""
    (cyc::CycleMagnitude)(x) = cyc.magfac ^ (x÷cyc.len)

Compute a magnitude that is multiplied by `cyc.magfac` 
every time the input increases by cyc.len.

The input is intended to be the `taken` field of a 
Cycle(Stateful(my_collection)).

Note that for the actual calculation, the learning rate 
needs to be shifted so that the smallest value in the 
cycle is 0 before scaling, and shifted back up after scaling.
"""
(cyc::CycleMagnitude)(x) = cyc.magfac ^ (x÷cyc.len)


abstract type AbstractCycler end
struct TriangleCycler <: AbstractCycler
    cycle::Stateful{Cycle{A}} where {A<:AbstractVector}
end
show(io::IO, cyc::AbstractCycler) = println(io, "Cycler with values $(cyc.cycle.itr.xs).\nCycled $(cyc.cycle.taken) times")

cycle!(cycler::AbstractCycler) = popfirst!(cycler.cycle)

"""
    TriangleCycler(lower, upper, len)

Construct a TriangleCycler containing a set 
of `len` values values that goes from `lower` 
up to `upper` and back down again. Plotted against 
its index, the returned set looks like 
a triangle with 2 equal legs.

If the `len` is odd, the first and last point will 
be the same, causing repetition when cycled.
"""
function TriangleCycler(lower, upper, len)
    if len == 1  # Special case to avoid the error from range(a_number, another_number != a_number, length=1)
        cycle = [(lower+upper)/2]
    elseif iseven(len) 
        cycle = vcat(range(lower, upper; length=len÷2+1), reverse(range(lower, upper; length=len÷2+1))[begin+1:end-1])
    else
        cycle = vcat(range(lower, upper; length=len÷2+1), reverse(range(lower, upper; length=len÷2+1))[begin+1:end])
    end

    return TriangleCycler(cycle |> Cycle |> Stateful)
end
show(io::IO, tricy::TriangleCycler) = println(io, "TriangleCycler from $(minimum(tricy.cycle.itr.xs)|>round3) to $(maximum(tricy.cycle.itr.xs)|>round3) of cycle-length $(length(tricy.cycle.itr.xs))")

function check_optimiser(opt::AbstractOptimiser)
    hasfield(typeof(opt), :eta) || "Tried to construct a CyclicOptimiser with $(opt), which has no field eta (e.g. no learningrate parameter)." |> error
    opt isa DataType && "Tried to construct a CyclicOptimiser with an optimiser type (e.g. `Descent`). Try to use a concrete optimiser instead (e.g. `Descent()`)"|>error
    return nothing
end

"""
    struct CyclicOptimiser{T} <: AbstractOptimiser where {T<:AbstractOptimiser}
        current_optimiser::T
        learningrate::AbstractCycler
        cycle_magnitude::CycleMagnitude
    end
"""
struct CyclicOptimiser{T} <: AbstractOptimiser where {T<:AbstractOptimiser}
    current_optimiser::T
    learningrate::AbstractCycler
    cycle_magnitude::CycleMagnitude
    function CyclicOptimiser(opt, learningrate::AbstractCycler, cycmag::CycleMagnitude)
        check_optimiser(opt)
        @assert length(learningrate.cycle.itr.xs) == cycmag.len "Length of learning rate cycle does not match the length of the internal CycleMagnitude."
        return new{typeof(opt)}(opt, learningrate, cycmag)
    end
end


"""
CyclicOptimiser(opt::AbstractOptimiser, lower, upper, len; cycler::AbstractCycler=TriangleCycler, magfac=1)

Construct a CyclicOptimiser. The optimiser whose learning rate is cycled is 
`opt`, the first positional argument. `lower`, `upper` and `len` are passed on 
to `cycler`, constructing an `AbstractCycler` and defaulting to TriangleCycler.

A final keyword argument `magfac` sets the magnitude-controlling factor that 
is applied after a full cycle is completed. So if `magfac` is set to 0.5, then 
the span of the cycle is halved each cycle. The lower limit is pinned, 
so `magfac` only affects the upper limit, to ensure that the learning rate 
decreases each cycle (assuming magfac ≤ 1, which is checked for).
"""
function CyclicOptimiser(opt::AbstractOptimiser, lower, upper, len; cycler=TriangleCycler, magfac=1)
    check_optimiser(opt)
    return CyclicOptimiser(opt, cycler(lower, upper, len), CycleMagnitude(len, magfac))
end

function plot(cycopt::CyclicOptimiser, n_cycles=3)
    xs = 1:cycopt.cycle_magnitude.len*n_cycles
    cycopt = deepcopy(cycopt)
    Iterators.reset!(cycopt.learningrate.cycle)
    ys = [cycle!(cycopt.learningrate) for _ in eachindex(xs)] .* cycopt.cycle_magnitude.(xs)
    return UnicodePlots.scatterplot(xs, ys, xlabel="Iteration", ylabel="Learningrate",
    title="Learningrate for $n_cycles cycles")
end

function show(io::IO, cycopt::CyclicOptimiser)
    print(io, 
    """
    CyclicOptimiser with following properties:
    Current optimiser = $(cycopt.current_optimiser|>optimiser_to_string)
         Learningrate = $(typeof(cycopt.learningrate)) from $(cycopt.learningrate.cycle.itr.xs|>minimum|>round3) to $(cycopt.learningrate.cycle.itr.xs|>maximum|>round3)
          Cyclelength = $(cycopt.cycle_magnitude.len). Magfac = $(cycopt.cycle_magnitude.magfac)""")
end

function cycle!(co::CyclicOptimiser)
    A = co.cycle_magnitude(co.learningrate.cycle.taken)
    lower_bound = co.learningrate.cycle.itr.xs |> minimum
    co.current_optimiser.eta = A * (cycle!(co.learningrate) - lower_bound) + lower_bound
    return co.current_optimiser
end

Flux.update!(cycopt::CyclicOptimiser, xs::Params, gs) = Flux.update!(cycle!(cycopt), xs::Params, gs)

OneCycle annealing

Describe the potential feature

One cycle annealing (original paper here) is a really strong scheduler, and the most effective one I have found for deep learning. An implementation in PyTorch is here: https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.OneCycleLR.html

Motivation

No response

Possible Implementation

"Cosine anneal from `start` to `end` as pct goes from 0.0 to 1.0."
function annealing_cos(start, stop, pct)
    cos_out = cos(pi * pct) + 1
    return stop + (start - stop) / 2.0 * cos_out
end

You would typically start at pct = 0.3 and go to pct = 1.0 at the end of training.
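
For illustration, a sketch of how the annealing_cos function above might be used, with a placeholder step count and learning rates, and pct running from 0.3 to 1.0 as described:

nsteps = 100
lrs = [annealing_cos(1e-2, 1e-4, pct) for pct in range(0.3, 1.0; length = nsteps)]
lrs[1], lrs[end]   # starts partway down the cosine, ends at 1e-4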

Should complex schedules created with `Sequence` be automatically `Shortened`, if they have finite length?

Motivation and description

The generation of a sequence of finite-length schedules like, e.g.,

Sequence(1e-1 => 5, 5e-2 => 4, 3.4e-3 => 10)

would conceptually be a finite-length schedule as well. In this example the sequence would have a length of 19 iterations. In contrast the following sequence would be infinite:

Sequence(1e-1 => 5, 5e-2 => 4, 3.4e-3 => Inf)

However, both sequences behave identically AFAICS.

Moreover, there is the concept of Shortened schedules, which throw a BoundsError once the end of the schedule is reached. It might therefore be more intuitive if the first example generated a Shortened schedule, in contrast to the second.
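
A small sketch of the behaviour described above, reusing the two sequences from the example (assuming both simply hold the final value once their step counts are exhausted):

using ParameterSchedulers

finite   = Sequence(1e-1 => 5, 5e-2 => 4, 3.4e-3 => 10)
infinite = Sequence(1e-1 => 5, 5e-2 => 4, 3.4e-3 => Inf)

# Both keep returning 3.4e-3 past iteration 19, i.e. they behave identically.
finite.(18:22), infinite.(18:22)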

Possible Implementation

No response

Add Const scheduler

It can be useful to add a dummy scheduler giving a fixed learning rate to be used in a Sequence scheduler. See FluxML/Flux.jl#1815 for an application.

Right now a constant scheduler can be created with Exp(; λ, γ = 1).
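
A minimal sketch of that workaround, using the λ/γ keywords from this issue (assumed behaviour, not tested here):

using ParameterSchedulers

constant = Exp(λ = 1e-3, γ = 1.0)            # decay of 1 keeps the value at 1e-3
decaying = Exp(λ = 1e-3, γ = 0.9)
s = Sequence(constant => 5, decaying => 10)  # constant warm phase, then exponential decay
s.(1:15)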

Scheduler with gradient clipping

Motivation and description

I think most implementations would require gradient clipping.
For example, Step (exponential decay with a step size) combined with clipping that sets a lower bound on the learning rate.

These were common in previous versions of Flux.
So I think it would be very useful if such functionality is provided by default with keyword arguments.

If it is already implemented, the current documentation does not describe it well, as I could not find it in the docs.
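
One plain-Julia workaround sketch, not existing package functionality: clamp the schedule's output to a lower bound before handing it to the optimiser (the schedule and bound below are placeholders):

using ParameterSchedulers

s = Step(start = 1e-2, decay = 0.1, step_sizes = [5, 5, 5])
clipped(t) = max(s(t), 1e-4)   # never let the learning rate fall below 1e-4
clipped.(1:20)                 # the final phase (1e-5) is clamped up to 1e-4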

Possible Implementation

No response

Example of how to use with Optimisers.jl

Hi, I've been trying to use this package with Optimisers.jl (specifically, a Step schedule with a Scheduler), but I seem to be getting errors suggesting that this setup works with the Flux optimisers and not with Optimisers.jl for now. Is there a way to write code that works with Optimisers.jl?

deprecated optimizers

This package should import the new symbols for the following optimizers:

WARNING: importing deprecated binding Flux.ADAM into ParameterSchedulers.
WARNING: Flux.ADAM is deprecated, use Adam instead.
  likely near /Users/carlo/.julia/packages/ParameterSchedulers/e1qwm/src/ParameterSchedulers.jl:82
WARNING: importing deprecated binding Flux.RADAM into ParameterSchedulers.
WARNING: Flux.RADAM is deprecated, use RAdam instead.
  likely near /Users/carlo/.julia/packages/ParameterSchedulers/e1qwm/src/ParameterSchedulers.jl:82
WARNING: importing deprecated binding Flux.OADAM into ParameterSchedulers.
WARNING: Flux.OADAM is deprecated, use OAdam instead.
  likely near /Users/carlo/.julia/packages/ParameterSchedulers/e1qwm/src/ParameterSchedulers.jl:82
WARNING: importing deprecated binding Flux.ADAGrad into ParameterSchedulers.
WARNING: Flux.ADAGrad is deprecated, use AdaGrad instead.
  likely near /Users/carlo/.julia/packages/ParameterSchedulers/e1qwm/src/ParameterSchedulers.jl:82
WARNING: importing deprecated binding Flux.ADADelta into ParameterSchedulers.
WARNING: Flux.ADADelta is deprecated, use AdaDelta instead.
  likely near /Users/carlo/.julia/packages/ParameterSchedulers/e1qwm/src/ParameterSchedulers.jl:82
WARNING: importing deprecated binding Flux.NADAM into ParameterSchedulers.
WARNING: Flux.NADAM is deprecated, use NAdam instead.
  likely near /Users/carlo/.julia/packages/ParameterSchedulers/e1qwm/src/ParameterSchedulers.jl:82

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Support ProgressLogging.jl?

To reproduce:

using ParameterSchedulers
using ProgressLogging

s = Exp(λ = 1e-1, γ = 0.9)
@progress for (η, i) in zip(s, 1:100)
       @show η, i
end

ERROR: MethodError: no method matching size(::Exp{Float64})
Closest candidates are:
  size(::Tables.EmptyVector) at /home/jagupt/.julia/packages/Tables/8FVkV/src/fallbacks.jl:183
  size(::LLVM.ConstantAggregateZero) at /home/jagupt/.julia/packages/LLVM/7Q46C/src/core/value/constant.jl:115
  size(::CUDA.CUDNN.FilterDesc) at /home/jagupt/.julia/packages/CUDA/BIYoG/lib/cudnn/filter.jl:33
  ...
Stacktrace:
 [1] axes at ./abstractarray.jl:75 [inlined]
 [2] MappingRF at ./reduce.jl:93 [inlined]
 [3] afoldl(::Base.MappingRF{typeof(axes),Base.BottomRF{typeof(Base.Iterators._zip_promote_shape)}}, ::Base._InitialValue, ::Exp{Float64}, ::UnitRange{Int64}) at ./operators.jl:526
 [4] _foldl_impl(::Base.MappingRF{typeof(axes),Base.BottomRF{typeof(Base.Iterators._zip_promote_shape)}}, ::Base._InitialValue, ::Tuple{Exp{Float64},UnitRange{Int64}}) at ./tuple.jl:207
 [5] foldl_impl(::Base.MappingRF{typeof(axes),Base.BottomRF{typeof(Base.Iterators._zip_promote_shape)}}, ::NamedTuple{(),Tuple{}}, ::Tuple{Exp{Float64},UnitRange{Int64}}) at ./reduce.jl:48
 [6] mapfoldl_impl(::typeof(axes), ::typeof(Base.Iterators._zip_promote_shape), ::NamedTuple{(),Tuple{}}, ::Tuple{Exp{Float64},UnitRange{Int64}}) at ./reduce.jl:44
 [7] mapfoldl(::Function, ::Function, ::Tuple{Exp{Float64},UnitRange{Int64}}; kw::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at ./reduce.jl:160
 [8] mapfoldl at ./reduce.jl:160 [inlined]
 [9] #mapreduce#208 at ./reduce.jl:287 [inlined]
 [10] mapreduce(::Function, ::Function, ::Tuple{Exp{Float64},UnitRange{Int64}}) at ./reduce.jl:287
 [11] axes(::Base.Iterators.Zip{Tuple{Exp{Float64},UnitRange{Int64}}}) at ./iterators.jl:317
 [12] _linindex(::Base.Iterators.Zip{Tuple{Exp{Float64},UnitRange{Int64}}}) at /home/jagupt/.julia/packages/ProgressLogging/6KXlp/src/ProgressLogging.jl:510
 [13] (::Base.var"#62#63"{typeof(first),typeof(ProgressLogging._linindex)})(::Base.Iterators.Zip{Tuple{Exp{Float64},UnitRange{Int64}}}) at ./operators.jl:875
 [14] map(::Base.var"#62#63"{typeof(first),typeof(ProgressLogging._linindex)}, ::Tuple{Base.Iterators.Zip{Tuple{Exp{Float64},UnitRange{Int64}}}}) at ./tuple.jl:157
 [15] make_count_to_frac(::Base.Iterators.Zip{Tuple{Exp{Float64},UnitRange{Int64}}}) at /home/jagupt/.julia/packages/ProgressLogging/6KXlp/src/ProgressLogging.jl:499
 [16] top-level scope at /home/jagupt/.julia/packages/ProgressLogging/6KXlp/src/ProgressLogging.jl:464

`ComposedSchedule` example for `CosAnneal` in the docs is incorrect

The Cosine annealing variants example shown on the cheatsheet page leads to the error below.

julia> s = ComposedSchedule(CosAnneal(range, offset, period),
                            (Step(range, m_mul, period), offset, period))
ERROR: MethodError: no method matching CosAnneal(::typeof(range), ::Float64, ::Int64)
Closest candidates are:
  CosAnneal(::T, ::T, ::S, ::Bool) where {T, S<:Integer} at ~/.julia/packages/ParameterSchedulers/CTkAS/src/cyclic.jl:212
Stacktrace:
 [1] top-level scope
   @ REPL[47]:1
 [2] top-level scope
   @ ~/.julia/packages/CUDA/DfvRa/src/initialization.jl:52

You can create a ComposedSchedule by adding the Bool flag when defining the CosAnneal; however, that leads to an error when you try to get a learning rate down the line

(this example uses Exp(r, γ) as the schedule)

julia> s = ComposedSchedule(CosAnneal(r, offset, period, true), (Exp(r, γ), offset, period))
ComposedSchedule(CosAnneal{Float64, Int64}, (Exp{Float64}(1.0, 0.95), Constant{Float64}(0.0), Constant{Int64}(5)))

julia> s(1)
ERROR: MethodError: no method matching CosAnneal{Float64, Int64}(::Float64, ::Float64, ::Int64)
Closest candidates are:
  CosAnneal{T, S}(::Any, ::Any, ::Any, ::Any) where {T, S<:Integer} at ~/.julia/packages/ParameterSchedulers/CTkAS/src/cyclic.jl:212
Stacktrace:
 [1] (::ParameterSchedulers.var"#30#31"{CosAnneal{Float64, Int64}})(s::CosAnneal{Float64, Int64}, ps::Tuple{Float64, Float64, Int64})
   @ ParameterSchedulers ~/ParameterSchedulers.jl/src/complex.jl:212
 [2] (::ComposedSchedule{CosAnneal{Float64, Int64}, Tuple{Exp{Float64}, ParameterSchedulers.Constant{Float64}, ParameterSchedulers.Constant{Int64}}, ParameterSchedulers.var"#30#31"{CosAnneal{Float64, Int64}}})(t::Int64)
   @ ParameterSchedulers ~/.julia/packages/ParameterSchedulers/CTkAS/src/complex.jl:227
 [3] top-level scope
   @ REPL[50]:1
 [4] top-level scope
   @ ~/.julia/packages/CUDA/DfvRa/src/initialization.jl:52

This can be remedied by passing the bool flag to the parameter list at the end:

julia> s = ComposedSchedule(CosAnneal(r, offset, period, true), (Exp(r, γ), offset, period, true))
ComposedSchedule(CosAnneal{Float64, Int64}, (Exp{Float64}(1.0, 0.95), Constant{Float64}(0.0), Constant{Int64}(5), Constant{Bool}(true)))

julia> s(1)
1.0

It would be good to correct the documentation and/or the code. I am not sure what the intended behavior is or should be.

Add `start`/`decay` as Kwargs in Schedulers (apart from the UTF-8 ones)

Describe the potential feature

When writing code from terminal-based editors, e.g. Vim, NeoVim, Nano (and AFAIK Helix), it becomes a bit cumbersome to repeatedly copy and paste mathematical characters such as λ and γ, which appear in most schedulers. Although I see the merit of having UTF-8 characters as kwargs for mathematical aesthetics, I believe we should also have the option to specify such parameters by spelling out their meanings, i.e. with start, decay, etc.

As an example, currently we have the following:

   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.7.3 (2022-05-06)
 _/ |\__'_|_|_|\__'_|  |  Fedora 36 build
|__/                   |

julia> using ParameterSchedulers

julia> s = Exp(; start = 0.01, decay = 0.95)
ERROR: UndefKeywordError: keyword argument λ not assigned
Stacktrace:
 [1] top-level scope
   @ REPL[2]:1
 [2] top-level scope
   @ ~/.julia/packages/CUDA/DfvRa/src/initialization.jl:52

Motivation

No response

Possible Implementation

I believe that the implementation should be straightforward, just requiring another definition of the schedulers using the start and decay kwargs (which are already in the "Arguments" section of some of the schedulers' doc pages). Using the Step scheduler as an example, we'd have:

Step(;λ, γ, step_sizes) = Step(λ, γ, step_sizes) # current implementation
Step(;start, decay, step_sizes) = Step(start, decay, step_sizes)
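
With such a method defined, both spellings would construct the same schedule (the second call is hypothetical until the extra method is added):

using ParameterSchedulers

s1 = Step(λ = 1e-2, γ = 0.8, step_sizes = [5, 5])          # existing UTF-8 keywords
s2 = Step(start = 1e-2, decay = 0.8, step_sizes = [5, 5])  # proposed ASCII keywords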

Using a wrapper + mutation to "implicitly" update scheduled parameters?

This is to write down a thought which came from #34 and FluxML/Optimisers.jl#89. Presently, we rely on mutably/immutably updating any objects which depend on the schedule value after each step. This is simple and easy to understand, but it could get unwieldy with more complex optimizer state trees.

What if we instead created a stateful type or wrapper which keeps track of the current schedule value? Then we make this wrapper, or some type that holds a reference to it, a subtype of a number type (maybe Real? It could be parametric on the value type). This proxy number can then be manipulated directly by Optimisers.jl rules, but will appear to update automatically whenever the schedule is ticked.

Some pseudocode for the above:

Option 1: wrapper itself is mutable number proxy

mutable struct ScheduleValue{T<:Real} <: Real
  inner::T
end

# Overload basic math operations (much like Flux.Nil)
Base.:+(sv::ScheduleValue, x::Number) = sv.inner + x
....

eta = ScheduleValue(0f0)
d = Descent(eta)
schedule = Exp(...)

for s in schedule
  eta.inner = s  # probably want a proper function for this
  ...
end

Option 2: number proxy is derived from wrapper

struct ScheduleValue{S<:Stateful} <: Real
  iter::S
end

_getval(sv::ScheduleValue) = sv.iter.schedule(sv.iter.state)

# Overload basic math operations (much like Flux.Nil)
Base.:+(sv::ScheduleValue, x::Number) = _getval(sv) + x
...

schedule = Stateful(Exp(...))

eta = ScheduleValue(schedule)
d = Descent(eta)

for _ in schedule  # no need for value here, just next! on the Stateful
  ...
end

Too magic? Perhaps. I could also see serialization being an issue because of the mutable references, but BSON/JLD2 at least should work. However, this does seem more ergonomic than wrapping optimization rules when it comes to scheduling multiple hyperparameters simultaneously.

Add PyTorch cheatsheet section to docs

ParameterSchedulers.jl supports most (all?) of the schedules in PyTorch, but we don't use the same naming, and we push for composition over highly specific schedules. We should add a "PyTorch" cheatsheet section to the docs that is a how-to guide for recreating PyTorch schedulers in ParameterSchedulers. For example, we could show how ReduceLROnPlateau can be created with Stateful and Flux.plateau.

Remove Flux Dependency

The package can be used with all kinds of ML libraries, not only Flux; I use it with Lux, for example. However, it has Flux.jl as a dependency that will always be installed. From a quick look over the code, the Flux dependency is only relevant for the Scheduler struct.

If the dependency there is really necessary, one could rewrite this as an extension module that is only loaded when Flux is loaded as well.

Remove Flux Dependency

Motivation and description

Having Flux as a dependency seems unnecessary and can cause some package version conflict headaches.
Optimisers.jl already has a system to adjust hyperparameters (like lr) via Optimisers.adjust! (see https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.adjust!). Can we remove the dependency and associated functions in favor of telling people to just use adjust!?

Possible Implementation

For example, the following works:

model = ... # init a Flux or Lux model
opt = Adam(lr)
sched = Step(lr, 0.95, step)
st_opt = Optimisers.setup(opt, model)

for ep in 1:epochs
    Optimisers.adjust!(st_opt, sched(ep))
    # loop through data, apply gradients, update params, etc
    ...
end

Alternatively

Flux could be moved to a weak dependency and the Scheduler code could be moved to an extension.
