
juliadecisionfocusedlearning / ImplicitDifferentiation.jl


Automatic differentiation of implicit functions

Home Page: https://juliadecisionfocusedlearning.github.io/ImplicitDifferentiation.jl/

License: MIT License

Julia 100.00%
julia automatic-differentiation machine-learning autodiff implicit optimization

implicitdifferentiation.jl's Introduction

ImplicitDifferentiation.jl


ImplicitDifferentiation.jl is a package for automatic differentiation of functions defined implicitly, i.e., forward mappings

$$x \in \mathbb{R}^n \longmapsto y(x) \in \mathbb{R}^m$$

whose output is defined by conditions

$$c(x,y(x)) = 0 \in \mathbb{R}^m$$

Background

Implicit differentiation is useful to differentiate through two types of functions:

  • Those for which automatic differentiation fails. Reasons vary depending on your backend, but the most common are calls to external solvers, mutating operations, or type restrictions.
  • Those for which automatic differentiation is very slow. Common examples are iterative procedures such as fixed-point equations or optimization algorithms.

If you just need a quick overview, check out our JuliaCon 2022 talk. If you want a deeper dive into the theory, you can refer to the paper Efficient and modular implicit differentiation by Blondel et al. (2022).

Getting started

To install the stable version, open a Julia REPL and run:

using Pkg; Pkg.add("ImplicitDifferentiation")

For the latest version, run this instead:

using Pkg; Pkg.add(url="https://github.com/JuliaDecisionFocusedLearning/ImplicitDifferentiation.jl")

Please read the documentation, especially the examples and FAQ.
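
As a quick taste of the API, here is a minimal sketch using the toy square-root example from the docs, assuming Zygote.jl for reverse mode; call and return conventions have shifted between versions, so see the documentation and the issues below for details.

using ImplicitDifferentiation, Zygote

# Forward mapping: any black-box solver computing y(x); here just a stand-in.
forward(x) = sqrt.(x)

# Conditions c(x, y) = 0 characterizing the output.
conditions(x, y) = y .^ 2 .- x

# The wrapper gets differentiation rules derived from the implicit function theorem.
implicit = ImplicitFunction(forward, conditions)

x = [4.0, 9.0]
implicit(x)                      # ≈ [2.0, 3.0]
Zygote.jacobian(implicit, x)[1]  # ≈ diagonal matrix with entries 0.5 ./ sqrt.(x)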

Related projects

In Julia:

In Python:

  • google/jaxopt: hardware accelerated, batchable and differentiable optimizers in JAX

implicitdifferentiation.jl's People

Contributors

baggepinnen, gdalle, github-actions[bot], mohamed82008, pitmonticone, sfalmo, thorek1, yingboma


implicitdifferentiation.jl's Issues

More examples

We should add more illustrations of what implicit differentiation can do.

A few ideas:

  • Constrained optimization (projected / proximal gradient, Frank-Wolfe)
  • Optimal transport
  • ODEs

Test static arrays and use direct linear solver

I think we should add tests for static array inputs and outputs. Ideally we should remain static throughout the whole computation. A nice heuristic could also be to use a direct linear solver instead of GMRES whenever static arrays are used; a rough sketch follows.
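
A rough sketch of such a heuristic, assuming a custom linear solver can be supplied as a callable (A, b) -> (u, stats) as in the direct-solver issue further below; the function name and the dispatch on the right-hand side type are illustrative only.

using LinearAlgebra, StaticArrays
using Krylov: gmres

# Hypothetical heuristic: static right-hand sides get a dense direct solve
# (converted to dense arrays here; a fully static path would keep SArray types),
# everything else goes through GMRES.
function heuristic_linear_solver(A, b)
    if b isa StaticArray
        return Matrix(A) \ Vector(b), (solved=true,)
    else
        return gmres(A, b)  # Krylov.gmres already returns (solution, stats)
    end
end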

Issue with second derivative

Hi,
I have been trying to use ImplicitDifferentiation.jl for higher-order JVPs, but without luck.

I want to use it for BifurcationKit.jl, where I need third-order JVPs.

Basic example:

using ImplicitDifferentiation
using Optim
using Random
using Zygote


Random.seed!(63)

function dumb_identity(x)
    f(y) = sum(abs2, y-x)
    y0 = zero(x)
    res = optimize(f, y0, LBFGS(); autodiff=:forward)
    y = Optim.minimizer(res)
    return y
end;

zero_gradient(x, y) = 2(y - x);
implicit = ImplicitFunction(dumb_identity, zero_gradient);
x = rand(3, 2)
h = rand(3, 2)
D(x,h) = Zygote.jacobian(t->implicit(x .+ t .* h), 0)[1]
D(x,h) # works
D2(x,h1,h2) = Zygote.jacobian(t->D(x .+ t .* h2,h1), 0)[1]
D2(x,h,h) # does not work

I also have this code with ForwardDiff; I am not sure whether the problem is the same:

using ForwardDiffChainRules, ForwardDiff
@ForwardDiff_frule (f::typeof(implicit))(x::AbstractMatrix{<:ForwardDiff.Dual})
D(x,h) = ForwardDiff.derivative(t->implicit(x .+ t .* h), 0)
D(x,h) # works
D2(x,h1,h2) = ForwardDiff.derivative(t->D(x .+ t .* h2,h1), 0)
D2(x,h,h) # does not work

please register

While I understand it is WIP, this package is already useful, nicely written and documented, so please consider registering it.

Type inference in linear solve

At the moment, since IterativeSolvers can return a convergence history depending on a keyword argument, our pushforward and pullback are type-unstable.

If this isn't fixed by #5, we can consider annotations like

linear_solver(A, b)::AbstractVector = gmres(A, b)

or

u::AbstractVector = linear_solver(A, b)

Complex-valued arguments

Currently, the implicit function is defined for real arguments only

function (implicit::ImplicitFunction)(
    x_and_dx::AbstractArray{Dual{T,R,N}}; kwargs...
) where {T,R,N}

It would be useful to have it work for complex-valued arguments as well. Naively defining the same function with a modified type signature did not seem to work.

Document custom rrule trick for ComponentArray constructor

Regarding #23, I was trying to re-implement this in Julia.

I found that ComponentArrays may not work properly with Zygote when the optimization parameters are given from another source (here, a neural network), as in the following example (with v0.5.0-DEV).

I think this is related to jonniedie/ComponentArrays.jl#176.
Therefore, it may be better to clarify the limitation of ComponentArrays for multiple-argument cases (or just avoid it for now).

Case 1: with ComponentArrays (not working)

Code

using ImplicitDifferentiation
using Flux
using Convex, ECOS
using ComponentArrays
using Random


Random.seed!(2024)
i_max = 10
n = 3
m = 2
model = Chain(
              Dense(3, 64, Flux.leakyrelu),
              Dense(64, i_max*(m+1)),
             )
u_min = -2*ones(m)
u_max = +2*ones(m)


function minimise_logsumexp(θ; T, u_min, u_max, initial_guess, solver=() -> ECOS.Optimizer())
    # A = θ[:, 1:end-1]
    # B = θ[:, end]
    (; A, B) = θ
    m = size(A)[2]
    u = Convex.Variable(m)
    if initial_guess != nothing
        u.value = initial_guess
    end
    obj = T * Convex.logsumexp((1/T)*(A*u + B))
    prob = Convex.minimize(obj)
    prob.constraints += [u >= u_min]
    prob.constraints += [u <= u_max]
    solve!(prob, solver(), silent_solver=true, verbose=false)
    minimiser = typeof(u.value) <: Number ? [u.value] : u.value[:]  # to make it a vector
    return minimiser
end


function forward_cstr_optim(θ; kwargs...)
    u = minimise_logsumexp(θ; kwargs...)
    return u
end


function proj_hypercube(p; u_min, u_max)
    return max.(u_min, min.(u_max, p))
end

function conditions_cstr_optim(θ, u; kwargs...)
    (; A, B) = θ
    # A = θ[:, 1:end-1]
    # B = θ[:, end]
    ∇₂f = A' * Flux.softmax(A*u+B)
    η = 0.1
    return u .- proj_hypercube(u .- η .* ∇₂f; kwargs...)
end


function implicit_cstr_optim_func(θ; T, u_min, u_max, initial_guess, solver)
    tmp = ImplicitFunction(
                           θ -> forward_cstr_optim(θ; T, u_min, u_max, initial_guess, solver),
                           (θ, u) -> conditions_cstr_optim(θ, u; u_min, u_max),
                          )
    tmp(θ)
end

x = 100*(2*rand(n) .- 1)
val, grads = Flux.withgradient(model) do _model
    AB = reshape(_model(x), i_max, m+1)
    A = AB[:, 1:m]
    B = AB[:, m+1]
    θ = ComponentArray(A=A, B=B)
    # θ = reshape(_model(x), i_max, m+1)
    u_star = implicit_cstr_optim_func(θ; T=1, u_min=-ones(m), u_max=+ones(m), initial_guess=nothing, solver=() -> ECOS.Optimizer())
    sum(u_star)
end

Result

ERROR: Mutating arrays is not supported -- called push!(Vector{Any}, ...)
This error occurs when you ask Zygote to differentiate operations that change
the elements of arrays in place (e.g. setting values with x .= ...)

Possible fixes:
- avoid mutating operations (preferred)
- or read the documentation and solutions for this error
  https://fluxml.ai/Zygote.jl/latest/limitations

Case 2: without ComponentArrays (working)

Code

using ImplicitDifferentiation
using Flux
using Convex, ECOS
using ComponentArrays
using Random


Random.seed!(2024)
i_max = 10
n = 3
m = 2
model = Chain(
              Dense(3, 64, Flux.leakyrelu),
              Dense(64, i_max*(m+1)),
             )
u_min = -2*ones(m)
u_max = +2*ones(m)


function minimise_logsumexp(θ; T, u_min, u_max, initial_guess, solver=() -> ECOS.Optimizer())
    # A = θ[:, 1:end-1]
    # B = θ[:, end]
    (; A, B) = θ
    m = size(A)[2]
    u = Convex.Variable(m)
    if initial_guess != nothing
        u.value = initial_guess
    end
    obj = T * Convex.logsumexp((1/T)*(A*u + B))
    prob = Convex.minimize(obj)
    prob.constraints += [u >= u_min]
    prob.constraints += [u <= u_max]
    solve!(prob, solver(), silent_solver=true, verbose=false)
    minimiser = typeof(u.value) <: Number ? [u.value] : u.value[:]  # to make it a vector
    return minimiser
end


function forward_cstr_optim(θ; kwargs...)
    u = minimise_logsumexp(θ; kwargs...)
    return u
end


function proj_hypercube(p; u_min, u_max)
    return max.(u_min, min.(u_max, p))
end

function conditions_cstr_optim(θ, u; kwargs...)
    # (; A, B) = θ
    A = θ[:, 1:end-1]
    B = θ[:, end]
    ∇₂f = A' * Flux.softmax(A*u+B)
    η = 0.1
    return u .- proj_hypercube(u .- η .* ∇₂f; kwargs...)
end


function implicit_cstr_optim_func(θ; T, u_min, u_max, initial_guess, solver)
    tmp = ImplicitFunction(
                           θ -> forward_cstr_optim(θ; T, u_min, u_max, initial_guess, solver),
                           (θ, u) -> conditions_cstr_optim(θ, u; u_min, u_max),
                          )
    tmp(θ)
end

x = 100*(2*rand(n) .- 1)
val, grads = Flux.withgradient(model) do _model
    AB = reshape(_model(x), i_max, m+1)
    # A = AB[:, 1:m]
    # B = AB[:, m+1]
    # θ = ComponentArray(A=A, B=B)
    θ = reshape(_model(x), i_max, m+1)
    u_star = implicit_cstr_optim_func(θ; T=1, u_min=-ones(m), u_max=+ones(m), initial_guess=nothing, solver=() -> ECOS.Optimizer())
    sum(u_star)
end

Result

(val = 1.2111089906277515, grad = ((layers = ((weight = Float32[-1.9894879 0.016161848 -1.9257156; -0.009999948 8.12358f-5 -0.009679403; … ; 0.31266153 -0.0025399441 0.30263928; -1.2044753 0.009784702 -1.1658663], bias = Float32[-0.026939616, -0.0001354091, 0.0067554815, -0.012562656, -0.0116912015, -0.0072106766, -0.019949805, -0.012255933, -0.008887393, -0.00019493952  …  8.364284f-6, 0.0029845645, 0.00020423913, -0.00016008505, -0.00011763882, 0.01743382, 0.015867122, 0.013238616, 0.0042337435, -0.016309775], σ = nothing), (weight = Float32[2.7479534f-31 -8.035564f-33 … 6.655762f-31 8.295855f-31; 4.978378f-17 -1.455777f-18 … 1.2058026f-16 1.502933f-16; … ; 5.054912f-10 -1.478157f-11 … 1.2243399f-9 1.5260381f-9; 3.615537f-16 -1.0572551f-17 … 8.757118f-16 1.0915021f-15], bias = Float32[2.8529254f-32, 5.168552f-18, 1.702376f-20, -0.011397834, 0.012743557, 3.4305703f-10, 2.0157113f-23, -0.010961587, 9.4864516f-12, 7.707587f-18  …  1.4720016f-31, 2.3545986f-17, 7.705814f-20, -0.035908565, 0.057037998, 1.6192033f-9, 9.8762665f-23, -0.021129435, 5.2480097f-11, 3.7536507f-17], σ = nothing)),),))
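
For reference, the "custom rrule trick" from the issue title could look roughly like the following sketch: wrap the keyword construction in a small positional helper (make_theta is hypothetical) and give it an explicit reverse rule, so Zygote never has to differentiate the constructor itself. The pullback assumes the cotangent comes back laid out like θ (A first, then B); a robust rule would also have to handle structured tangent types.

using ChainRulesCore, ComponentArrays

make_theta(A, B) = ComponentArray(A=A, B=B)

function ChainRulesCore.rrule(::typeof(make_theta), A, B)
    θ = make_theta(A, B)
    nA = length(A)
    function make_theta_pullback(Δθ)
        Δvec = collect(Δθ)                     # flat cotangent, same layout as θ
        ΔA = reshape(Δvec[1:nA], size(A))      # entries of A (column-major)
        ΔB = reshape(Δvec[nA+1:end], size(B))  # entries of B
        return NoTangent(), ΔA, ΔB
    end
    return θ, make_theta_pullback
end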

Compatibility with scalar functions

At the moment our implementation fails for scalar functions, even though they should be a special case of vectors. We need to investigate why.

Basic question on square root example

Hi there,

Nice package! I'm wondering if someone could suggest how to use this package to compute the derivative of a function that computes the square root of a number using a solver. I asked this question on Discourse here.

Thanks

Add precompilation workflow

Seems difficult since the base package doesn't have access to any autodiff backend. My best guess would be to put it in the extensions.

Direct linear solver not efficiently handled

https://github.com/gdalle/ImplicitDifferentiation.jl/blob/7f9fa8fd5b3851fd55421501b1f061f784c68ab0/ext/ImplicitDifferentiationChainRulesExt.jl#L63

Currently, to use a direct linear solver, I need to pass in (A, b) -> (Matrix(A) \ b, (solved=true,)). There are two problems with that. First, it is inconvenient to pass it in like this. Second, it is inefficient because it calls Matrix(A) and factorises it every time the pullback function is called. Calling Matrix(A) and factorising it can be expensive, and it only needs to be done once and then reused for every pullback call. This currently makes Jacobian computations with direct solvers much slower than necessary.
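
One possible workaround, purely as a sketch (not the package API): memoise the dense factorisation per operator, so that Matrix(A) and the factorisation happen once per Jacobian instead of once per pullback call. The global cache and its unbounded growth are obvious rough edges of this illustration.

using LinearAlgebra

const FACTORIZATION_CACHE = IdDict{Any,Any}()

# Same (A, b) -> (u, stats) calling convention as above, but the expensive
# Matrix(A) + lu(...) is only computed once per distinct operator A.
function cached_direct_solver(A, b)
    F = get!(FACTORIZATION_CACHE, A) do
        lu(Matrix(A))
    end
    return F \ b, (solved=true,)
end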

ForwardDiff compatibility

At the moment, autodiff of implicit functions relies on ChainRules, which means ForwardDiff won't work.

Suggested fix: manually overload the callable on ForwardDiff.Dual

QR decomposition

Hi! First, have I already said this package is really cool? If not, it is. Really exciting stuff.

I thought I would try to use the package to get rules for a QR decomposition. This is very complex with the usual approach (see JuliaDiff/ChainRules.jl#651 for lots of linked PRs and issues). But the invariant satisfied by QR is pretty clean, so I thought it could be a fun test run for this package. I don't have any pressing application at this point; this is just a test run.

So here's my first shot at it:

using LinearAlgebra, ImplicitDifferentiation, ChainRules

function qr_constraints(x,QR)
    Q, R = QR

    mynorm(x) = sum(xj -> xj^2, x)
    mynorm(Q' * Q - I) + mynorm(Q * R - x)
end

myqr = let
    function primal(x)
        Q,R = qr(x)
        Matrix(Q), R
    end
    ImplicitFunction(primal, qr_constraints)
end

But if I try this out, say with

using Zygote

function my_test_function(x)
    Q,R = myqr(x)
    sum(R)
end

I get

julia> x = randn(3,2)
3×2 Matrix{Float64}:
 0.380649  -0.540067
 0.756046  -1.30804
 1.07281   -0.514084

julia> Zygote.withgradient(my_test_function, x)
ERROR: MethodError: no method matching vec(::ChainRulesCore.Tangent{Any, Tuple{ChainRulesCore.ZeroTangent, FillArrays.Fill{Float64, 2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}})
Closest candidates are:
  vec(::Transpose{<:Number, <:AbstractVector}) at ~/julia/julia-1.8.0/share/julia/stdlib/v1.8/LinearAlgebra/src/adjtrans.jl:219
  vec(::Adjoint{<:Real, <:AbstractVector}) at ~/julia/julia-1.8.0/share/julia/stdlib/v1.8/LinearAlgebra/src/adjtrans.jl:220
  vec(::SparseArrays.AbstractSparseVector) at ~/julia/julia-1.8.0/share/julia/stdlib/v1.8/SparseArrays/src/sparsevector.jl:965
  ...
Stacktrace:
 [1] (::ImplicitDifferentiation.var"#implicit_pullback#11"{Float64, Matrix{Float64}, LinearOperators.LinearOperator{Float64, Int64, ImplicitDifferentiation.var"#mul_Bᵀ!#10"{ComposedFunction{typeof(last), Zygote.var"#ad_pullback#50"{Tuple{ImplicitDifferentiation.var"#conditions_x#7"{Tuple{Matrix{Float64}, Matrix{Float64}}, typeof(qr_constraints)}, Matrix{Float64}}, typeof((λ))}}, Tuple{Matrix{Float64}, Matrix{Float64}}}, Nothing, Nothing, Vector{Float64}}, LinearOperators.LinearOperator{Float64, Int64, ImplicitDifferentiation.var"#mul_Aᵀ!#9"{ComposedFunction{typeof(last), Zygote.var"#ad_pullback#50"{Tuple{ImplicitDifferentiation.var"#conditions_y#8"{Matrix{Float64}, typeof(qr_constraints)}, Tuple{Matrix{Float64}, Matrix{Float64}}}, typeof((λ))}}, Tuple{Matrix{Float64}, Matrix{Float64}}}, Nothing, Nothing, Vector{Float64}}, typeof(Krylov.gmres)})(dy::ChainRulesCore.Tangent{Any, Tuple{ChainRulesCore.ZeroTangent, FillArrays.Fill{Float64, 2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}})
   @ ImplicitDifferentiation ~/.julia/packages/ImplicitDifferentiation/90Hcu/src/implicit_function.jl:107
 [2] ZBack
   @ ~/.julia/packages/Zygote/PD12J/src/compiler/chainrules.jl:206 [inlined]
 [3] Pullback
   @ ./REPL[162]:2 [inlined]
 [4] (::typeof((my_test_function)))(Δ::Float64)
   @ Zygote ~/.julia/packages/Zygote/PD12J/src/compiler/interface2.jl:0
 [5] (::Zygote.var"#60#61"{typeof((my_test_function))})(Δ::Float64)
   @ Zygote ~/.julia/packages/Zygote/PD12J/src/compiler/interface.jl:45
 [6] withgradient(f::Function, args::Matrix{Float64})
   @ Zygote ~/.julia/packages/Zygote/PD12J/src/compiler/interface.jl:124
 [7] top-level scope
   @ REPL[164]:1

Any idea what's going on? Or maybe it's a deeper Zygote issue?

How to do implicit differentiation for neural network parameters?

Hi!

I'm trying to do implicit differentiation of the parameters for a given neural network.

For example, nn is a neural network constructed by Flux.jl. I can get the parameters by Flux.params(nn).

In this tutorial, I need to provide the parameters as the arguments of the forward solver function (here, it's lasso(data::ComponentArray)).

But I don't know how to do this for my case; namely, optimization_prob(parameters) = ...?

If I can overwrite the network parameters with p = Flux.params(nn), then I would be able to do it like this:

function optimization_prob(parameters)  # will be provided by `Flux.params(nn)` outside this function
    load_parameters!(nn, parameters)
    # etc...
end

Is there any workaround for this?
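
One possible workaround, as a sketch only (assuming Flux.destructure is acceptable for your model): flatten the parameters into a single vector and rebuild the network inside the forward (and conditions) functions, so that the flat vector becomes the differentiated input of the implicit function.

using Flux

nn = Chain(Dense(3, 8, Flux.relu), Dense(8, 1))  # example network
flat, rebuild = Flux.destructure(nn)             # flat parameter vector + re-builder

function optimization_prob(parameters)
    model = rebuild(parameters)  # reconstruct the network from the flat vector
    # hypothetical: use `model` to set up and solve the optimization problem,
    # returning its solution
end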

QR decomposition example

Here is an example of differentiating a QR decomposition using this package. It would be nice to have an example for each factorization.

using LinearAlgebra, ImplicitDifferentiation, ComponentArrays, Zygote

function qr_conditions(A, x)
  Q, R = x.Q, x.R
  return vcat(
    vec(UpperTriangular(Q' * Q) + LowerTriangular(R) - I - Diagonal(R)),
    vec(Q * R - A),
  )
end
function qr_forward(A)
  qr_res = qr(A)
  Q = copy(qr_res.Q[:, 1:size(A, 2)])
  R = qr_res.R
  return ComponentVector(; Q, R)
end

diff_qr = ImplicitFunction(qr_forward, qr_conditions)

A = rand(10, 4)
x = diff_qr(A)

J = Zygote.jacobian(diff_qr, A)[1]

Jacobian byproduct assumed to be constant in x - incompatible with higher order AD

Currently the byproduct z is assumed to be constant in x. In optimization, z may be used to return a Jacobian approximation for use in the conditions function. This is fine as long as we only need the gradient of the optimization solution y. If we need higher-order derivatives, then we would need to propagate the derivatives back through the byproduct z, which is currently not possible because we always define the cotangent wrt z to be NoTangent(). I am not sure if there is a way to handle this correctly, except perhaps to document it.

Generic input types

At the moment, the library is designed with vector inputs x and outputs y in mind. We should be more generic than that, which may require some form of flattening mechanism.

Reverse mode AD with forward mode backend for conditions (and vice versa)

I have a case where I need to autodiff through scalar functions and matrix operations. I understand forward mode is better for the former and reverse mode more efficient for the latter.

What might solve this "problem" is using reverse mode on the overall chain and forward mode as a backend on the conditions involving the scalar functions.

Looking at the current implementation, I would suggest making the backend an argument, similar to return_byproduct.

Add example use of byproduct

I just realized that using the byproduct in the conditions is fairly easy, but using the byproduct in the condition derivatives is harder, unless the user defines their own chain rule. What do you think?
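
For concreteness, here is a minimal sketch of the easy part (reading the byproduct inside the conditions), written against the v0.4-style formulation with Val(true) that appears in the "byproduct not returned" issue further below; the exact constructor arguments differ across versions, so treat this as illustrative only.

using ImplicitDifferentiation

# Toy forward solver: Newton iterations solving y^2 = x elementwise, returning
# the iteration count as a byproduct z (treated as a constant in x).
function forward(x)
    y = copy(x)
    iterations = 0
    for _ in 1:20
        y = 0.5 .* (y .+ x ./ y)
        iterations += 1
    end
    return y, (; iterations)
end

# The byproduct z is available to the conditions, but no derivatives are
# propagated through it (its cotangent is NoTangent()).
conditions(x, y, z) = y .^ 2 .- x

implicit = ImplicitFunction(forward, conditions, Val(true))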

Allow forward pass to return Jacobian

In some cases, the forward pass will compute a Jacobian along with the output. We should use it instead of duplicating the work in the conditions.

Assert that the output from the forward function is a tuple if byproducts are allowed

I noticed that if the forward function returns an array, the following destructuring syntax still works, assigning 1 to y, 2 to z, and ignoring 3.

y, z = [1, 2, 3]

This can make incorrect code work and lead to incorrect behaviour if the user allows a byproduct but forgets to return it from the forward function. We should probably add an assertion that the output from the forward function is a tuple of length 2 if byproducts are allowed; a sketch follows.
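
A sketch of the proposed check, with a hypothetical helper that would replace the bare destructuring inside the package:

function check_forward_output(out)
    out isa Tuple && length(out) == 2 || throw(ArgumentError(
        "the forward function must return a 2-tuple (y, z) when byproducts are allowed"))
    return out
end

# inside the package, instead of `y, z = forward(x)`:
y, z = check_forward_output(forward(x))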

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

make additional information z optional

What about making z optional?

I see benefits in cleaner, more concise code when writing the forward and condition functions, and possibly minor speed gains.

Documentation inconsistency for v0.4.4

I was trying to implement my own implicit differentiation function for the minimization of a log-sum-exp function, and I found that the documentation for the stable version is somewhat different from what I actually obtained by running the code.

For example, according to the doc of the stable version (https://gdalle.github.io/ImplicitDifferentiation.jl/stable/examples/4_constrained_optim/), I wrote this code:

using ImplicitDifferentiation
using Flux
using Convex, ECOS
using ComponentArrays


# x = rand(2) .+ [0, 1]
i_max = 10
m = 2
A = 2 * rand(i_max, m) .- 1
B = 2 * rand(i_max) .- 1
θ = ComponentArray(A=A, B=B)
u_min = -2*ones(m)
u_max = +2*ones(m)


# function mysqrt_cstr_optim(x)
function minimise_logsumexp(θ; T, u_min, u_max, initial_guess, solver=() -> ECOS.Optimizer())
    (; A, B) = θ
    m = size(A)[2]
    u = Convex.Variable(m)
    if initial_guess != nothing
        u.value = initial_guess
    end
    obj = T * Convex.logsumexp((1/T)*(A*u + B))
    prob = Convex.minimize(obj)
    prob.constraints += [u >= u_min]
    prob.constraints += [u <= u_max]
    solve!(prob, solver(), silent_solver=true, verbose=false)
    minimiser = typeof(u.value) <: Number ? [u.value] : u.value[:]  # to make it a vector
    return minimiser
end


function forward_cstr_optim(x; kwargs...)
    # y = mysqrt_cstr_optim(x)
    y = minimise_logsumexp(x; kwargs...)
    # z = 0
    return y
end


# function proj_hypercube(p)
#     return max.(0, min.(1, p))
# end
function proj_hypercube(p; u_min, u_max)
    return max.(u_min, min.(u_max, p))
end

# function conditions_cstr_optim(x, y)
function conditions_cstr_optim(θ, u; kwargs...)
    # ∇₂f = 2 .* (y .^ 2 .- x)
    (; A, B) = θ
    ∇₂f = A' * Flux.softmax(A*u+B)
    η = 0.1
    # return y .- proj_hypercube(y .- η .* ∇₂f)
    return u .- proj_hypercube(u .- η .* ∇₂f; kwargs...)
end


function implicit_cstr_optim_func(θ; T, u_min, u_max, initial_guess, solver)
    tmp = ImplicitFunction(
                           θ -> forward_cstr_optim(θ; T, u_min, u_max, initial_guess, solver),
                           (θ, u) -> conditions_cstr_optim(θ, u; u_min, u_max),
                          )
    tmp(θ)
end

implicit_cstr_optim_func(θ; T=1, u_min=-ones(m), u_max=+ones(m), initial_guess=nothing, solver=() -> ECOS.Optimizer())

According to the docs, calling the implicit function is expected to return an array.
With v0.5.0-DEV, the result indeed looks fine:

julia> implicit_cstr_optim_func(θ; T=1, u_min=-ones(m), u_max=+ones(m), initial_guess=nothing, solver=() -> ECOS.Optimizer())

2-element Vector{Float64}:
  0.11430458077043977
 -0.9999999846444508

However, with v0.4.4 (the stable version), it gives me a tuple like this:

julia> implicit_cstr_optim_func(θ; T=1, u_min=-ones(m), u_max=+ones(m), initial_guess=nothing, solver=() -> ECOS.Optimizer())
(0.710863877230062, -0.932445241056164)

Support several inputs instead of just `x`

At the moment, if derivatives wrt several inputs are needed, our recommendation is to use ComponentArrays.jl to concatenate them into a single vector. However, this is not very handy and can lead to performance issues: the naive keyword-based constructor for ComponentVector is type-unstable:

julia> using ComponentArrays, Test

julia> @inferred ComponentVector(a=1, b=[2, 3])
ERROR: return type ComponentVector{Int64, Vector{Int64}, Tuple{Axis{(a = 1, b = 2:3)}}} does not match inferred return type Any
Stacktrace:
   [1] error(s::String)
     @ Base ./error.jl:35

julia> f() = ComponentVector(a=1, b=[2, 3])
f (generic function with 1 method)

julia> @inferred f()
ERROR: return type ComponentVector{Int64, Vector{Int64}, Tuple{Axis{(a = 1, b = 2:3)}}} does not match inferred return type Any
Stacktrace:
   [1] error(s::String)
     @ Base ./error.jl:35

In fact, achieving type-stable construction would require knowing the array sizes statically (jonniedie/ComponentArrays.jl#126 (comment)), which is impossible in many cases.

An alternative would be to accept an arbitrary number of input arrays on our end. After brief experimentation, I fear this would introduce new sources of type instability as we loop through the args and splat them, storing results in vectors or tuples.

My proposal is to manually add support for two arguments. It will lead to code duplication but ensure type stability, and should cover a lot of use cases. Most notably, it will be necessary to implement implicit linear solvers, and thus enable second-order differentiation out of the box (#26).

Working with the tangential space of a manifold

Hi there, thanks for your amazing work!

I was wondering if I could use this approach to iteratively work out the tangent space of the manifold defined by f(x) = 0, with x ∈ ℝⁿ, f ∈ ℝᵈ. Right now I'm using the implicit function theorem to calculate the tangent vectors. However, with the naive implementation (using DiffOpt.jl), the accuracy is very low compared to the tangent vectors I get by sampling points and taking (x - x')/|x - x'| for x - x' -> 0. I do not know the concrete details of your implementation, but I have the impression that it is superior to my naive ansatz.

Further, do you think it could be possible to calculate the curvature tensor (and hence the parallel transport) of the manifold at each point? In the end I'm interested in singularities (i.e. bifurcations), and for this I need to be able to calculate derivatives to high precision.

Best,

v.

Fix type stability tests in reverse mode

At the moment, the JET tests in reverse mode are breaking due to runtime dispatch. This is presumably happening because Zygote-computed pullbacks introduce type instability.
The solution may be to write the conditions with explicit rrules, as in the sketch below.
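
For example, with the toy conditions c(x, y) = y .^ 2 .- x, an explicit reverse rule (a sketch, not part of the package) keeps Zygote's generated pullback out of the picture entirely:

using ChainRulesCore

conditions(x, y) = y .^ 2 .- x

# Hand-written rule: ∂c/∂x = -I and ∂c/∂y = Diagonal(2y), so the cotangents
# are -Δc and 2 .* y .* Δc respectively.
function ChainRulesCore.rrule(::typeof(conditions), x, y)
    c = conditions(x, y)
    conditions_pullback(Δc) = (NoTangent(), -Δc, 2 .* y .* Δc)
    return c, conditions_pullback
end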

byproduct not returned

Hi,

here is some code that works with v0.3.0 but is missing the byproducts, and below is another implementation using v0.4.4 (same on the master branch) that tries and fails to get the byproducts. Any help in retrieving them is much appreciated.

block_solver_AD(parameters_and_solved_vars::Vector{<: Real}, 
    n_block::Int, 
    ss_solve_blocks::Function, 
    guess::Vector{Float64}, 
    lbs::Vector{Float64}, 
    ubs::Vector{Float64};
    tol::AbstractFloat = eps(Float64),
    verbose::Bool = false) = ImplicitFunction(x -> block_solver(x,
                                                            n_block, 
                                                            ss_solve_blocks,
                                                            guess,
                                                            lbs,
                                                            ubs;
                                                            tol = tol,
                                                            verbose = verbose)[1],  
                                        (x,y) -> ss_solve_blocks(x,y))

function block_solver(parameters_and_solved_vars::Vector{Float64}, 
    n_block::Int, 
    ss_solve_blocks::Function, 
    guess::Vector{Float64}, 
    lbs::Vector{Float64}, 
    ubs::Vector{Float64};
    tol::AbstractFloat = eps(),
    verbose::Bool = false)

    #=
    ...
    =#

    return sol_values, sol_minimum
end


block_solver_RD = block_solver_AD([alp, bet, psi, del, m, ➕₂, ➕₃, ➕₄, ➕₅], 1, 𝓂.ss_solve_blocks[1], inits, lbs, ubs, verbose = true)

solution = block_solver_RD([alp, bet, psi, del, m, ➕₂, ➕₃, ➕₄, ➕₅])

When I try to get the byproduct using the formulation below with v0.4.4, it does not return the byproduct:

block_solver_AD(parameters_and_solved_vars::Vector{<: Real}, 
    n_block::Int, 
    ss_solve_blocks::Function, 
    guess::Vector{Float64}, 
    lbs::Vector{Float64}, 
    ubs::Vector{Float64};
    tol::AbstractFloat = eps(Float64),
    verbose::Bool = false) = ImplicitFunction(x -> block_solver(x,
                                                            n_block, 
                                                            ss_solve_blocks,
                                                            guess,
                                                            lbs,
                                                            ubs;
                                                            tol = tol,
                                                            verbose = verbose),  
                                        (x,y,z) -> ss_solve_blocks(x,y), Val(true))

Sorry for not providing an MWE; the whole code is a bit involved in parts unrelated to ImplicitDifferentiation. I hope the point comes across nonetheless.

Fix kwargs support

At the moment, kwargs give rise to a strange type inference bug (see the discussion in #19), which is only fixed with a contortion (here).
