Comments (54)

KristofferC commented on May 22, 2024

Isn't this handled by closures already?

from forwarddiff.jl.

papamarkou commented on May 22, 2024

True but in iterative algorithms where the second argument changes, the closure has to be redefined in each step/iteration, and this is not always the best design for extraneous reasons.

KristofferC commented on May 22, 2024

The closure "updates" together with the captured variables. There is no need to redefine it.
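
A minimal sketch of this behaviour in current Julia syntax; the names `run_demo`, `offset`, and `shifted` are illustrative and do not come from the thread. The closure is defined once and still sees the later rebinding of the captured local variable:

```julia
# Illustrative only: `run_demo`, `offset`, and `shifted` are hypothetical names.
function run_demo()
    offset = 1.0
    shifted(x) = x + offset   # captures the variable `offset`, not its current value

    before = shifted(2.0)     # evaluated while offset == 1.0
    offset = 10.0             # rebind the captured variable; the closure is not redefined
    after = shifted(2.0)      # evaluated while offset == 10.0

    return before, after
end

before, after = run_demo()
```

Rebinding a captured variable in this way may carry a cost of its own, since Julia generally has to box such variables, so it is worth benchmarking in the actual application.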

papamarkou commented on May 22, 2024

That's good to know. At the same time it is good to allow for extra dummy input arguments (dummy from the autodiff perspective) as it would enable more diverse development. For example I have separated parameters (methods) from states (data), so my software engineering approach is conceptually orthogonal to closures.

KristofferC commented on May 22, 2024

I still don't understand why closures don't solve this.

If you have a function f(x, data, parameters, dummy, whatever) and want to take the derivative of f w.r.t. x, you just create f(x) = f(x, data, parameters, dummy, whatever), where the arguments are now actual instantiated variables, and pass f to ForwardDiff. This is how all the optimization and solver packages do it. Your method is similar to what you do in C, where you pass a void pointer as the second argument which you can cast to whatever you want, but closures solve this IMO.
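
As a sketch of that pattern in current Julia syntax (the function `loss`, its arguments, and the helper `gradient_wrt_x` are made up for illustration; only `ForwardDiff.gradient(f, x)` is the package's own API):

```julia
using ForwardDiff

# Hypothetical multi-argument objective; only `x` should be differentiated.
loss(x, data, scale) = scale * sum((x .- data) .^ 2)

function gradient_wrt_x(x0, data, scale)
    # The closure fixes `data` and `scale`, leaving a one-argument function of `x`.
    objective(x) = loss(x, data, scale)
    return ForwardDiff.gradient(objective, x0)
end

# Analytically the gradient is 2 * scale * (x - data).
grad = gradient_wrt_x([1.0, 2.0], [0.0, 0.0], 0.5)
```

Because the closure is created inside a function, it captures local variables rather than globals, which matters for the benchmarks discussed later in the thread.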

gragusa commented on May 22, 2024

The issue of allowing additional arguments to the function was discussed in #32. It was also concluded there that closures are indeed a solution. A potential worry is that creating many closures (for instance, in the MCMC case you might need to create a closure for each iteration) may end up with a big performance penalty. I have done some benchmarking and it seems that creating closures (even many closures) is not a big issue, as the penalty is an order of magnitude smaller than the cost of obtaining the function being differentiated.

jrevels commented on May 22, 2024

Closing as duplicate of #32.

@scidom Feel free to reopen if you think this needs to be revisited, but I really think closures are the best way to do this. If it helps to approach it a different way, remember that ForwardDiff now supports differentiating callable types.

papamarkou commented on May 22, 2024

Thanks @jrevels. I wanted to kindly ask if it would be possible to add this as a feature request. I understand that closures are a possible solution, but as a programmer and ForwardDiff user I want to have the alternative option of inputting a function with more than one argument, for two reasons: to avoid any small performance penalty (however small it may be, no penalty is better than a small penalty), and most importantly because it may be helpful to provide an alternative usage case/programming paradigm.

If of course the answer is that it is not possible, then it is ok, but I wanted to make the effort and ask for this feature.

papamarkou commented on May 22, 2024

I tried the following:

a = rand(2000)
x = rand(2000)

n = 1000000

function f(x)
  x+a
end

function g(x, y)
  x+y
end

@time for i in 1:n
  f(x)
end

@time for i in 1:n
  g(x, a)
end

It seems that f is actually a tiny bit faster than g, i.e. closures are a tiny bit faster. Is this possible?

KristofferC commented on May 22, 2024

Now you are using a global variable. That is not the same as creating a closure inside another function that captures function-local variables.

jrevels commented on May 22, 2024

Consider what it would be like if we DID have this feature - I think the package would actually be worse off. Here are some reasons why:

  • It'd make the API more complicated, and the API is already somewhat complicated.
  • We'd have to write (and maintain) tests that cover all the new behaviors enabled by this feature.
  • A general, maintainable implementation of this feature could very well be slower than just letting the user pass in closures and callable types, and could make our performance model even harder to understand.

gragusa commented on May 22, 2024

Try this:

n = 1000000
a = rand(n)
x = rand(n)

function g(x, y)
  x+y
end

@time for i in 1:n
  g(x[i], a[i])
end

@time for i in 1:n
  cl(x) = g(x, a[i])
  cl(x[i])
end

KristofferC commented on May 22, 2024

You will not get any useful data by benchmarking functions accessing global variables.

papamarkou commented on May 22, 2024

Thanks for the feedback @KristofferC and for the snippet @gragusa.

@jrevels I can see your point about complexity of development. My primary criterion during development tends to be performance. Along these lines, to quantify performance, shall we try to think of a simple benchmark, without using globals as @KristofferC pointed out, just to see what the penalty of closures is, if any? It is hard for all of us to know the pros and cons without measuring them... I will try to think of a reasonable example, not necessarily related to ForwardDiff.

gragusa commented on May 22, 2024

You can always enclose the above in a function. But why should accessing global variables be different for closures?

papamarkou commented on May 22, 2024

@gragusa you mean something like the example below?

n = 1000000
a = rand(n)
x = rand(n)

function g(x, y)
  x+y
end

function z()
  for i in 1:n
    g(x[i], a[i])
  end
end

function w()
  for i in 1:n
    cl(x) = g(x, a[i])
    cl(x[i])
  end
end

@time z()

@time w()

This still uses globals, plus we redefine the closure in each iteration, so we may want to improve on these.

gragusa commented on May 22, 2024

@scidom I was thinking more along the lines of

function f()
    n = 1000000
    a = rand(n)
    x = rand(n)

    function g(x, y)
        x+y
    end

    @time for i in 1:n
        cl(x) = g(x, a[i])
        cl(x[i])
    end

end

This is the worst-case scenario, where closures need to be created at each iteration. Think about MCMC within Gibbs: at each iteration you have to create a closure enclosing the blocks of variables.

KristofferC commented on May 22, 2024

Global variables can change type at any time and must therefore be boxed. Local variables can be reasoned about. This does not change when using closures.

Regarding performance: we are already passing in a function as an argument to ForwardDiff, which has its own overhead, so the closure overhead should be amortized. For best performance, create a functor (a type which overloads Base.call) and pass that to ForwardDiff.

papamarkou commented on May 22, 2024

Yes, this Gibbs scenario is exactly the situation I am dealing with 👍

gragusa commented on May 22, 2024

@KristofferC Yes, I understand the boxing. The fact that "this does not change when using closures" is what I thought, and so the benchmarking in this case is not misleading because both functions deal with globals in the same way.

papamarkou commented on May 22, 2024

@gragusa I tried

function f()
    n = 1000000
    a = rand(n)
    x = rand(n)

    function g(x, y)
        x+y
    end

    @time for i in 1:n
        cl(x) = g(x, a[i])
        cl(x[i])
    end

    @time for i in 1:n
      g(x[i], a[i])
    end
end

f()

and there is a massive gap in performance. Performance with Gibbs sampling is even worse than in our toy benchmark because each Gibbs iteration requires redefining several closures, pretty much as many as the number of sampled parameters.

@KristofferC can you provide a toy example of what you meant about overloading Base.call()? It doesn't have to be a benchmark, just an example of what you have in mind about functors.

KristofferC commented on May 22, 2024

See https://github.com/JuliaLang/julia/blob/master/base/functors.jl

These types can also hold state. When passing them as arguments, Julia can do inlining and all that jazz.

papamarkou commented on May 22, 2024

Thanks, I also found the following in the documentation:

http://docs.julialang.org/en/release-0.4/manual/methods/#call-overloading-and-function-like-objects

KristofferC commented on May 22, 2024

Also, what is interesting is not the cost of the closure vs. calling a normal function, but the cost of passing a closure vs. a function + arguments to ForwardDiff. My gut says there will be very little difference. I'm in bed now so I can't test it.

jrevels commented on May 22, 2024

Here's a concrete example of making a callable type and using it with ForwardDiff:

julia> immutable Foo{T}
           x::T
           y::T
       end

julia> call(f::Foo, a) = a^(f.x) + (a / f.y)
call (generic function with 1033 methods)

julia> f = Foo(3, 2)
Foo{Int64}(3,2)

julia> f(2.5)
16.875

julia> using ForwardDiff

julia> ForwardDiff.derivative(f, 2.5)
19.25

As @KristofferC said, you can use these types instead of closures. Any time where you would make a new closure to update the values, you can instead make a new instance of one of these types. If your type is mutable, you could just update the fields as you go instead of making new instances (though keeping it immutable could work better if you're storing isbits field types, like numeric primitives).

AFAIK, the plan is for closures to start using this callable type strategy "under the hood" in v0.5 (see JuliaLang/julia#13412).
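
For readers on Julia 0.5 or later, where `immutable` became `struct` and `call` overloading was replaced by method definitions on the type itself, the same example might look like the sketch below. It assumes a recent ForwardDiff release and keeps the names from the REPL session; the variables `value` and `slope` are added for illustration.

```julia
using ForwardDiff

struct Foo{T}
    x::T
    y::T
end

# Current replacement for `call(f::Foo, a) = ...`: a method on instances of Foo.
(f::Foo)(a) = a^(f.x) + (a / f.y)

f = Foo(3, 2)

value = f(2.5)                           # 2.5^3 + 2.5 / 2
slope = ForwardDiff.derivative(f, 2.5)   # 3 * 2.5^2 + 1 / 2
```

To change the stored values between iterations, one can construct a new `Foo` instance, much as one would otherwise create a new closure.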

papamarkou commented on May 22, 2024

That's amazing, I must say. I am feeling a bit puzzled though in two respects: i) in your example f is of type Foo and not of type Function, right? How is it possible then that ForwardDiff.derivative(f, 2.5) perceives f as a generic function instead of Foo?! ii) Jeff mentioned in the issue you forwarded that the call function will be deprecated; in this case, would your nice example remain applicable? Bed time for me too, but I will see your reply tomorrow.

papamarkou commented on May 22, 2024

Actually, I found the answer to the second question, I think call will be replaced by (f::Foo)(a) = a^(f.x) + (a / f.y) in our example...

papamarkou commented on May 22, 2024

I am trying to understand how this would work in practice, and whether I would need to introduce a new log-target type and make it callable. I am too tired to think clearly now, so I will try again tomorrow...

papamarkou commented on May 22, 2024

I wanted to add two more thoughts. Firstly, functors are a great idea but (like any tool) they are not the proper means for every situation. More specifically, consider the case of having a functor f with a field f.y for holding the value that would otherwise be enclosed in a closure, and let's say we call f(x), where f in its body uses f.y. This is all fine. But once we consider a second functor g with the same field g.y and call g(x), this becomes problematic if we want f.y and g.y to point to a common vector (because f and g have their own copies of y).

The second remark relates to the comment made by @KristofferC; I wrote a simple example for which in one case we pass a closure f whereas in another case we pass a function g plus an argument a. If the closure doesn't need to be redefined, then there seems to be no difference in performance:

function mytest()
  len = 2000

  a = rand(len)
  x = rand(len)

  n = 1000000

  function f(x)
    x+a
  end

  function g(x, y)
    x+y
  end

  function fwrapper(f, x)
    f(x)
  end

  function gwrapper(g, x, y)
    g(x, y)
  end

  @time for i in 1:n
    fwrapper(f, x)
  end

  @time for i in 1:n
    gwrapper(g, x, a)
  end
end

mytest()

The only problem of course is development-related, i.e. having one version of the user-defined function g(x, a) and then another internal version of it calling the relevant closure h(x). If this needs to be done for many functions, I need to maintain two copies of each function to communicate with the user: an external g(x, a) that doesn't enclose a (because as far as the API goes I do need a version of g that is not a closure, so that I can create the closure later on, once a has been defined) and an internal one that provides the respective closure. This is my glue-code solution to proceed, with some extra memory penalty of course... but that's the only way I can reconcile the restriction posed by only allowing closures with ForwardDiff in the context of a general Gibbs scheme.

KristofferC commented on May 22, 2024

There is no reason why you couldn't make two separate instances of a functor hold references to the same vector.

Regarding recreating the closure, why not do something like this:

function f()
    n = 1000000
    a = rand(n)
    x = rand(n)


    function g(x, y)
        x+y
    end

    @time for i in 1:n
        cl(x) = g(x, a[i])
        cl(x[i])
    end

    a_val = 0.0
    cl2(x) = g(x, a_val)
    @time for i in 1:n
        a_val = a[i]
        cl2(x[i])
    end

    @time for i in 1:n
      g(x[i], a[i])
    end
end

Since the captured variable updates, there is no need to recreate the closure unless you change the type of the captured variable in every iteration (which seems unlikely).

papamarkou commented on May 22, 2024

I agree @KristofferC, there is no need to redefine the closure. The only difficulty is that I want to be able to define method fields in a parameter type independently of the enclosed value, which means that I need to delay the (one and only) definition of the closure until I know the delayed value.

About functors, you are actually right. For example the following works (therefore I will consider more seriously the possibility of using functors):

immutable Foo1{T}
  y::Vector{T}
end

immutable Foo2{T}
  y::Vector{T}
end

y = [2., 3.]

f1 = Foo1(y)
f2 = Foo2(y)

call(f::Foo1, a) = a*f.y
call(f::Foo2, a) = a+f.y

f1(2.)
f2(2.)

y[2] = 30.

f1(2.)
f2(2.)

papamarkou commented on May 22, 2024

P.S. I find the functor solution quite ingenious the more I think about it.

KristofferC commented on May 22, 2024

Note that in the (not so far away) future, the plan for julia is for closures to basically lower into their functor equivalent version making them just as performant.

papamarkou commented on May 22, 2024

This is great. I find functors handier than closures in Lora precisely because they are more low-level, which makes development easier.

papamarkou commented on May 22, 2024

After more thinking and an attempt to implement the idea, it turns out functors won't work unfortunately for the application at hand. The problem is that the definition call(f::Foo, x) is Foo-specific and not f-specific, meaning that the overloading of call is universal for Foo, whereas I need different calls for different instances f of Foo. It seems after all that closures are the only way out probably.

papamarkou commented on May 22, 2024

After having spent a good deal of time thinking over this problem and after having tried to tackle it in practice, I would like to share my thoughts and to make a case for adding the extra feature of a wrt optional argument.

I am convinced by now that there is no performance penalty when using closures, so performance is not a concern anymore. The reason I think the wrt feature is important is programming-related only. Alternative solutions have been proposed but none of them truly resolves the development scenario I am dealing with.

Functors are indeed a great solution but the main issue, as explained above, is that the definition of call() is made on the basis of the underlying functor, whereas in my case conceptually the call() function varies between instances of the functor.

Closures are great, but in a larger-scale project of a "real-life" application they can become impractical. For instance, if we were to ask the user to pass just one closure to gradient(), this would make sense. However, in many cases this is impractical because i) many closures may be involved and ii) most importantly, the additional argument enclosed by the closure is not naturally known to the user, or at least not early in model specification, so the API for a model would become obfuscated by asking the user to create a dummy variable with a specific name (so that it can later be modified before calling the closure).

The bottom line is that there is no performance consideration, but there are important use cases where it becomes less natural and more programmatically impractical to request the user to define a closure.

Along these lines, I want to kindly re-open issue #32 and consider adding the wrt option. This will not make ForwardDiff more complicated for the user, because wrt can have a default value; it will have some impact only on maintenance of ForwardDiff, but I think it is worth the effort in order to provide this extra feature, which can prove to be invaluable in practice when using ForwardDiff inside other packages.

mlubin commented on May 22, 2024

"the call() function varies between instances of the functor"

@scidom, could you explain what you mean by this?

papamarkou commented on May 22, 2024

I will try to explain using a fictitious toy example:

immutable LogPrior
  y::Vector
end

call(lp::LogPrior, x::Vector{Float64}) = x+lp.y

y = [1., 2.5]
logprior = LogPrior(y)

logprior([2., 5.])

This works fine, but the problem is that I may want to define more than one prior. Since multiple dispatch for call(::LogPrior, ::Vector{Float64}) for the functor LogPrior can be exploited only once, I cannot define, say

call(logprior::LogPrior, x::Vector{Float64}) = x-0.5y

for another prior...

papamarkou commented on May 22, 2024

I could code-generate many functors for different priors with @gensym LogPrior names, but I am not sure if this is a reasonable coding approach.

mlubin commented on May 22, 2024

What would the wrt form look like?

mlubin commented on May 22, 2024

I'm not sure how the issue is any different in that case.

papamarkou commented on May 22, 2024

I was thinking that if we have a function say f(x, y) and we can call

ForwardDiff.gradient(f, wrt=:x)

or perhaps

ForwardDiff.gradient(f, wrt=1)

then this may allow passing a function f with more than one argument. I am not sure if this is the best solution, just trying to think of a way of maintaining a reasonable API and interface between ForwardDiff and Lora.

papamarkou commented on May 22, 2024

To explain my problem: if I ask the user to provide a closure f(x) with a mangled name _y inside the body of the closure for the enclosed value, and also to define a dummy _y = something, wouldn't that be ugly on the Lora end?

papamarkou commented on May 22, 2024

The other solution is that I ask the user to provide f(x, y) and I codegen the closure f(x) internally which is possible.

mlubin commented on May 22, 2024

The functor object can just store the user-provided function and the arguments.

KristofferC commented on May 22, 2024

All the solver packages and optimization packages seem to do just fine with only closures, so I am sure there should be some way to solve this.

Regarding extra arguments in ForwardDiff: my first thought is that allowing one extra argument is completely arbitrary. What if you want to pass multiple extra arguments? The only way you can do that in a type-stable way is to write a tailor-made type for your specific function. This means that you suddenly force the user to always write their functions as f(x, extra_parameters), instead of letting them write them however they want and just asking them to pass the closure to the library. They also have to deal with the extra boilerplate of defining a new type for each function.

"The functor object can just store the user-provided function and the arguments."

Isn't that pretty much exactly what a closure is?

papamarkou commented on May 22, 2024

@KristofferC, I agree, one would need to generalize the way wrt would operate, the above was just a first naive example.

@mlubin your solution works for me, it is really great, thank you:

immutable LogPrior
  f::Function
  y::Vector
end

call(lp::LogPrior, x::Vector{Float64}) = lp.f(x, lp.y)

y = [1., 2.5]
logprior = LogPrior((x, y) -> x+y, y)

logprior([2., 5.])

I will then close #32 and will use the proposed solution (and yes, @KristofferC, I presume this functor definition is pretty much conceptually equivalent to a closure).

papamarkou commented on May 22, 2024

One more thing, the reason this lower-level functor approach works better for me than just a closure is because I can then define the constructor (using pseudocode below)

LogPrior(f::Function, n::Int) = LogPrior(f, Array(MyLoraType, n))

so the user doesn't need to define the dummy variable y anymore, nor do I need to worry about reserving a name for it.

papamarkou commented on May 22, 2024

I may be doing something wrong (highly likely, given how late it is), but I get an error when I try to use ForwardDiff with the above solution:

import ForwardDiff

immutable LogPrior
  f::Function
  y::Vector
end

call(lp::LogPrior, x::Vector) = lp.f(x, lp.y)

function f(x, y)
  x+y
end
y = [1., 2.5]
logprior = LogPrior(f, y)

logprior([2., 5.])

g = ForwardDiff.gradient(logprior)

g([2., 5.])

Although g is defined via ForwardDiff without issues, calling it throws

ERROR: MethodError: `convert` has no method matching convert(::Type{ForwardDiff.GradientNumber{2,Float64,Tuple{Float64,Float64}}}, ::Array{ForwardDiff.GradientNumber{2,Real,Tuple{Real,Real}},1})
This may have arisen from a call to the constructor ForwardDiff.GradientNumber{2,Float64,Tuple{Float64,Float64}}(...),
since type constructors fall back to convert methods.
Closest candidates are:
  ForwardDiff.GradientNumber{N,T,C}(::Any, ::Any)
  call{T}(::Type{T}, ::Any)
  convert{T<:Real}(::Type{T<:Real}, ::Complex{T<:Real})
  ...
 in _calc_gradient at /Users/theodore/.julia/v0.4/ForwardDiff/src/api/gradient.jl:86
 in g at /Users/theodore/.julia/v0.4/ForwardDiff/src/api/gradient.jl:54
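
One plausible reading of this error, offered as an assumption rather than something confirmed in the thread: `ForwardDiff.gradient` expects a function that returns a scalar, whereas `f(x, y) = x+y` returns a vector. A minimal sketch in current Julia syntax with a scalar-valued target follows; the `target` function, its `sum(abs2, ...)` body, and the type parameters `F` and `V` are illustrative choices, not the original code, and the two-argument form `ForwardDiff.gradient(f, x)` is used instead of the older one-argument form shown above.

```julia
using ForwardDiff

# Functor storing a user-provided two-argument function and its extra argument.
# Parametric fields keep the stored function and data concretely typed.
struct LogPrior{F,V}
    f::F
    y::V
end

# Only `x` is differentiated; `lp.y` is passed along unchanged.
(lp::LogPrior)(x::AbstractVector) = lp.f(x, lp.y)

# Hypothetical scalar-valued target, chosen so that a gradient is defined.
target(x, y) = sum(abs2, x .+ y)

logprior = LogPrior(target, [1.0, 2.5])

# Analytically the gradient is 2 .* (x .+ y).
grad = ForwardDiff.gradient(logprior, [2.0, 5.0])
```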

misun6312 commented on May 22, 2024

So with the current version, how can I pass additional arguments to the function being differentiated?

KristofferC commented on May 22, 2024

Same as before. Pass a closure as your function to be differentiated where it closes over your parameters.

misun6312 commented on May 22, 2024

With Call overloading?

Okay I tried like this.. it's working.
I will apply this to my code. Thanks!

using ForwardDiff

immutable LogLike
  f::Function
  y::Vector
end

call(ll::LogLike, params1::Vector) = ll.f(params1, ll.y)

function f(params1, y)
    dot(params1, y).^2+1
end
y = [1., 3, 10.]
LL = LogLike(f, y)

param = [2., 5., 1.]

LL(param)

res = GradientResult(param)
g = ForwardDiff.gradient!(res,LL,param)

ForwardDiff.gradient(res)

KristofferC commented on May 22, 2024

Call overloading is not needed anymore for performance. It is easier and more flexible to create a closure. Read through this thread and you will find examples. I am on mobile so can't post a full code example.

misun6312 commented on May 22, 2024

Oh.. you mean like this?

using ForwardDiff

param = [2., 5., 1.]
y = [1., 3, 10.]

function test(param, y)
    # g closes over y, so ForwardDiff only sees a function of params1
    function g(params1)
        dot(params1, y).^2+1
    end
    LL = g(param)  # plain evaluation of the closure; not used below

    res = GradientResult(param)

    ForwardDiff.gradient!(res,g,param)
    ForwardDiff.gradient(res)
end

test(param,y)