Get autodiff working (starry, 24 comments, closed)

rodluger commented on August 26, 2024
Get autodiff working

Comments (24)

ericagol commented on August 26, 2024

ericagol commented on August 26, 2024

rodluger commented on August 26, 2024

@dfm Autodiff is working beautifully! Thanks for the help.

rodluger commented on August 26, 2024

Currently computing gradients of the flux in this test file. Run make test_autodiff to compile it. I'm going to add a gradient option to the pybind interface so we can start using this!

dfm commented on August 26, 2024

You definitely won't need to manually cast everything to T. The problem appears when you have inline operations on AutoDiffScalars. It is slightly annoying, but I think you'll be able to fix it pretty fast.

rodluger commented on August 26, 2024

@dfm Dude:
[image]

dfm commented on August 26, 2024

Dude! 🎉🎈🍻

dfm commented on August 26, 2024

That does sound tedious! Let me know if there's anything that I can do to help out.

rodluger commented on August 26, 2024

@dfm @ericagol Just need to write it up!
[image]

ericagol commented on August 26, 2024

I got autodiff working on the s_n(r,b) components. I just finished coding up the s_n function in Julia, and I implemented autodiff using the ForwardDiff package. I haven't added analytic derivatives of the elliptic integrals: ForwardDiff diffs those as well.

I still haven't gotten the transformation and rotation matrices computed, but these should be straightforward.

rodluger commented on August 26, 2024

rodluger commented on August 26, 2024

@ericagol Can you create a julia folder in the top level of the repository and place your code in there?

ericagol commented on August 26, 2024

Yes. I just created a folder called julia/ at the top level of the repo where I placed this code and a preliminary README.md.

rodluger commented on August 26, 2024

Great, thanks!

rodluger commented on August 26, 2024

@dfm How do I tell Eigen to not compute derivatives for a given variable? Say I have the function

template <typename T>
T testfunction(T& x1, T& x2, T& x3) {
    return x1 + x2 * x2 + x3 * x3 * x3;
}

and I want to compute derivatives w/ respect to x1 and x2, but not x3. Because of the way I've templated this, all three must have the same type, so if x1 and x2 are AutoDiffScalars, so too must x3. A cryptic comment in this example,

/**
* Important note:
* All ActiveScalars which are used in one computation must have
* either a common derivative vector length or a zero-length
* derivative vector.
*/

led me to believe that I could just resize x3.derivatives() to size zero, but that leads to weird assertion errors. Any tips?

rodluger commented on August 26, 2024

The following seems to work:

using Grad = Eigen::AutoDiffScalar<Eigen::VectorXd>;
Grad x1 = Grad(3., 2, 0);
Grad x2 = Grad(5., 2, 1);
Grad x3 = Grad(1., 2, N);

where N is any number greater than or equal to 2 (the length of the derivative vector). Running

Grad result = testfunction(x1, x2, x3);
cout << result.derivatives() << endl;

prints only the derivatives with respect to x1 and x2. But this reeks of a memory leak.
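
A cleaner-looking alternative might be to give x3 a correctly sized, all-zero derivative vector instead of an out-of-range index (same Grad and testfunction as above; untested beyond this snippet):

Grad x1 = Grad(3., 2, 0);                        // differentiate with respect to x1
Grad x2 = Grad(5., 2, 1);                        // differentiate with respect to x2
Grad x3 = Grad(1., Eigen::VectorXd::Zero(2));    // constant: derivatives present but all zero
Grad result = testfunction(x1, x2, x3);
cout << result.derivatives() << endl;            // length-2 vector: d/dx1 and d/dx2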

rodluger commented on August 26, 2024

@dfm I'm pretty sure there's a bug in Eigen: it's related to the reason we had to typecast many of the scalars in function calls to step() and the elliptic integrals to get the code to compile with autodiff. Check out this open Eigen issue and the corresponding code. I'm guessing what happens is that there's an issue with the * operator when the AutoDiffScalar variable has no derivatives. Long story short, if I change my function to

template <typename T>
T testfunction(T& x1, T& x2, T& x3) {
    return T(x1) + T(x2 * x2) + T(x3 * x3 * x3);
}

then everything works as expected. Here's a MWE:

#include <iostream>
#include <iomanip>
#include <string>      // std::string (used by new_grad)
#include <algorithm>   // std::find (used by new_grad)
#include <Eigen/Core>
#include <cmath>
#include <unsupported/Eigen/AutoDiff>
#include <vector>
using namespace std;
using Grad = Eigen::AutoDiffScalar<Eigen::VectorXd>;

// Dummy function
template <typename T>
T testfunction(T& x1, T& x2, T& x3, T& x4) {
    // This leads to an "Assertion failed" error:
    //return x1 + x2 * x2 + x3 * x3 * x3 * x3 + x4;

    // This compiles and runs fine:
    return T(x1) + T(x2 * x2) + T(x3 * x3 * x3) + T(x4 * x4 * x4 * x4);
}

// Instantiate a Grad type with or without derivatives
Grad new_grad(string name, double value, vector<string>& gradients, int& ngrad) {
    if(find(gradients.begin(), gradients.end(), name) != gradients.end()) {
        return Grad(value, gradients.size(), ngrad++);
    } else {
        return Grad(value);
    }
}

// Let's roll
int main() {

    // The user will supply this vector of parameter names
    // for which we will compute derivatives
    vector<string> gradients;
    gradients.push_back("x1");
    gradients.push_back("x2");

    // Declare our parameters: only the ones the user
    // wants will be differentiated!
    int ngrad = 0;
    Grad x1 = new_grad("x1", 4., gradients, ngrad);
    Grad x2 = new_grad("x2", 3., gradients, ngrad);
    Grad x3 = new_grad("x3", 2., gradients, ngrad);
    Grad x4 = new_grad("x4", 1., gradients, ngrad);

    // Compute the function
    Grad result = testfunction(x1, x2, x3, x4);

    // Print the flux and all the derivatives
    cout << result.value() << endl;
    cout << result.derivatives() << endl;

    return 0;
}

Curiously, if I declare the number of derivatives at compile time using Vector2d instead of VectorXd, then there is no issue. This is specifically a bug when the derivative size is dynamic. But that's the whole point: we want the user to choose which and how many derivatives to compute...

I'm going to keep digging -- I'd rather not add T() to everything in my code!

rodluger commented on August 26, 2024

FYI: http://eigen.tuxfamily.org/bz/show_bug.cgi?id=1281#c1
Looks like we might just have to add T() to everything...

rodluger commented on August 26, 2024

@dfm Templates to the rescue! I think I found the ideal solution, which requires no hacky casting. The issue with Eigen is that there's a bug somewhere for dynamically-allocated derivative vectors, so the workaround is to declare all the types you might need ahead of time:

using Grad1 = Eigen::AutoDiffScalar<Eigen::Matrix<double, 1, 1>>;
using Grad2 = Eigen::AutoDiffScalar<Eigen::Vector2d>;
using Grad3 = Eigen::AutoDiffScalar<Eigen::Vector3d>;
using Grad4 = Eigen::AutoDiffScalar<Eigen::Vector4d>;

Then, template the function that instantiates a GradX type:

// Instantiate a Grad type with or without derivatives
template <typename T>
T new_grad(string name, double value, vector<string>& gradients, int& ngrad) {
    if(find(gradients.begin(), gradients.end(), name) != gradients.end()) {
        return T(value, gradients.size(), ngrad++);
    } else {
        return T(value);
    }
}

and the function that does the actual allocating/computing:

// Compute the test function and its derivatives
template <typename T>
void compute(vector<string>& gradients){
    // Declare our parameters: only the ones the user
    // wants will be differentiated!
    int ngrad = 0;
    T x1 = new_grad<T>("x1", 4., gradients, ngrad);
    T x2 = new_grad<T>("x2", 3., gradients, ngrad);
    T x3 = new_grad<T>("x3", 2., gradients, ngrad);
    T x4 = new_grad<T>("x4", 1., gradients, ngrad);

    // Compute the function
    T result = testfunction(x1, x2, x3, x4);

    // Print the flux and all the derivatives
    cout << result.value() << endl;
    cout << result.derivatives() << endl;
}

Then, all you need is an if-then-else to capture all the cases:

// Let's roll
int main() {

    // The user will supply this vector of parameter names
    // for which we will compute derivatives
    vector<string> gradients;
    gradients.push_back("x1");
    gradients.push_back("x2");

    if (gradients.size() == 1) compute<Grad1>(gradients);
    else if (gradients.size() == 2) compute<Grad2>(gradients);
    else if (gradients.size() == 3) compute<Grad3>(gradients);
    else if (gradients.size() == 4) compute<Grad4>(gradients);

    return 0;
}

This compiles and works like a charm! I'm attaching the code for future reference.
test_autodiff.txt

rodluger commented on August 26, 2024

@dfm This is how I'm currently structuring the code:

>>> import starry
>>> map1 = starry.Map()
>>> map1[1, 0] = 1
>>> map1.flux(axis=(0, 1, 0), theta=0.3, xo=0.1, yo=0.1, ro=0.1)
0.9626882655504516
>>> map2 = starry.grad.Map()
>>> map2[1, 0] = 1
>>> map2.flux(axis=(0, 1, 0), theta=0.3, xo=0.1, yo=0.1, ro=0.1)
array([[ 9.62688266e-01,  4.53620580e-04,  0.00000000e+00,
        -6.85580453e-05, -2.99401131e-01, -3.04715096e-03,
         1.48905485e-03, -2.97910667e-01]])

The modules starry and starry.grad are compiled from the same chunk of code, but with a healthy sprinkling of #ifdef STARRY_AUTODIFF to handle AutoDiffScalar-specific implementation details. They therefore have all the same classes, methods, properties, and docstrings, but their outputs are of course different. To get this to work, I'm #include-ing that chunk of code twice, once with STARRY_AUTODIFF undefined, and once with it defined.
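
Schematically, the pattern is something like this (a toy, single-file sketch with hypothetical names, not the actual starry source; in the real code the two builds end up as the starry and starry.grad Python modules rather than two namespaces):

#include <iostream>
#include <Eigen/Core>
#include <unsupported/Eigen/AutoDiff>

// Pretend each namespace body below is the same shared header, #included twice.

// First pass: STARRY_AUTODIFF undefined --> plain doubles
namespace starry {
#ifdef STARRY_AUTODIFF
    using Scalar = Eigen::AutoDiffScalar<Eigen::Matrix<double, 1, 1>>;
#else
    using Scalar = double;
#endif
    // toy stand-in for the flux computation
    Scalar testflux(const Scalar& ro) { return Scalar(1.0) - Scalar(ro * ro); }
}

// Second pass: same code, but with STARRY_AUTODIFF defined --> AutoDiffScalar
#define STARRY_AUTODIFF
namespace starry_grad {
#ifdef STARRY_AUTODIFF
    using Scalar = Eigen::AutoDiffScalar<Eigen::Matrix<double, 1, 1>>;
#else
    using Scalar = double;
#endif
    // toy stand-in for the flux computation
    Scalar testflux(const Scalar& ro) { return Scalar(1.0) - Scalar(ro * ro); }
}
#undef STARRY_AUTODIFF

int main() {
    std::cout << starry::testflux(0.1) << std::endl;    // value only
    starry_grad::Scalar ro(0.1, 1, 0);                   // differentiate with respect to ro
    starry_grad::Scalar f = starry_grad::testflux(ro);
    std::cout << f.value() << " " << f.derivatives() << std::endl;
    return 0;
}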

What do you think of this? It certainly hasn't helped my code legibility...

PS: I haven't finished implementing this, but starry.grad.Map().flux() is working on the master branch.

rodluger commented on August 26, 2024

Quick update on this. I'm slowly getting things to work with dynamically-sized derivative vectors, which is the ideal way to do this. The most important thing I've learned is that casting to type T is not enough, since type T has an Eigen::Dynamic vector length. I need to actually force the derivative vector of all intermediate variables -- and all function outputs -- to have the correct length. For instance, the following line in flux() doesn't work:

if (b <= ro - 1) return 0;

nor does

if (b <= ro - 1) return T(0);

since neither allocates space for the derivatives of the result. What I have to do is this:

if (b <= ro - 1) return 0 * ro;

where ro is one of the variables I'm differentiating.

For some reason I no longer get any compiler errors -- just segfaults when I finally run the code. Debugging this is therefore super tedious. But I'm getting the hang of it.
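
Putting those pieces together as a toy example (a hypothetical stand-in for the real flux code, just to illustrate the pattern):

#include <iostream>
#include <Eigen/Core>
#include <unsupported/Eigen/AutoDiff>
using Grad = Eigen::AutoDiffScalar<Eigen::VectorXd>;

// Every return path has to carry a derivative vector of the full (dynamic) size
Grad toy_flux(Grad& b, Grad& ro) {
    // if (b <= ro - 1) return 0;        // compiles, but the returned Grad has an empty derivative vector
    if (b <= ro - 1) return 0 * ro;      // OK: inherits ro's correctly sized derivatives
    return Grad(1.0 - ro * ro);          // placeholder for the real flux expression
}

int main() {
    Grad b(0.3, 2, 0);    // differentiate with respect to b
    Grad ro(0.1, 2, 1);   // differentiate with respect to ro
    Grad f = toy_flux(b, ro);
    std::cout << f.value() << std::endl << f.derivatives() << std::endl;
    return 0;
}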

rodluger commented on August 26, 2024

Got the flux calculation to work all the way through! Now the user can dynamically choose which and how many derivatives to compute. Gonna take a while for me to clean the code up and push to the master branch, but I think it's downhill from here!

rodluger commented on August 26, 2024

Quick update on this: I'm switching back to compile-time defined derivative vector sizes. AutoDiffScalar<VectorXd> is riddled with issues and it's 20-30 times slower than the same calculation with fixed-size derivative vectors. It's actually more efficient to compute all derivatives and let the user choose which ones to output than to selectively compute only a few derivatives.

rodluger commented on August 26, 2024

Closing this issue. There are things that can still be optimized, but I'm happy!
