Get autodiff working (starry, 24 comments, closed)

rodluger commented on August 26, 2024
Get autodiff working

Comments (24)

ericagol commented on August 26, 2024

ericagol commented on August 26, 2024

rodluger commented on August 26, 2024

@dfm Autodiff is working beautifully! Thanks for the help.

rodluger commented on August 26, 2024

Currently computing gradients of the flux in this test file. Run make test_autodiff to compile it. I'm going to add a gradient option to the pybind interface so we can start using this!

dfm commented on August 26, 2024

You definitely won't need to manually cast everything to T. The problem appears when you have inline operations on AutoDiffScalars. It is slightly annoying, but I think you'll be able to fix it pretty fast.

rodluger commented on August 26, 2024

@dfm Dude:
[image]

dfm commented on August 26, 2024

Dude! 🎉🎈🍻

dfm commented on August 26, 2024

That does sound tedious! Let me know if there's anything that I can do to help out.

rodluger commented on August 26, 2024

@dfm @ericagol Just need to write it up!
[image]

ericagol commented on August 26, 2024

I got autodiff working on the s_n(r,b) components. I just finished coding up the s_n function in Julia, and I implemented autodiff using the ForwardDiff package. I haven't added analytic derivatives of the elliptic integrals: ForwardDiff diffs those as well.

I still haven't gotten the transformation and rotation matrices computed, but these should be straightforward.

rodluger commented on August 26, 2024

rodluger commented on August 26, 2024

@ericagol Can you create a julia folder in the top level of the repository and place your code in there?

ericagol commented on August 26, 2024

Yes. I just created a folder called julia/ at the top level of the repo where I placed this code and a preliminary README.md.

rodluger commented on August 26, 2024

Great, thanks!

rodluger commented on August 26, 2024

@dfm How do I tell Eigen to not compute derivatives for a given variable? Say I have the function

template <typename T>
T testfunction(T& x1, T& x2, T& x3) {
    return x1 + x2 * x2 + x3 * x3 * x3;
}

and I want to compute derivatives w/ respect to x1 and x2, but not x3. Because of the way I've templated this, all three must have the same type, so if x1 and x2 are AutoDiffScalars, so too must x3. A cryptic comment in this example,

/**
* Important note:
* All ActiveScalars which are used in one computation must have
* either a common derivative vector length or a zero-length
* derivative vector.
*/

led me to believe that I could just resize x3.derivatives() to size zero, but that leads to weird assertion errors. Any tips?

rodluger commented on August 26, 2024

The following seems to work:

using Grad = Eigen::AutoDiffScalar<Eigen::VectorXd>;
Grad x1 = Grad(3., 2, 0);
Grad x2 = Grad(5., 2, 1);
Grad x3 = Grad(1., 2, N);

where N is any number greater than or equal to 2 (the length of the derivative vector). Running

Grad result = testfunction(x1, x2, x3);
cout << result.derivatives() << endl;

prints only the derivatives with respect to x1 and x2. But this reeks of a memory leak.
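
A cleaner-looking alternative might be to give x3 a correctly sized, all-zero derivative vector instead of an out-of-range index (same Grad and testfunction as above; untested beyond this snippet):

Grad x1 = Grad(3., 2, 0);                        // differentiate with respect to x1
Grad x2 = Grad(5., 2, 1);                        // differentiate with respect to x2
Grad x3 = Grad(1., Eigen::VectorXd::Zero(2));    // constant: derivatives present but all zero
Grad result = testfunction(x1, x2, x3);
cout << result.derivatives() << endl;            // length-2 vector: d/dx1 and d/dx2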

rodluger commented on August 26, 2024

@dfm I'm pretty sure there's a bug in Eigen: it's related to the reason we had to typecast many of the scalars in function calls to step() and the elliptic integrals to get the code to compile with autodiff. Check out this open Eigen issue and the corresponding code. I'm guessing what happens is that there's an issue with the * operator when the AutoDiffScalar variable has no derivatives. Long story short, if I change my function to

template <typename T>
T testfunction(T& x1, T& x2, T& x3) {
    return T(x1) + T(x2 * x2) + T(x3 * x3 * x3);
}

then everything works as expected. Here's a MWE:

#include <iostream>
#include <iomanip>
#include <string>      // std::string (used by new_grad)
#include <algorithm>   // std::find (used by new_grad)
#include <Eigen/Core>
#include <cmath>
#include <unsupported/Eigen/AutoDiff>
#include <vector>
using namespace std;
using Grad = Eigen::AutoDiffScalar<Eigen::VectorXd>;

// Dummy function
template <typename T>
T testfunction(T& x1, T& x2, T& x3, T& x4) {
    // This leads to an "Assertion failed" error:
    //return x1 + x2 * x2 + x3 * x3 * x3 * x3 + x4;

    // This compiles and runs fine:
    return T(x1) + T(x2 * x2) + T(x3 * x3 * x3) + T(x4 * x4 * x4 * x4);
}

// Instantiate a Grad type with or without derivatives
Grad new_grad(string name, double value, vector<string>& gradients, int& ngrad) {
    if(find(gradients.begin(), gradients.end(), name) != gradients.end()) {
        return Grad(value, gradients.size(), ngrad++);
    } else {
        return Grad(value);
    }
}

// Let's roll
int main() {

    // The user will supply this vector of parameter names
    // for which we will compute derivatives
    vector<string> gradients;
    gradients.push_back("x1");
    gradients.push_back("x2");

    // Declare our parameters: only the ones the user
    // wants will be differentiated!
    int ngrad = 0;
    Grad x1 = new_grad("x1", 4., gradients, ngrad);
    Grad x2 = new_grad("x2", 3., gradients, ngrad);
    Grad x3 = new_grad("x3", 2., gradients, ngrad);
    Grad x4 = new_grad("x4", 1., gradients, ngrad);

    // Compute the function
    Grad result = testfunction(x1, x2, x3, x4);

    // Print the flux and all the derivatives
    cout << result.value() << endl;
    cout << result.derivatives() << endl;

    return 0;
}

Curiously, if I declare the number of derivatives at compile time using Vector2d instead of VectorXd, then there is no issue. This is specifically a bug when the derivative size is dynamic. But that's the whole point: we want the user to choose which and how many derivatives to compute...

I'm going to keep digging -- I'd rather not add T() to everything in my code!

rodluger commented on August 26, 2024

FYI: http://eigen.tuxfamily.org/bz/show_bug.cgi?id=1281#c1
Looks like we might just have to add T() to everything...

rodluger commented on August 26, 2024

@dfm Templates to the rescue! I think I found the ideal solution, which requires no hacky casting. The issue with Eigen is that there's a bug somewhere for dynamically-allocated derivative vectors, so the workaround is to declare all the types you might need ahead of time:

using Grad1 = Eigen::AutoDiffScalar<Eigen::Matrix<double, 1, 1>>;
using Grad2 = Eigen::AutoDiffScalar<Eigen::Vector2d>;
using Grad3 = Eigen::AutoDiffScalar<Eigen::Vector3d>;
using Grad4 = Eigen::AutoDiffScalar<Eigen::Vector4d>;

Then, template the function that instantiates a GradX type:

// Instantiate a Grad type with or without derivatives
template <typename T>
T new_grad(string name, double value, vector<string>& gradients, int& ngrad) {
    if(find(gradients.begin(), gradients.end(), name) != gradients.end()) {
        return T(value, gradients.size(), ngrad++);
    } else {
        return T(value);
    }
}

and the function that does the actual allocating/computing:

// Compute the test function and its derivatives
template <typename T>
void compute(vector<string>& gradients){
    // Declare our parameters: only the ones the user
    // wants will be differentiated!
    int ngrad = 0;
    T x1 = new_grad<T>("x1", 4., gradients, ngrad);
    T x2 = new_grad<T>("x2", 3., gradients, ngrad);
    T x3 = new_grad<T>("x3", 2., gradients, ngrad);
    T x4 = new_grad<T>("x4", 1., gradients, ngrad);

    // Compute the function
    T result = testfunction(x1, x2, x3, x4);

    // Print the flux and all the derivatives
    cout << result.value() << endl;
    cout << result.derivatives() << endl;
}

Then, all you need is an if-then-else to capture all the cases:

// Let's roll
int main() {

    // The user will supply this vector of parameter names
    // for which we will compute derivatives
    vector<string> gradients;
    gradients.push_back("x1");
    gradients.push_back("x2");

    if (gradients.size() == 1) compute<Grad1>(gradients);
    else if (gradients.size() == 2) compute<Grad2>(gradients);
    else if (gradients.size() == 3) compute<Grad3>(gradients);
    else if (gradients.size() == 4) compute<Grad4>(gradients);

    return 0;
}

This compiles and works like a charm! I'm attaching the code for future reference.
test_autodiff.txt

rodluger commented on August 26, 2024

@dfm This is how I'm currently structuring the code:

>>> import starry
>>> map1 = starry.Map()
>>> map1[1, 0] = 1
>>> map1.flux(axis=(0, 1, 0), theta=0.3, xo=0.1, yo=0.1, ro=0.1)
0.9626882655504516
>>> map2 = starry.grad.Map()
>>> map2[1, 0] = 1
>>> map2.flux(axis=(0, 1, 0), theta=0.3, xo=0.1, yo=0.1, ro=0.1)
array([[ 9.62688266e-01,  4.53620580e-04,  0.00000000e+00,
        -6.85580453e-05, -2.99401131e-01, -3.04715096e-03,
         1.48905485e-03, -2.97910667e-01]])

The modules starry and starry.grad are compiled from the same chunk of code, but with a healthy sprinkling of #ifdef STARRY_AUTODIFF to handle AutoDiffScalar-specific implementation details. They therefore have all the same classes, methods, properties, and docstrings, but their outputs are of course different. To get this to work, I'm #include-ing that chunk of code twice, once with STARRY_AUTODIFF undefined, and once with it defined.
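
Schematically, the pattern is something like this (a toy, single-file sketch with hypothetical names, not the actual starry source; in the real code the two builds end up as the starry and starry.grad Python modules rather than two namespaces):

#include <iostream>
#include <Eigen/Core>
#include <unsupported/Eigen/AutoDiff>

// Pretend each namespace body below is the same shared header, #included twice.

// First pass: STARRY_AUTODIFF undefined --> plain doubles
namespace starry {
#ifdef STARRY_AUTODIFF
    using Scalar = Eigen::AutoDiffScalar<Eigen::Matrix<double, 1, 1>>;
#else
    using Scalar = double;
#endif
    // toy stand-in for the flux computation
    Scalar testflux(const Scalar& ro) { return Scalar(1.0) - Scalar(ro * ro); }
}

// Second pass: same code, but with STARRY_AUTODIFF defined --> AutoDiffScalar
#define STARRY_AUTODIFF
namespace starry_grad {
#ifdef STARRY_AUTODIFF
    using Scalar = Eigen::AutoDiffScalar<Eigen::Matrix<double, 1, 1>>;
#else
    using Scalar = double;
#endif
    // toy stand-in for the flux computation
    Scalar testflux(const Scalar& ro) { return Scalar(1.0) - Scalar(ro * ro); }
}
#undef STARRY_AUTODIFF

int main() {
    std::cout << starry::testflux(0.1) << std::endl;    // value only
    starry_grad::Scalar ro(0.1, 1, 0);                   // differentiate with respect to ro
    starry_grad::Scalar f = starry_grad::testflux(ro);
    std::cout << f.value() << " " << f.derivatives() << std::endl;
    return 0;
}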

What do you think of this? It certainly hasn't helped my code legibility...

PS: I haven't finished implementing this, but starry.grad.Map().flux() is working on the master branch.

rodluger commented on August 26, 2024

Quick update on this. I'm slowly getting things to work with dynamically-sized derivative vectors, which is the ideal way to do this. The most important thing I've learned is that casting to type T is not enough, since type T has an Eigen::Dynamic vector length. I need to actually force the derivative vector of all intermediate variables -- and all function outputs -- to have the correct length. For instance, the following line in flux() doesn't work:

if (b <= ro - 1) return 0;

nor does

if (b <= ro - 1) return T(0);

since neither allocates space for the derivatives of the result. What I have to do is this:

if (b <= ro - 1) return 0 * ro;

where ro is one of the variables I'm differentiating.

For some reason I no longer get any compiler errors -- just segfaults when I finally run the code. Debugging this is therefore super tedious. But I'm getting the hang of it.
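
Putting those pieces together as a toy example (a hypothetical stand-in for the real flux code, just to illustrate the pattern):

#include <iostream>
#include <Eigen/Core>
#include <unsupported/Eigen/AutoDiff>
using Grad = Eigen::AutoDiffScalar<Eigen::VectorXd>;

// Every return path has to carry a derivative vector of the full (dynamic) size
Grad toy_flux(Grad& b, Grad& ro) {
    // if (b <= ro - 1) return 0;        // compiles, but the returned Grad has an empty derivative vector
    if (b <= ro - 1) return 0 * ro;      // OK: inherits ro's correctly sized derivatives
    return Grad(1.0 - ro * ro);          // placeholder for the real flux expression
}

int main() {
    Grad b(0.3, 2, 0);    // differentiate with respect to b
    Grad ro(0.1, 2, 1);   // differentiate with respect to ro
    Grad f = toy_flux(b, ro);
    std::cout << f.value() << std::endl << f.derivatives() << std::endl;
    return 0;
}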

rodluger commented on August 26, 2024

Got the flux calculation to work all the way through! Now the user can dynamically choose which and how many derivatives to compute. Gonna take a while for me to clean the code up and push to the master branch, but I think it's downhill from here!

rodluger commented on August 26, 2024

Quick update on this: I'm switching back to compile-time defined derivative vector sizes. AutoDiffScalar<VectorXd> is riddled with issues and it's 20-30 times slower than the same calculation with fixed-size derivative vectors. It's actually more efficient to compute all derivatives and let the user choose which ones to output than to selectively compute only a few derivatives.

rodluger commented on August 26, 2024

Closing this issue. There are things that can still be optimized, but I'm happy!
