anassinator / ilqr

Iterative Linear Quadratic Regulator with auto-differentiable dynamics models.

License: GNU General Public License v3.0

Python 100.00%
trajectory-optimization differential-dynamic-programming theano dynamics-models auto-differentiation model-predictive-control model-predictive-controller control-systems cartpole pendulum

ilqr's Introduction

Iterative Linear Quadratic Regulator


This is an implementation of the Iterative Linear Quadratic Regulator (iLQR) for non-linear trajectory optimization based on Yuval Tassa's paper.

It is compatible with both Python 2 and 3 and has built-in support for auto-differentiating both the dynamics model and the cost function using Theano.

Install

To install, clone and run:

python setup.py install

You may also install the dependencies with pipenv as follows:

pipenv install

Usage

After installing, import as follows:

from ilqr import iLQR

See the examples directory for Jupyter notebooks demonstrating how common control problems can be solved with iLQR.

Dynamics model

You can set up your own dynamics model in one of three ways: extend the Dynamics class and hard-code the model and its partial derivatives; write the model as a Theano expression and let the AutoDiffDynamics class auto-differentiate it; or, if all you have is a function, use the FiniteDiffDynamics class to approximate the derivatives by finite differences.

This section demonstrates how to implement the following dynamics model:


m·dv/dt = F − α·v

where m is the object's mass in kg, α is the friction coefficient, v is the object's velocity in m/s, dv/dt is the object's acceleration in m/s², and F is the control (or force) you're applying to the object in N.
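As a quick sanity check, this model can be forward-Euler discretized directly in plain numpy. The following is a standalone sketch independent of this package (the straightforward discretization shown here may differ slightly from the Theano example that follows):

```python
import numpy as np

dt = 0.01    # Discrete time-step in seconds.
m = 1.0      # Mass in kg.
alpha = 0.1  # Friction coefficient.

def step(x, F):
    """One forward-Euler step of m * dv/dt = F - alpha * v."""
    pos, v = x
    v_dot = (F - alpha * v) / m  # Acceleration from the model.
    return np.array([pos + v * dt, v + v_dot * dt])

x_next = step(np.array([0.0, 1.0]), 0.0)
# With no applied force, friction slowly bleeds off velocity:
# x_next is [0.01, 0.999].
```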

Automatic differentiation

import theano.tensor as T
from ilqr.dynamics import AutoDiffDynamics

x = T.dscalar("x")  # Position.
x_dot = T.dscalar("x_dot")  # Velocity.
F = T.dscalar("F")  # Force.

dt = 0.01  # Discrete time-step in seconds.
m = 1.0  # Mass in kg.
alpha = 0.1  # Friction coefficient.

# Acceleration.
x_dot_dot = x_dot * (1 - alpha * dt / m) + F * dt / m

# Discrete dynamics model definition.
f = T.stack([
    x + x_dot * dt,
    x_dot + x_dot_dot * dt,
])

x_inputs = [x, x_dot]  # State vector.
u_inputs = [F]  # Control vector.

# Compile the dynamics.
# NOTE: This can be slow as it's computing and compiling the derivatives.
# But that's okay since it's only a one-time cost on startup.
dynamics = AutoDiffDynamics(f, x_inputs, u_inputs)

Note: If you want to be able to use the Hessians (f_xx, f_ux, and f_uu), you need to pass the hessians=True argument to the constructor. This will increase compilation time. iLQR itself, however, does not require second-order derivatives to function.

Batch automatic differentiation

import theano.tensor as T
from ilqr.dynamics import BatchAutoDiffDynamics

state_size = 2  # [position, velocity]
action_size = 1  # [force]

dt = 0.01  # Discrete time-step in seconds.
m = 1.0  # Mass in kg.
alpha = 0.1  # Friction coefficient.

def f(x, u, i):
    """Batched implementation of the dynamics model.

    Args:
        x: State vector [*, state_size].
        u: Control vector [*, action_size].
        i: Current time step [*, 1].

    Returns:
        Next state vector [*, state_size].
    """
    x_ = x[..., 0]
    x_dot = x[..., 1]
    F = u[..., 0]

    # Acceleration.
    x_dot_dot = x_dot * (1 - alpha * dt / m) + F * dt / m

    # Discrete dynamics model definition.
    return T.stack([
        x_ + x_dot * dt,
        x_dot + x_dot_dot * dt,
    ]).T

# Compile the dynamics.
# NOTE: This can be slow as it's computing and compiling the derivatives.
# But that's okay since it's only a one-time cost on startup.
dynamics = BatchAutoDiffDynamics(f, state_size, action_size)

Note: This is a faster version of AutoDiffDynamics that doesn't support Hessians.

Finite difference approximation

import numpy as np
from ilqr.dynamics import FiniteDiffDynamics

state_size = 2  # [position, velocity]
action_size = 1  # [force]

dt = 0.01  # Discrete time-step in seconds.
m = 1.0  # Mass in kg.
alpha = 0.1  # Friction coefficient.

def f(x, u, i):
    """Dynamics model function.

    Args:
        x: State vector [state_size].
        u: Control vector [action_size].
        i: Current time step.

    Returns:
        Next state vector [state_size].
    """
    [x, x_dot] = x
    [F] = u

    # Acceleration.
    x_dot_dot = x_dot * (1 - alpha * dt / m) + F * dt / m

    return np.array([
        x + x_dot * dt,
        x_dot + x_dot_dot * dt,
    ])

# NOTE: Unlike with AutoDiffDynamics, this is instantaneous, but will not be
# as accurate.
dynamics = FiniteDiffDynamics(f, state_size, action_size)

Note: You may need to tune the epsilon values (x_eps and u_eps) used to compute the approximation if you run into numerical instability issues.
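If you want to see what the approximation is doing, a central-difference Jacobian takes only a few lines of numpy. This is a generic sketch of the technique, not the package's internal implementation; the library's exact scheme and defaults may differ:

```python
import numpy as np

def jacobian_fd(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x."""
    fx = np.asarray(f(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps  # Perturb one coordinate at a time.
        J[:, j] = (np.asarray(f(x + dx)) - np.asarray(f(x - dx))) / (2 * eps)
    return J

# Illustrative linear map: central differences recover it almost exactly.
f = lambda x: np.array([x[0] + 0.01 * x[1], 1.00999 * x[1]])
J = jacobian_fd(f, np.array([1.0, 2.0]))
```

Making eps too small amplifies floating-point round-off, while making it too large increases truncation error, which is why these values sometimes need tuning.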

Usage

Regardless of the method used to construct your dynamics model, you can use it as follows:

curr_x = np.array([1.0, 2.0])
curr_u = np.array([0.0])
i = 0  # This dynamics model is not time-varying, so this doesn't matter.

>>> dynamics.f(curr_x, curr_u, i)
... array([ 1.02   ,  2.01998])
>>> dynamics.f_x(curr_x, curr_u, i)
... array([[ 1.     ,  0.01   ],
           [ 0.     ,  1.00999]])
>>> dynamics.f_u(curr_x, curr_u, i)
... array([[ 0.    ],
           [ 0.0001]])

The AutoDiffDynamics and FiniteDiffDynamics models should generally produce consistent results, but the auto-differentiated derivatives will always be more accurate. The finite difference approximation is generally faster unless you also need the Hessians, in which case Theano's compiled derivatives are better optimized.

Cost function

Similarly, you can set up your own cost function in one of three ways: extend the Cost class and hard-code the cost and its partial derivatives; write the cost as a Theano expression and let the AutoDiffCost class auto-differentiate it; or, if all you have is a loss function, use the FiniteDiffCost class to approximate the derivatives by finite differences.

The most common cost function is the quadratic form used by Linear Quadratic Regulators:


(x − x_goal)^T Q (x − x_goal) + (u − u_goal)^T R (u − u_goal)

where Q and R are matrices weighing the quadratic state error and quadratic control error, and x_goal is your target state. For convenience, an implementation of this cost function is available as the QRCost class.
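For concreteness, this quadratic form is easy to evaluate by hand in plain numpy. Below is a standalone sketch with illustrative Q, R, and goal values (u_goal is taken as zero):

```python
import numpy as np

Q = 100 * np.eye(2)           # State error weight.
R = 0.01 * np.eye(1)          # Control effort weight.
x_goal = np.array([1.0, 0.0])

def l(x, u):
    """Quadratic cost with u_goal = 0."""
    x_diff = x - x_goal
    return x_diff.T @ Q @ x_diff + u.T @ R @ u

cost_value = l(np.array([1.0, 2.0]), np.array([0.0]))
# No position error, 2 m/s velocity error, no control: 100 * 2**2 = 400.0.
```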

QRCost class

import numpy as np
from ilqr.cost import QRCost

state_size = 2  # [position, velocity]
action_size = 1  # [force]

# The coefficients weigh how much your state error is worth to you vs
# the size of your controls. You can favor a solution that uses smaller
# controls by increasing R's coefficient.
Q = 100 * np.eye(state_size)
R = 0.01 * np.eye(action_size)

# This is optional if you want your cost to be computed differently at a
# terminal state.
Q_terminal = np.array([[100.0, 0.0], [0.0, 0.1]])

# State goal is set to a position of 1 m with no velocity.
x_goal = np.array([1.0, 0.0])

# NOTE: This is instantaneous and completely accurate.
cost = QRCost(Q, R, Q_terminal=Q_terminal, x_goal=x_goal)

Automatic differentiation

import theano.tensor as T
from ilqr.cost import AutoDiffCost

x_inputs = [T.dscalar("x"), T.dscalar("x_dot")]
u_inputs = [T.dscalar("F")]

x = T.stack(x_inputs)
u = T.stack(u_inputs)

x_diff = x - x_goal
l = x_diff.T.dot(Q).dot(x_diff) + u.T.dot(R).dot(u)
l_terminal = x_diff.T.dot(Q_terminal).dot(x_diff)

# Compile the cost.
# NOTE: This can be slow as it's computing and compiling the derivatives.
# But that's okay since it's only a one-time cost on startup.
cost = AutoDiffCost(l, l_terminal, x_inputs, u_inputs)

Batch automatic differentiation

import theano.tensor as T
from ilqr.cost import BatchAutoDiffCost

def cost_function(x, u, i, terminal):
    """Batched implementation of the quadratic cost function.

    Args:
        x: State vector [*, state_size].
        u: Control vector [*, action_size].
        i: Current time step [*, 1].
        terminal: Whether to compute the terminal cost.

    Returns:
        Instantaneous cost [*].
    """
    Q_ = Q_terminal if terminal else Q
    l = x.dot(Q_).dot(x.T)
    if l.ndim == 2:
        l = T.diag(l)

    if not terminal:
        l_u = u.dot(R).dot(u.T)
        if l_u.ndim == 2:
            l_u = T.diag(l_u)
        l += l_u

    return l

# Compile the cost.
# NOTE: This can be slow as it's computing and compiling the derivatives.
# But that's okay since it's only a one-time cost on startup.
cost = BatchAutoDiffCost(cost_function, state_size, action_size)

Finite difference approximation

from ilqr.cost import FiniteDiffCost


def l(x, u, i):
    """Instantaneous cost function.

    Args:
        x: State vector [state_size].
        u: Control vector [action_size].
        i: Current time step.

    Returns:
        Instantaneous cost [scalar].
    """
    x_diff = x - x_goal
    return x_diff.T.dot(Q).dot(x_diff) + u.T.dot(R).dot(u)


def l_terminal(x, i):
    """Terminal cost function.

    Args:
        x: State vector [state_size].
        i: Current time step.

    Returns:
        Terminal cost [scalar].
    """
    x_diff = x - x_goal
    return x_diff.T.dot(Q_terminal).dot(x_diff)


# NOTE: Unlike with AutoDiffCost, this is instantaneous, but will not be as
# accurate.
cost = FiniteDiffCost(l, l_terminal, state_size, action_size)

Note: You may need to tune the epsilon values (x_eps and u_eps) used to compute the approximation if you run into numerical instability issues.

Usage

Regardless of the method used to construct your cost function, you can use it as follows:

>>> cost.l(curr_x, curr_u, i)
... 400.0
>>> cost.l_x(curr_x, curr_u, i)
... array([   0.,  400.])
>>> cost.l_u(curr_x, curr_u, i)
... array([ 0.])
>>> cost.l_xx(curr_x, curr_u, i)
... array([[ 200.,    0.],
           [   0.,  200.]])
>>> cost.l_ux(curr_x, curr_u, i)
... array([[ 0.,  0.]])
>>> cost.l_uu(curr_x, curr_u, i)
... array([[ 0.02]])

Putting it all together

N = 1000  # Number of time-steps in trajectory.
x0 = np.array([0.0, -0.1])  # Initial state.
us_init = np.random.uniform(-1, 1, (N, 1))  # Random initial action path.

ilqr = iLQR(dynamics, cost, N)
xs, us = ilqr.fit(x0, us_init)

xs and us now hold a locally optimal state and control trajectory that reaches the desired goal state with minimal cost.

Finally, a RecedingHorizonController is also bundled with this package for using the iLQR controller in Model Predictive Control.
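The receding-horizon idea itself can be sketched generically. The controller's actual API may differ; solve() below is a hypothetical stand-in for a trajectory optimizer like iLQR.fit, and the scalar system is a toy:

```python
import numpy as np

def solve(x0, us_init):
    """Stand-in for a trajectory optimizer such as iLQR.fit()."""
    # Hypothetical toy policy: drive the scalar state toward 0 with u = -x0.
    return np.full(len(us_init), -x0)

def receding_horizon(x0, horizon=10, steps=5):
    x = x0
    applied = []
    us = np.zeros(horizon)
    for _ in range(steps):
        us = solve(x, us)     # Re-plan from the current state.
        u = us[0]             # Apply only the first control...
        applied.append(u)
        x = x + u             # ...then advance the real system.
        us = np.roll(us, -1)  # Warm-start the next solve.
    return x, applied

x_final, _ = receding_horizon(1.0)
# The toy solver drives x to 0 on the first step and holds it there.
```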

Important notes

To quote from Tassa's paper: "Two important parameters which have a direct impact on performance are the simulation time-step dt and the horizon length N. Since speed is of the essence, the goal is to choose those values which minimize the number of steps in the trajectory, i.e. the largest possible time-step and the shortest possible horizon. The size of dt is limited by our use of Euler integration; beyond some value the simulation becomes unstable. The minimum length of the horizon N is a problem-dependent quantity which must be found by trial-and-error."
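For the toy friction model above, the Euler stability limit can even be computed: the velocity update is v ← v·(1 − α·dt/m), which is stable only while dt < 2m/α. A quick standalone numpy check with illustrative values:

```python
import numpy as np

def simulate(dt, m=1.0, alpha=0.1, steps=1000):
    """Euler-integrate dv/dt = -(alpha/m) * v starting from v = 1."""
    v = 1.0
    for _ in range(steps):
        v += -(alpha / m) * v * dt
    return v

stable = simulate(dt=1.0)     # dt < 2*m/alpha = 20: velocity decays to 0.
unstable = simulate(dt=30.0)  # dt > 20: |v| grows without bound.
```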

Contributing

Contributions are welcome. Simply open an issue or pull request on the matter.

Linting

We use YAPF for all Python formatting needs. You can auto-format your changes with the following command:

yapf --recursive --in-place --parallel .

You may install the linter as follows:

pipenv install --dev

License

See LICENSE.

Credits

This implementation was partially based on Yuval Tassa's MATLAB implementation, and navigator8972's implementation.

ilqr's People

Contributors: adamgleave, anassinator

ilqr's Issues

Coding Problem

I ran into an error that says "too many indices for array". In my code, the state dimension is 12 and the action dimension is 3. Hope someone can help me.

Dynamics definition in a vectorized way

Hi,

I have tried to implement the cartpole dynamics in a vectorized way (in cartpole.py), but it failed, probably due to my lack of knowledge of Theano.

Basically, instead of defining them this way:

    f = T.stack([
        x + x_dot * dt,
        x_dot + x_dot_dot * dt,
        T.sin(next_theta),
        T.cos(next_theta),
        theta_dot + theta_dot_dot * dt,
    ])

define them somehow like this:

A = np.array([[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 1]])
f = T.dot(A, [x, x_dot, sin_theta, cos_theta, theta_dot]) + T.stack([0, 0, T.sin(next_theta), T.cos(next_theta), 0]) + dt*T.stack([x_dot, x_dot_dot, 0, 0, theta_dot_dot])

I get an error when running the cartpole notebook (error screenshots omitted).

Thank you for your help !

TypeError: Unknown parameter type: <class 'theano.tensor.var.TensorVariable'> in the CartPole example

Hi!

I tried to run your CartPole Jupyter notebook example. However, I run into this problem:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-aa80e5b7db85> in <module>()
      1 dt = 0.05
      2 pole_length = 1.0
----> 3 dynamics = CartpoleDynamics(dt, l=pole_length)

6 frames
/usr/local/lib/python3.7/dist-packages/theano/compile/function/pfunc.py in _pfunc_param_to_in(param, strict, allow_downcast)
    541     elif isinstance(param, In):
    542         return param
--> 543     raise TypeError(f"Unknown parameter type: {type(param)}")
    544 
    545 

TypeError: Unknown parameter type: <class 'theano.tensor.var.TensorVariable'>

I run it in colab. I installed your package this way: !pip install -U git+https://github.com/anassinator/ilqr.git. I tried to force install Theano == 1.0.4, but it didn't help.

Can you please help me figure this out?

Thanks,
Piotr

alpha multiplication with k and K

Hi anassinator

First off, thanks a lot for a beautifully written library!

I have noticed that in the following line you multiply alpha by (k[i] + K[i]...), whereas in the paper's eq. 12 (as well as in one of your own comments in the lines above) alpha should only multiply k[i].

us_new[i] = us[i] + alpha * (k[i] + K[i].dot(xs_new[i] - xs[i]))

Is this done so on purpose?
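For reference, the two variants only differ by how the feedback term is scaled, which a standalone numpy comparison with illustrative numbers makes explicit (in the paper's eq. 12, the line-search parameter alpha scales only the feedforward term k):

```python
import numpy as np

k = np.array([1.0])         # Feedforward term.
K = np.array([[0.5, 0.5]])  # Feedback gains.
dx = np.array([0.2, -0.4])  # State deviation from the nominal trajectory.
alpha = 0.5

paper = alpha * k + K @ dx   # Eq. 12: alpha scales only k.
code = alpha * (k + K @ dx)  # The library's line as written.
# The two agree only when alpha == 1 or K @ dx == 0.
```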

Non-linear dynamics -- Trajectory Following

Hello and thanks for the interesting package you've created!

I wanted to ask how can I use it for a dynamic model that depends explicitly on time

i.e x_dot = (t-u)*x^2

where u is the input, t is time and x is the state.

Moreover, can your package be used for Trajectory following?

I.e. minimize J = sum_{i=0}^{N} (x(i) − x_ref(i))^T Q (x(i) − x_ref(i)) + (u(i) − u_ref(i))^T R (u(i) − u_ref(i))

Thanks in advance!

trajectory tracking problem

Hi anassinator, great work and thanks for the contribution. I am wondering if this can handle problems with a one-step cost L(X, u) = (X − X*)^T Q (X − X*) + (u − u*)^T R (u − u*), where X* and u* are my target trajectory? @anassinator

time varying MPC

I am trying to create a MPC to follow a path. So the cost function is time varying. I was reading through the code and you commented the cost:

"If your cost or dynamics are time dependent, then you might need to shift their internal state accordingly."

I think I understand what you mean by this but I am unsure of how I would implement such a thing. Do you have any insight on how to do this?

Theoretical question

I have a reference trajectory that I want to follow, but I also want to reduce some input in my truck (let's say the steering of the vehicle, in order to smooth the original trajectory a little). The dynamics of my truck are non-linear, so at every time step I linearize them around my current trajectory and current controls. I have read that after linearizing, the state and control vectors are no longer x and u but x* = x − x_linearized and u* = u − u_linearized. So my question is: if that is true, how can I adapt my cost function to follow the trajectory while reducing steering? Does something like this make sense:

J = np.dot(state_vector, state_vector) + np.dot(control_vector + current_control, control_vector + current_control)

where current_control is the control obtained in the last iteration, so that since we are linearizing around it, u* + u_linearized recovers the u I actually want to penalize. Does it make sense?

Add constraints to control and/or state

Hi,

Thanks a lot for the great repo! Clean implementation of iLQR and easy to understand, it's really useful.

I have a question though: would it be possible to constrain the values of the control input u returned by the optimization procedure? That is, pick the optimal control trajectory, but within some bounds on the values output by the algorithm. I'm happy to implement it myself if you know where/how it should be done, but I can't figure it out yet.

Let me know if you have any idea how to do that!

Result after constraints

Since the constraints on the input are introduced after fit(), once I apply these new constraints and feed the controls into the dynamical model of the truck, it doesn't reach the desired goal position at all. That is why I do not understand how the problem can be solved with constraints. My problem statement looks like:

def cost(self):
    state_size = 4  # [x, y, theta, alpha]
    action_size = 1  # [steer]

    # The coefficients weigh how much your state error is worth to you vs
    # the size of your controls. You can favor a solution that uses smaller
    # controls by increasing R's coefficient.
    Q = 0.5 * np.eye(state_size)
    R = 0.5 * np.eye(action_size)

    # This is optional if you want your cost to be computed differently at a
    # terminal state.
    Q_terminal = np.array([[100.0, 0.0, 0.0, 0.0], [0.0, 100.0, 0.0, 0.0],
                           [0.0, 0.0, 100.0, 0.0], [0.0, 0.0, 0.0, 100.0]])

    # State goal.
    x_goal = np.array([68.70061495018659,
                       24.387377335723006,
                       0.7667067876774367,
                       -0.4414311718278174])

    # NOTE: This is instantaneous and completely accurate.
    cost = QRCost(Q, R, Q_terminal=Q_terminal, x_goal=x_goal)
    return cost

def ilqr(self):
    x0 = np.array([40.00000000000024,
                   14.05,
                   1.5622809766054426,
                   0.008515350184557442])  # Initial state.
    us_init = np.array([...])
    N = len(us_init)
    cost = self.cost()
    dynamics = self.dynamic_model()
    ilqr = iLQR(dynamics, cost, N)
    xs, us = ilqr.fit(x0, us_init, n_iterations=100)
    us = constrain(us, -0.55, 0.55)
    for i in range(1, len(xs)):
        xs[i][0] = xs[i-1][0] + np.cos(xs[i-1][2]) * self.dt
        xs[i][1] = xs[i-1][1] + np.sin(xs[i-1][2]) * self.dt
        xs[i][2] = xs[i-1][2] + (1/3.6) * np.tan(us[i-1]) * self.dt
        xs[i][3] = xs[i-1][3] - ((1/12.086) * np.sin(xs[i-1][3]) + (1/3.6) * np.tan(us[i-1])) * self.dt
