mahanfathi / model-based-rl

Model-based Policy Gradients

Topics: model-based, reinforcement-learning, backpropagation, mujoco, mujoco-py, finite-difference, computational-graphs, pytorch, gym, openai-gym, ilqg, direct-policy-search, policy-optimization, policy-gradient, ilqg-mujoco, ilqr, mujoco-dynamics, policy-gradients, computation-graph

model-based-rl's Introduction

Model-based Reinforcement Learning

Directly back-propagate into your policy network, using model jacobians calculated in MuJoCo via finite differences.

To backprop into a stochastic policy when the model is unknown, one has to rely on the REINFORCE estimator, which computes gradients by sampling the environment. These estimators usually have high variance, which is why baselines and value/advantage functions were introduced. Another way to backpropagate into a policy network is the “reparameterization trick”, as in VAEs, but it requires upstream gradients, and hence a known model. Policy gradients computed w/ the reparam trick are often much lower in variance, so one can go wo/ baselines and value networks. This project puts it all together: a computation graph of policy and dynamics, upstream gradients from MuJoCo dynamics and rewards, the reparam trick, and optimization.
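
A minimal sketch of the contrast (the `policy`, `dynamics`, and `reward` callables are hypothetical stand-ins, not this repo's API):

import torch

def reinforce_loss(log_prob, ret):
    # Score-function (REINFORCE) estimator: model-free, but high variance,
    # hence the usual baselines / value networks.
    return -(log_prob * ret).mean()

def reparam_loss(policy, dynamics, reward, state):
    # Reparameterized sample: a = mu(s) + sigma(s) * eps, eps ~ N(0, I).
    # Gradients flow through `dynamics` and `reward`, so both must be
    # differentiable -- here via MuJoCo finite-difference jacobians.
    mu, sigma = policy(state)
    eps = torch.randn_like(mu)
    action = mu + sigma * eps              # differentiable w.r.t. policy params
    next_state = dynamics(state, action)   # upstream gradients from the model
    return -reward(state, action, next_state)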

Vanilla Computation Graph

     +----------+S0+----------+              +----------+S1+----------+
     |                        |              |                        |
     |    +------+   A0   +---v----+         +    +------+   A1   +---v----+
S0+------>+Policy+---+--->+Dynamics+---+---+S1+-->+Policy+---+--->+Dynamics+--->S2  ...
     |    +------+   |    +--------+   |     +    +------+   |    +--------+    |
     |               |                 |     |               |                  |
     |            +--v---+             |     |            +--v---+              |
     +---+S0+---->+Reward+<-----S1-----+     +---+S1+---->+Reward+<-----S2------+
                  +------+                                +------+
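
In code, the graph above amounts to chaining the policy, dynamics, and reward ops over a horizon and backpropagating through the whole rollout. A minimal sketch, assuming differentiable `policy`, `dynamics`, and `reward` ops as in the previous snippet:

import torch

def rollout_loss(policy, dynamics, reward, s, horizon):
    # Chain S0 -> Policy -> A0 -> Dynamics -> S1 -> ... as in the diagram,
    # accumulating reward(St, At, St+1) at every step.
    total = 0.0
    for _ in range(horizon):
        mu, sigma = policy(s)
        a = mu + sigma * torch.randn_like(mu)
        s_next = dynamics(s, a)
        total = total + reward(s, a, s_next)
        s = s_next
    return -total  # minimize negative return

# Calling .backward() on this loss pushes MuJoCo's finite-difference
# jacobians back through every step into the policy parameters.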

Results

This repo contains:

  • Finite-difference calculation of MuJoCo dynamics jacobians in mujoco-py (sketched after this list)
  • MuJoCo dynamics as a PyTorch Operation, i.e. forward and backward pass
  • Reward function PyTorch Operation
  • Flexible design to wire up your own meta computation graph
  • Trajectory Optimization module alongside Policy Networks
  • Flexible design to define your own environment in gym
  • Fancy logger and monitoring
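
A compact sketch of the first two items above, using mujoco-py's MjSim API (get_state, set_state, step). The function and class names are illustrative, not the repo's actual code, and for brevity only the jacobian w/ respect to actions is shown (the state jacobian works the same way):

import numpy as np
import torch

def fd_jacobians(sim, ctrl, eps=1e-6):
    # Jacobian of the next (qpos, qvel) w.r.t. controls: one column per
    # actuator, obtained by perturbing each control and re-stepping.
    snapshot = sim.get_state()
    def step_from(c):
        sim.set_state(snapshot)
        sim.data.ctrl[:] = c
        sim.step()
        return np.concatenate([sim.data.qpos, sim.data.qvel])
    base = step_from(ctrl)
    jac = np.zeros((base.size, sim.model.nu))
    for i in range(sim.model.nu):
        c = np.array(ctrl, dtype=float)
        c[i] += eps
        jac[:, i] = (step_from(c) - base) / eps
    sim.set_state(snapshot)  # restore the simulator state
    return base, jac

class MjDynamics(torch.autograd.Function):
    # One MuJoCo step as a PyTorch op: forward steps the simulator,
    # backward chains grad_output through the finite-difference jacobian.
    @staticmethod
    def forward(ctx, action, sim):
        next_state, jac = fd_jacobians(sim, action.detach().numpy())
        ctx.jac = torch.as_tensor(jac, dtype=action.dtype)
        return torch.as_tensor(next_state, dtype=action.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        # d loss / d action = (d s' / d a)^T @ (d loss / d s')
        return grad_output @ ctx.jac, None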

Dependencies

Python 3.6:

  • torch
  • mujoco-py
  • gym
  • numpy
  • visdom

Other:

  • Tested w/ mujoco200

Usage

For latest changes:

git clone -b development git@github.com:MahanFathi/Model-Based-RL.git

Run:

python3 main.py --config-file ./configs/inverted_pendulum.yaml

model-based-rl's People

Contributors: aikkala, mahanfathi

model-based-rl's Issues

ModuleNotFoundError: No module named 'optimizer'

Hi,
Hope you are doing well.
I tried to run this code, but I am getting the following error. Please let me know what I need to do to fix it.

(VANILLA) shubham@shubham-VirtualBox:~/Model-Based-RL$ python3 main.py --config-file ./configs/inverted_pendulum.yaml
Traceback (most recent call last):
  File "main.py", line 5, in <module>
    import model.engine.trainer
  File "/home/shubham/Model-Based-RL/model/__init__.py", line 1, in <module>
    from .build import build_model
  File "/home/shubham/Model-Based-RL/model/build.py", line 2, in <module>
    from model import archs
  File "/home/shubham/Model-Based-RL/model/archs/__init__.py", line 1, in <module>
    from .basic import Basic
  File "/home/shubham/Model-Based-RL/model/archs/basic.py", line 3, in <module>
    from model.blocks import build_policy, mj_torch_block_factory
  File "/home/shubham/Model-Based-RL/model/blocks/__init__.py", line 1, in <module>
    from .policy import build_policy
  File "/home/shubham/Model-Based-RL/model/blocks/policy/__init__.py", line 3, in <module>
    from .trajopt import TrajOpt
  File "/home/shubham/Model-Based-RL/model/blocks/policy/trajopt.py", line 4, in <module>
    from .strategies import *
  File "/home/shubham/Model-Based-RL/model/blocks/policy/strategies.py", line 9, in <module>
    from optimizer import Optimizer
ModuleNotFoundError: No module named 'optimizer'
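
A likely fix (untested; this assumes optimizer.py sits next to strategies.py in model/blocks/policy/): the absolute import fails because that directory is not on sys.path, so a relative import should resolve it:

# model/blocks/policy/strategies.py, line 9 -- suggested change
from .optimizer import Optimizer   # instead of: from optimizer import Optimizer

Running from the repo root with PYTHONPATH=. set is another common workaround for this class of error.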
