mahanfathi / model-based-rl

Model-based Policy Gradients

Topics: model-based, reinforcement-learning, backpropagation, mujoco, mujoco-py, finite-difference, computational-graphs, pytorch, gym, openai-gym, ilqg, direct-policy-search, policy-optimization, policy-gradient, ilqg-mujoco, ilqr, mujoco-dynamics, policy-gradients, computation-graph

model-based-rl's Introduction

Model-based Reinforcement Learning

Directly back-propagate into your policy network, using model jacobians calculated in MuJoCo via finite differences.

To backprop into a stochastic policy when the model is unknown, one has to rely on the REINFORCE estimator, which computes gradients by sampling the environment. These estimators usually have high variance, which is why baselines and value/advantage functions were introduced. Another way to backpropagate into a policy network is the “reparameterization trick”, as in VAEs, but it requires upstream gradients, and hence a known model. Policy gradients computed w/ the reparam trick are often much lower in variance, so one can go wo/ baselines and value networks. This project puts it all together: a computation graph of policy and dynamics, upstream gradients from MuJoCo dynamics and rewards, the reparam trick, and optimization.
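
A minimal sketch of the contrast (the `policy`, `dynamics`, and `reward` callables are hypothetical stand-ins, not this repo's API):

import torch

def reinforce_loss(log_prob, ret):
    # Score-function (REINFORCE) estimator: model-free, but high variance,
    # hence the usual baselines / value networks.
    return -(log_prob * ret).mean()

def reparam_loss(policy, dynamics, reward, state):
    # Reparameterized sample: a = mu(s) + sigma(s) * eps, eps ~ N(0, I).
    # Gradients flow through `dynamics` and `reward`, so both must be
    # differentiable -- here via MuJoCo finite-difference jacobians.
    mu, sigma = policy(state)
    eps = torch.randn_like(mu)
    action = mu + sigma * eps              # differentiable w.r.t. policy params
    next_state = dynamics(state, action)   # upstream gradients from the model
    return -reward(state, action, next_state)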

Vanilla Computation Graph

     +----------+S0+----------+              +----------+S1+----------+
     |                        |              |                        |
     |    +------+   A0   +---v----+         +    +------+   A1   +---v----+
S0+------>+Policy+---+--->+Dynamics+---+---+S1+-->+Policy+---+--->+Dynamics+--->S2  ...
     |    +------+   |    +--------+   |     +    +------+   |    +--------+    |
     |               |                 |     |               |                  |
     |            +--v---+             |     |            +--v---+              |
     +---+S0+---->+Reward+<-----S1-----+     +---+S1+---->+Reward+<-----S2------+
                  +------+                                +------+
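
In code, the graph above amounts to chaining the policy, dynamics, and reward ops over a horizon and backpropagating through the whole rollout. A minimal sketch, assuming differentiable `policy`, `dynamics`, and `reward` ops as in the previous snippet:

import torch

def rollout_loss(policy, dynamics, reward, s, horizon):
    # Chain S0 -> Policy -> A0 -> Dynamics -> S1 -> ... as in the diagram,
    # accumulating reward(St, At, St+1) at every step.
    total = 0.0
    for _ in range(horizon):
        mu, sigma = policy(s)
        a = mu + sigma * torch.randn_like(mu)
        s_next = dynamics(s, a)
        total = total + reward(s, a, s_next)
        s = s_next
    return -total  # minimize negative return

# Calling .backward() on this loss pushes MuJoCo's finite-difference
# jacobians back through every step into the policy parameters.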

Results

This repo contains:

  • Finite-difference calculation of MuJoCo dynamics jacobians in mujoco-py (sketched after this list)
  • MuJoCo dynamics as a PyTorch Operation, i.e. forward and backward pass
  • Reward function PyTorch Operation
  • Flexible design to wire up your own meta computation graph
  • Trajectory Optimization module alongside Policy Networks
  • Flexible design to define your own environment in gym
  • Fancy logger and monitoring
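
A compact sketch of the first two items above, using mujoco-py's MjSim API (get_state, set_state, step). The function and class names are illustrative, not the repo's actual code, and for brevity only the jacobian w/ respect to actions is shown (the state jacobian works the same way):

import numpy as np
import torch

def fd_jacobians(sim, ctrl, eps=1e-6):
    # Jacobian of the next (qpos, qvel) w.r.t. controls: one column per
    # actuator, obtained by perturbing each control and re-stepping.
    snapshot = sim.get_state()
    def step_from(c):
        sim.set_state(snapshot)
        sim.data.ctrl[:] = c
        sim.step()
        return np.concatenate([sim.data.qpos, sim.data.qvel])
    base = step_from(ctrl)
    jac = np.zeros((base.size, sim.model.nu))
    for i in range(sim.model.nu):
        c = np.array(ctrl, dtype=float)
        c[i] += eps
        jac[:, i] = (step_from(c) - base) / eps
    sim.set_state(snapshot)  # restore the simulator state
    return base, jac

class MjDynamics(torch.autograd.Function):
    # One MuJoCo step as a PyTorch op: forward steps the simulator,
    # backward chains grad_output through the finite-difference jacobian.
    @staticmethod
    def forward(ctx, action, sim):
        next_state, jac = fd_jacobians(sim, action.detach().numpy())
        ctx.jac = torch.as_tensor(jac, dtype=action.dtype)
        return torch.as_tensor(next_state, dtype=action.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        # d loss / d action = (d s' / d a)^T @ (d loss / d s')
        return grad_output @ ctx.jac, None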

Dependencies

Python 3.6:

  • torch
  • mujoco-py
  • gym
  • numpy
  • visdom

Other:

  • Tested w/ mujoco200

Usage

For latest changes:

git clone -b development git@github.com:MahanFathi/Model-Based-RL.git

Run:

python3 main.py --config-file ./configs/inverted_pendulum.yaml

model-based-rl's People

Contributors: aikkala, mahanfathi

model-based-rl's Issues

ModuleNotFoundError: No module named 'optimizer'

Hi,
Hope you are doing well.
I tried to run this code, but I am getting the following error. Please let me know what I need to do to fix it.

(VANILLA) shubham@shubham-VirtualBox:~/Model-Based-RL$ python3 main.py --config-file ./configs/inverted_pendulum.yaml
Traceback (most recent call last):
  File "main.py", line 5, in <module>
    import model.engine.trainer
  File "/home/shubham/Model-Based-RL/model/__init__.py", line 1, in <module>
    from .build import build_model
  File "/home/shubham/Model-Based-RL/model/build.py", line 2, in <module>
    from model import archs
  File "/home/shubham/Model-Based-RL/model/archs/__init__.py", line 1, in <module>
    from .basic import Basic
  File "/home/shubham/Model-Based-RL/model/archs/basic.py", line 3, in <module>
    from model.blocks import build_policy, mj_torch_block_factory
  File "/home/shubham/Model-Based-RL/model/blocks/__init__.py", line 1, in <module>
    from .policy import build_policy
  File "/home/shubham/Model-Based-RL/model/blocks/policy/__init__.py", line 3, in <module>
    from .trajopt import TrajOpt
  File "/home/shubham/Model-Based-RL/model/blocks/policy/trajopt.py", line 4, in <module>
    from .strategies import *
  File "/home/shubham/Model-Based-RL/model/blocks/policy/strategies.py", line 9, in <module>
    from optimizer import Optimizer
ModuleNotFoundError: No module named 'optimizer'
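
A likely fix (untested; this assumes optimizer.py sits next to strategies.py in model/blocks/policy/): the absolute import fails because that directory is not on sys.path, so a relative import should resolve it:

# model/blocks/policy/strategies.py, line 9 -- suggested change
from .optimizer import Optimizer   # instead of: from optimizer import Optimizer

Running from the repo root with PYTHONPATH=. set is another common workaround for this class of error.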
