Model-Based Policy Optimization

Code to reproduce the experiments in When to Trust Your Model: Model-Based Policy Optimization.

Installation

  1. Install MuJoCo 1.50 at ~/.mujoco/mjpro150 and copy your license key to ~/.mujoco/mjkey.txt
  2. Clone mbpo
git clone --recursive https://github.com/jannerm/mbpo.git
  3. Create a conda environment and install mbpo
cd mbpo
conda env create -f environment/gpu-env.yml
conda activate mbpo
pip install -e viskit
pip install -e .

Usage

Configuration files can be found in examples/config/.

mbpo run_local examples.development --config=examples.config.halfcheetah.0 --gpus=1 --trial-gpus=1

Currently only running locally is supported.

New environments

To run on a different environment, modify the provided template. You will also need to provide the environment's termination function in mbpo/static. If the file name is the lowercase version of the environment name, it will be found automatically. See hopper.py for an example.
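As a rough illustration, a termination-function file for a hypothetical environment named MyEnv (saved as mbpo/static/myenv.py) might look like the sketch below. The class name StaticFns, the termination_fn(obs, act, next_obs) signature, the batched array shapes, and the height threshold are all assumptions made for this example; check hopper.py for the convention the codebase actually expects.

```python
import numpy as np


class StaticFns:
    """Hypothetical mbpo/static/myenv.py for an environment named MyEnv."""

    @staticmethod
    def termination_fn(obs, act, next_obs):
        # Inputs are assumed to be batched 2-D arrays of shape (batch, dim).
        assert obs.ndim == act.ndim == next_obs.ndim == 2

        # Illustrative rule only: the episode continues while every entry of
        # the next state is finite and the first state dimension (assumed
        # here to be a height) stays above a threshold.
        height = next_obs[:, 0]
        not_done = np.isfinite(next_obs).all(axis=-1) & (height > 0.7)

        # Return a boolean column vector with one row per transition.
        return ~not_done[:, None]
```

Writing the function over batches rather than single transitions lets it be applied to many model-generated transitions at once.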

Logging

This codebase contains viskit as a submodule. You can view saved runs with:

viskit ~/ray_mbpo --port 6008

assuming you used the default log_dir.

Hyperparameters

The rollout length schedule is defined by a length-4 list in a config file. The format is [start_epoch, end_epoch, start_length, end_length], so the following:

'rollout_schedule': [20, 100, 1, 5] 

corresponds to a model rollout length linearly increasing from 1 to 5 over epochs 20 to 100.
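The schedule can be sketched as a small interpolation function. This is an illustration rather than the repository's own code: the name rollout_length is hypothetical, and holding the length constant outside the [start_epoch, end_epoch] range and truncating to an integer are assumptions.

```python
def rollout_length(epoch, schedule):
    """Linearly interpolate the model rollout length for a given epoch."""
    start_epoch, end_epoch, start_length, end_length = schedule
    if epoch <= start_epoch:
        return int(start_length)
    # Fraction of the schedule completed, capped at 1 after end_epoch.
    frac = min((epoch - start_epoch) / (end_epoch - start_epoch), 1.0)
    return int(start_length + frac * (end_length - start_length))


schedule = [20, 100, 1, 5]
for epoch in (0, 20, 60, 100, 150):
    print(epoch, rollout_length(epoch, schedule))
```

Under these assumptions, the example schedule gives a rollout length of 1 up to epoch 20, 3 at epoch 60, and 5 from epoch 100 onward.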

If you want to speed up training in terms of wall clock time (but possibly make the runs less sample-efficient), you can set a timeout for model training (max_model_t, in seconds) or train the model less frequently (every model_train_freq steps).

Note: This repo contains ongoing research. Minor differences between this code and the paper will be updated in v2.

Comparing to MBPO

If you would like to compare to MBPO but do not have the resources to re-run all experiments, the learning curves from Figure 2 of the paper (plus results on the Humanoid environment) are available in this shared folder. See plot.py for an example of how to read the pickle files containing the results.

Reference

If you find this code useful in an academic setting, please cite:

@article{janner2019mbpo,
  author = {Michael Janner and Justin Fu and Marvin Zhang and Sergey Levine},
  title = {When to Trust Your Model: Model-Based Policy Optimization},
  journal = {arXiv preprint arXiv:1906.08253},
  year = {2019}
}

Acknowledgments

The underlying soft actor-critic implementation in MBPO comes from Tuomas Haarnoja and Kristian Hartikainen's softlearning codebase. The modeling code is a slightly modified version of Kurtland Chua's PETS implementation.
