Efficient Exploration via State Marginal Matching

This is the reference implementation for the following paper:

Efficient Exploration via State Marginal Matching.
Lisa Lee*, Benjamin Eysenbach*, Emilio Parisotto*, Eric Xing, Ruslan Salakhutdinov, Sergey Levine. arXiv preprint, 2019.

Getting Started

Installation

This repository is based on rlkit.

  1. You can clone this repository by running:
git clone https://github.com/RLAgent/state-marginal-matching.git
cd state-marginal-matching

All subsequent commands in this README should be run from the top-level directory of this repository (i.e., /path/to/state-marginal-matching/).

  2. Install MuJoCo 1.5 and mujoco-py. Note that MuJoCo requires a license.

  3. Create and activate the conda environment:

conda env create -f conda_env.yml
source activate smm_env

Note: If running on Mac OS X, comment out patchelf, box2d, and box2d-kengz in conda_env.yml.

To deactivate the conda environment, run conda deactivate. To remove it, run conda env remove -n smm_env.
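After activating smm_env, a quick check that the key packages can be found may save a failed training run later. The sketch below is not part of this repository. The module names are assumptions: mujoco_py is the import name of mujoco-py, while torch and gym are assumed because rlkit builds on them. Adjust the list to match conda_env.yml.

```python
import importlib.util

# Import names assumed for this sketch; they are not listed in this README.
ASSUMED_MODULES = ["mujoco_py", "torch", "gym"]


def missing_modules(names):
    """Return the names that cannot be located in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(ASSUMED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All assumed modules were found.")
```

The check only locates the packages without importing them, so it does not verify that the MuJoCo license or binaries are set up correctly.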

Running the code

1. Training a policy on ManipulationEnv

python -m train configs/smm_manipulation.json          # State Marginal Matching (SMM) with 4 latent skills
python -m train configs/sac_manipulation.json          # Soft Actor-Critic (SAC)
python -m train configs/icm_manipulation.json          # Intrinsic Curiosity Module (ICM)
python -m train configs/count_manipulation.json        # Count-based Exploration
python -m train configs/pseudocount_manipulation.json  # Pseudocount

The log directory can be set with --log-dir /path/to/log/dir. By default, the log directory is set to out/.
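To compare all five methods, it can help to give each run its own log directory. The following is a minimal sketch, not a script shipped with this repository: the helper names and the out/<config name> layout are hypothetical, and only the config paths and the --log-dir flag come from this README.

```python
import subprocess
import sys
from pathlib import Path

# Config files named in the training commands above.
CONFIGS = [
    "configs/smm_manipulation.json",
    "configs/sac_manipulation.json",
    "configs/icm_manipulation.json",
    "configs/count_manipulation.json",
    "configs/pseudocount_manipulation.json",
]


def train_command(config, log_root="out"):
    """Build the training command for one config, logging to <log_root>/<config stem>."""
    log_dir = Path(log_root) / Path(config).stem  # hypothetical directory layout
    return [sys.executable, "-m", "train", "--log-dir", str(log_dir), config]


def run_all(configs=CONFIGS, dry_run=True):
    """Print each command; with dry_run=False, also run them one after another."""
    for config in configs:
        cmd = train_command(config)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run_all()  # dry run: prints the five commands without starting training
```

Call run_all(dry_run=False) from the top-level directory of the repository to launch the runs sequentially.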

2. Visualizing a trained policy

python -m visualize /path/to/log/dir                               # Without historical averaging
python -m visualize /path/to/log/dir --num-historical-policies 10  # With historical averaging

3. Evaluating a trained policy

python -m test /path/to/log/dir                                # Without historical averaging
python -m test /path/to/log/dir --config configs/test_ha.json  # With historical averaging
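Note that the two scripts switch on historical averaging differently: visualize takes --num-historical-policies, while test takes a separate config file. A small sketch of helpers that capture this difference is shown below. The function names are hypothetical; the flags and the configs/test_ha.json path are taken from the commands above.

```python
import sys


def visualize_command(log_dir, num_historical_policies=None):
    """Build the visualize command; pass a count to enable historical averaging."""
    cmd = [sys.executable, "-m", "visualize", log_dir]
    if num_historical_policies is not None:
        cmd += ["--num-historical-policies", str(num_historical_policies)]
    return cmd


def evaluate_command(log_dir, historical_averaging=False):
    """Build the test command; historical averaging is selected via a config file."""
    cmd = [sys.executable, "-m", "test", log_dir]
    if historical_averaging:
        cmd += ["--config", "configs/test_ha.json"]
    return cmd


# Print the commands for one log directory, with historical averaging enabled.
print(" ".join(visualize_command("/path/to/log/dir", num_historical_policies=10)))
print(" ".join(evaluate_command("/path/to/log/dir", historical_averaging=True)))
```

The returned lists can be passed directly to subprocess.run when the commands should be executed rather than printed.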

To view more flag options, run the scripts with the --help flag. For example:

$ python -m train --help
Usage: train.py [OPTIONS] CONFIG

Options:
  --cpu
  --log-dir TEXT
  --snapshot-gap INTEGER  How often to save model checkpoints (by # epochs).
  --help                  Show this message and exit.

References

The algorithms are based on the following papers:

Efficient Exploration via State Marginal Matching.
Lisa Lee*, Benjamin Eysenbach*, Emilio Parisotto*, Eric Xing, Ruslan Salakhutdinov, Sergey Levine. arXiv preprint, 2019.

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine. ICML 2018.

Curiosity-driven Exploration by Self-supervised Prediction.
Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell. ICML 2017.

Unifying Count-Based Exploration and Intrinsic Motivation.
Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Remi Munos. NIPS 2016.

Citation

@article{smm2019,
  title={Efficient Exploration via State Marginal Matching},
  author={Lisa Lee and Benjamin Eysenbach and Emilio Parisotto and Eric Xing and Sergey Levine and Ruslan Salakhutdinov},
  year={2019}
}
