vik0909 / reward-learning-rl

This project forked from avisingh599/reward-learning-rl

[RSS 2019] End-to-End Robotic Reinforcement Learning without Reward Engineering

Home Page: https://sites.google.com/view/reward-learning-rl/

License: Other

Shell 0.62% Python 99.38%

reward-learning-rl

This repository is the official implementation of the following paper:

End-to-End Robotic Reinforcement Learning without Reward Engineering
Avi Singh, Larry Yang, Kristian Hartikainen, Chelsea Finn, Sergey Levine
Robotics: Science and Systems 2019
Website | Video | Arxiv

Example tasks: Visual Draping, Visual Pushing, Visual Bookshelf, Visual Door Opening, Visual Pusher, Visual Picker.

We propose a method for end-to-end learning of robotic skills in the real world using deep reinforcement learning. We learn these policies directly from pixel observations, without any hand-engineered or task-specific reward functions. Instead, we learn the rewards for such tasks from a small number of user-provided goal examples (around 80), followed by a modest number of active queries (around 25-75).
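To make the supervision cost concrete, the short sketch below tallies the human input implied by the figures in the paragraph above. It assumes each goal example and each answered active query counts as one item of human input; the function name `total_labels` is hypothetical and is not part of this repository.

```python
# Human supervision budget, using the approximate figures quoted above.
GOAL_EXAMPLES = 80          # "around 80" user-provided goal examples
ACTIVE_QUERIES = (25, 75)   # "around 25-75" active queries


def total_labels(goal_examples, active_queries):
    """Return the (min, max) number of human-provided items.

    Assumption: one goal example or one answered query counts as one item.
    """
    low, high = active_queries
    return goal_examples + low, goal_examples + high


print(total_labels(GOAL_EXAMPLES, ACTIVE_QUERIES))  # (105, 155)
```

Under that assumption, a task needs roughly 105 to 155 pieces of human input in total.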

This implementation is based on softlearning.

Getting Started

Prerequisites

The environment can be run either locally using Conda or inside a Docker container. The Conda installation requires Conda; the Docker installation requires Docker and Docker Compose. Most of our environments also currently require a MuJoCo license.

Conda Installation

  1. Download and install MuJoCo 1.50 from the MuJoCo website. We assume that the MuJoCo files are extracted to the default location (~/.mujoco/mjpro150).

  2. Copy your MuJoCo license key (mjkey.txt) to ~/.mujoco/mjkey.txt.

  3. Clone reward-learning-rl into ${REWARD_LEARNING_PATH} (set this variable to your desired checkout directory first):

git clone https://github.com/avisingh599/reward-learning-rl.git ${REWARD_LEARNING_PATH}

  4. Create and activate the conda environment, then install softlearning to enable the command line interface:

cd ${REWARD_LEARNING_PATH}
conda env create -f environment.yml
conda activate softlearning
pip install -e ${REWARD_LEARNING_PATH}

The environment should now be ready to run. See the Examples section for how to train and simulate the agents.
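Before launching a run, it may help to confirm that the MuJoCo files sit where the installation steps assume. The following is a minimal sketch, not part of this repository: the helper name `missing_mujoco_files` is hypothetical, and it only checks the two default paths named in steps 1 and 2.

```python
from pathlib import Path


def missing_mujoco_files(home=None):
    """Return the expected MuJoCo paths that do not exist under `home`.

    Assumes the default locations from the installation steps:
    ~/.mujoco/mjpro150 and ~/.mujoco/mjkey.txt.
    """
    home = Path(home) if home is not None else Path.home()
    expected = [
        home / ".mujoco" / "mjpro150",   # extracted MuJoCo 1.50 files (step 1)
        home / ".mujoco" / "mjkey.txt",  # license key (step 2)
    ]
    return [path for path in expected if not path.exists()]


if __name__ == "__main__":
    missing = missing_mujoco_files()
    if missing:
        for path in missing:
            print(f"missing: {path}")
    else:
        print("MuJoCo files and license key found")
```

An empty result only means the paths exist; it does not validate the license itself.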

Finally, to deactivate and remove the conda environment:

conda deactivate
conda remove --name softlearning --all

Docker Installation

docker-compose

To build the image and run the container:

export MJKEY="$(cat ~/.mujoco/mjkey.txt)" \
    && docker-compose \
        -f ./docker/docker-compose.dev.gpu.yml \
        up \
        -d \
        --force-recreate

You can access the container with the typical Docker exec-command, i.e.

docker exec -it softlearning bash

See the Examples section for how to train and simulate the agents.

Finally, to clean up the docker setup:

docker-compose \
    -f ./docker/docker-compose.dev.gpu.yml \
    down \
    --rmi all \
    --volumes

Examples

Training an agent

softlearning run_example_local examples.classifier_rl \
--n_goal_examples 10 \
--task=Image48SawyerDoorPullHookEnv-v0 \
--algorithm VICERAQ \
--num-samples 5 \
--n_epochs 300 \
--active_query_frequency 10

The tasks used in the paper were Image48SawyerPushForwardEnv-v0, Image48SawyerDoorPullHookEnv-v0 and Image48SawyerPickAndPlace3DEnv-v0. For the algorithm, you can experiment with VICERAQ, VICE, RAQ, SACClassifier, and SAC. The --num-samples flag specifies the number of random seeds launched. All results in the paper were averaged across five random seeds. The hyperparameters are stored in examples/classifier_rl/variants.py.
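To repeat the command above across the paper's three tasks, one option is to assemble the argument list programmatically. The sketch below is an illustration only: `build_command` is a hypothetical helper, the flag values are copied from the example command, and it has not been confirmed here that every flag (for instance --active_query_frequency) applies to every algorithm.

```python
import shlex

# Task and algorithm names as listed in the paragraph above.
TASKS = [
    "Image48SawyerPushForwardEnv-v0",
    "Image48SawyerDoorPullHookEnv-v0",
    "Image48SawyerPickAndPlace3DEnv-v0",
]
ALGORITHMS = ["VICERAQ", "VICE", "RAQ", "SACClassifier", "SAC"]


def build_command(task, algorithm, num_seeds=5, n_epochs=300):
    """Assemble the example training command for one task/algorithm pair."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown algorithm: {algorithm}")
    return [
        "softlearning", "run_example_local", "examples.classifier_rl",
        "--n_goal_examples", "10",
        f"--task={task}",
        "--algorithm", algorithm,
        "--num-samples", str(num_seeds),  # number of random seeds launched
        "--n_epochs", str(n_epochs),
        "--active_query_frequency", "10",
    ]


# Print one shell-ready command per task; run them manually or via subprocess.
for task in TASKS:
    print(" ".join(shlex.quote(arg) for arg in build_command(task, "VICERAQ")))
```

Keeping the command as a list of arguments avoids shell-quoting mistakes if it is later passed to `subprocess.run`.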

examples.classifier_rl.main contains several different environments. For more information about the agents and configurations, run the script with the --help flag: python ./examples/classifier_rl/main.py --help.

Version history

v0.1

  • This version contains the code to reproduce the results in Singh et al., RSS 2019.

Citation

If this codebase helps you in your academic research, you are encouraged to cite our paper. Here is an example bibtex:

@article{singh2019,
  title={End-to-End Robotic Reinforcement Learning without Reward Engineering},
  author={Avi Singh and Larry Yang and Kristian Hartikainen and Chelsea Finn and Sergey Levine},
  journal={Robotics: Science and Systems},
  year={2019}
}

If you mainly use the VICE algorithm implemented here, you should also cite:

@article{fu2018,
  title={Variational Inverse Control with Events: A General Framework for Data-Driven Reward Definition},
  author={Justin Fu and Avi Singh and Dibya Ghosh and Larry Yang and Sergey Levine},
  journal={Neural Information Processing Systems},
  year={2018}
}
