

Sample-Efficient Preference-based Reinforcement Learning with Dynamics Aware Rewards

This software project accompanies the research paper, Sample-Efficient Preference-based Reinforcement Learning with Dynamics Aware Rewards.

This repository is a fork of the BPref repository and builds on it.

To run the SURF, RUNE, and Meta-Reward-Net baselines that we compare against in the paper, please use their respective repositories.

If you find our paper or code useful, feel free to cite us with the following BibTeX entry:

@inproceedings{metcalf23reed,
  title = {Sample-Efficient Preference-based Reinforcement Learning with Dynamics Aware Rewards},
  author = {Metcalf, Katherine and Sarabia, Miguel and Mackraz, Natalie and Theobald, Barry-John},
  booktitle = {Conference on Robot Learning},
  year = {2023},
  organization = {PMLR},
  url = {https://openreview.net/pdf?id=i84V7i6KEMd}
}

Documentation

Getting Started

To install REED you first need to clone our repository and cd into it:

git clone https://github.com/apple/ml-reed.git
cd ml-reed

Then build and run the Docker image defined in docker/Dockerfile:

# Build the Docker image
cd docker
docker build -t reed --platform linux/amd64 .
# Start an interactive container from the image
docker run -it --rm reed

The Docker image has a virtual environment (venv) at /opt/venv where most project requirements are already installed. The reed project itself is installed into this venv in the next step.

Inside the running container, install the project and activate the venv:

bash setup.sh
source /opt/venv/bin/activate
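
To confirm that activation worked, a quick check such as the following can be run with the container's Python interpreter. This is a generic Python sketch and not part of the REED code base; the helper name in_virtualenv is hypothetical. With the venv described above active, the printed prefix would be expected to be /opt/venv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix points at the interpreter the venv was created from.
    return sys.prefix != sys.base_prefix


print(f"interpreter prefix: {sys.prefix}")
print(f"virtual environment active: {in_virtualenv()}")
```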

Running PEBBLE baselines and REED

All experiments are run through the reed/experiments/run_preference_experiment.py script, which takes the following command line arguments:

  • --algorithm: The algorithm to execute. Must be one of pebble, pebble_image_augmentations, contrastive_reed, or distillation_reed.
  • --task: The environment and task on which to evaluate the given algorithm. Options are walker_walk, quadruped_walk, and cheetah_run from the DMC Suite and button_press, sweep_into, drawer_open, drawer_close, window_open, and door_open from MetaWorld.
  • --reward_from_images: Whether to learn the reward using image observations.
  • --preference_labeller: The BPref synthetic teacher to provide preference labels. Must be one of: equal, mistake, myopic, noisy, oracle, or skip.
  • --trajectory_pair_selection: The method by which trajectory pairs are selected for preference labelling. 0 is uniform sampling and 1 is disagreement sampling.
  • --max_feedback: The maximum number of trajectory pairs to be sent for labelling.
  • --out_dir: The location where results and models should be written.
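
The two --trajectory_pair_selection modes can be illustrated with a small, self-contained sketch. It assumes that disagreement sampling ranks candidate pairs by the standard deviation of an ensemble's predicted preference probabilities and keeps the most contested pairs; that assumption, the function name select_pairs, and the toy inputs are illustrative and are not taken from this repository's code.

```python
import random
import statistics


def select_pairs(ensemble_probs, num_pairs, mode, rng=None):
    """Return the sorted indices of the candidate pairs chosen for labelling.

    ensemble_probs[i][k] is the (hypothetical) probability predicted by
    ensemble member k that the first trajectory of candidate pair i is
    preferred. mode 0 is uniform sampling, mode 1 is disagreement sampling.
    """
    if rng is None:
        rng = random.Random(0)
    candidates = range(len(ensemble_probs))
    if mode == 0:
        # Uniform sampling: every candidate pair is equally likely.
        return sorted(rng.sample(candidates, num_pairs))
    if mode == 1:
        # Disagreement sampling: keep the pairs on which the ensemble
        # members' predictions vary the most.
        spread = [statistics.pstdev(probs) for probs in ensemble_probs]
        ranked = sorted(candidates, key=lambda i: spread[i], reverse=True)
        return sorted(ranked[:num_pairs])
    raise ValueError("mode must be 0 (uniform) or 1 (disagreement)")


# Four candidate pairs scored by a three-member ensemble (toy values).
toy_probs = [
    [0.9, 0.9, 0.9],  # full agreement
    [0.1, 0.9, 0.5],  # strong disagreement
    [0.5, 0.5, 0.6],  # mild disagreement
    [0.2, 0.8, 0.3],  # strong disagreement
]
print(select_pairs(toy_probs, num_pairs=2, mode=1))  # -> [1, 3]
print(select_pairs(toy_probs, num_pairs=2, mode=0))
```

Under this assumption, disagreement sampling spends the labelling budget on pairs where the reward ensemble is least certain, while uniform sampling ignores the ensemble entirely.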

For example, to run PEBBLE on walker_walk with the oracle labeller, disagreement sampling, a feedback budget of 500 trajectory pairs, and joint observations, use:

python reed/experiments/run_preference_experiment.py \
--algorithm pebble \
--task walker_walk \
--preference_labeller oracle \
--trajectory_pair_selection 1 \
--max_feedback 500 \
--out_dir <results/model directory>

To run the same experiment with image observations, add the --reward_from_images flag:

python reed/experiments/run_preference_experiment.py \
--algorithm pebble \
--task walker_walk \
--preference_labeller oracle \
--trajectory_pair_selection 1 \
--max_feedback 500 \
--reward_from_images \
--out_dir <results/model directory>
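
When launching several runs, it can be convenient to assemble these commands programmatically. The following is a minimal sketch that only uses the flags and option values documented above; the helper build_command, its default values, and the results/... output paths are hypothetical and not part of the repository. It prints the commands rather than executing them.

```python
import shlex

SCRIPT = "reed/experiments/run_preference_experiment.py"
ALGORITHMS = ("pebble", "pebble_image_augmentations",
              "contrastive_reed", "distillation_reed")
LABELLERS = ("equal", "mistake", "myopic", "noisy", "oracle", "skip")


def build_command(algorithm, task, labeller, out_dir,
                  pair_selection=1, max_feedback=500,
                  reward_from_images=False):
    """Build the argument list for one experiment run (hypothetical helper)."""
    # Reject values that the documented option lists do not allow.
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown algorithm: {algorithm}")
    if labeller not in LABELLERS:
        raise ValueError(f"unknown preference labeller: {labeller}")
    if pair_selection not in (0, 1):
        raise ValueError("pair_selection must be 0 (uniform) or 1 (disagreement)")
    cmd = [
        "python", SCRIPT,
        "--algorithm", algorithm,
        "--task", task,
        "--preference_labeller", labeller,
        "--trajectory_pair_selection", str(pair_selection),
        "--max_feedback", str(max_feedback),
        "--out_dir", out_dir,
    ]
    if reward_from_images:
        # The flag takes no value; its presence enables image-based rewards.
        cmd.append("--reward_from_images")
    return cmd


# Print one command per algorithm for the walker_walk task.
for algorithm in ALGORITHMS:
    cmd = build_command(algorithm, "walker_walk", "oracle",
                        out_dir=f"results/{algorithm}")
    print(shlex.join(cmd))
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the repository root to start the corresponding run.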
