
STAP: Sequencing Task-Agnostic Policies

The official code repository for "STAP: Sequencing Task-Agnostic Policies," presented at ICRA 2023. For a brief overview of our work, please refer to our project page. Further details can be found in our paper available on arXiv.

STAP Preview

Overview

The STAP framework can be broken down into two phases: (1) train skills offline (i.e., policies, Q-functions, dynamics models, uncertainty quantifiers); (2) plan with skills online (i.e., motion planning, task and motion planning). We provide implementations for both phases:

๐Ÿ› ๏ธ Train Skills Offline

  • Skill library: A suite of reinforcement learning (RL) and inverse RL algorithms to learn four skills: Pick, Place, Push, Pull.
  • Dynamics models: Trainers for learning skill-specific dynamics models from off-policy transition experience.
  • UQ models: Sketching Curvature for Out-of-Distribution Detection (SCOD) implementation and trainers for Q-network epistemic uncertainty quantification (UQ).

🚀 Plan with Skills Online

  • Motion planners (STAP): A set of sampling-based motion planners including randomized sampling, the cross-entropy method, planning with uncertainty-aware metrics, and combinations thereof.
  • Task and motion planners (TAMP): Coupling PDDL-based task planning with STAP-based motion planning.

๐Ÿ—‚๏ธ Additionals

  • Baseline methods: Implementations of Deep Affordance Foresight (DAF) and parameterized-action Dreamer.
  • 3D Environments: PyBullet tabletop manipulation environment with domain randomization.

Setup

System Requirements

This repository is primarily tested on Ubuntu 20.04 and macOS Monterey with Python 3.8.10.

Installation

Python packages are managed through Pipenv. Follow the installation procedure below to get set up:

# Install pyenv.
curl https://pyenv.run | bash 
exec $SHELL          # Restart shell for path changes to take effect.
pyenv install 3.8.10 # Install a Python version.
pyenv global 3.8.10  # Set this Python to default.

# Clone repository.
git clone https://github.com/agiachris/STAP.git --recurse-submodules
cd STAP

# Install pipenv.
pip install pipenv
pipenv install --dev
pipenv sync

Use pipenv shell to load the virtual environment in the current shell.
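
To confirm the environment is working, a quick sanity check along these lines can help (a minimal sketch; the import assumes the stap/ package was installed into the virtual environment by Pipenv):

pipenv shell                 # Enter the virtual environment.
python --version             # Should report Python 3.8.10.
python -c "import stap"      # Hypothetical check that the stap package is importable.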

Instructions

Basic Usage

STAP supports training skills, dynamics models, and composing these components at test-time for planning.

  • STAP module: The majority of the project code is located in the package stap/.
  • Scripts: Code for launching experiments, debugging, plotting, and visualization is under scripts/.
  • Configs: Training and evaluation functionality is determined by .yaml configuration files located in configs/.
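
As a starting point, the available configuration files can be browsed directly (a minimal sketch; the exact layout under configs/ is not reproduced here):

# List the YAML configuration files that control training and evaluation.
find configs -name "*.yaml" | head -n 20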

Launch Scripts

We provide launch scripts for training STAP's required models below. The launch scripts support parallelization on a SLURM-managed cluster and otherwise default to processing jobs sequentially.
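
A quick way to check whether SLURM is available on the current machine is to look for the sbatch command (a minimal sketch; without it, the launch scripts process jobs one after another):

# Detect a SLURM submission command; jobs run sequentially if it is absent.
command -v sbatch && echo "SLURM detected" || echo "No SLURM; jobs will run sequentially"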

Model Checkpoints

As an alternative to training skills and dynamics models from scratch, we provide checkpoints that can be downloaded and directly used to evaluate STAP planners. Run the following commands to download the model checkpoints to the default models/ directory (this requires ~10 GB of disk space):

pipenv shell  # script requires gdown
bash scripts/download/download_checkpoints.sh

Once the download has finished, the models/ directory will contain:

  • Skills trained with RL (agents_rl) and their dynamics models (dynamics_rl)
  • Skills trained with inverse RL (policies_irl) and their dynamics models (dynamics_irl)
  • Demonstration data used to train inverse RL skills (datasets)
  • Checkpoints for the Deep Affordance Foresight baseline (baselines)
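
To sanity-check the download, the expected subdirectories can be listed (a minimal sketch based on the contents described above):

du -sh models/   # Roughly ~10 GB after the download completes.
ls models/       # Expect: agents_rl, dynamics_rl, policies_irl, dynamics_irl, datasets, baselines.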

Checkpoint Results

We also provide the planning results that correspond to evaluating STAP on these checkpoints. To download the results to the default plots/ directory, run the following command (this requires ~3.5 GB of disk space):

pipenv shell  # script requires gdown
bash scripts/download/download_results.sh

The planning results can be visualized by running bash scripts/visualize/generate_figures.sh, which will save the figure shown below to plots/planning-result.jpg.

STAP Motion Planning Result
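
As a quick check that the figure was produced, the output file can be confirmed on disk (a minimal sketch):

ls -lh plots/planning-result.jpg   # Should exist after running generate_figures.sh.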

Training Skills

Skills in STAP are trained independently in custom environments. We provide two pipelines, RL and inverse RL, for training skills. While we use RL in the paper, skills learned via inverse RL yield significantly higher planning performance. This is because inverse RL offers more control over the training pipeline, allowing us to tune hyperparameters for data generation and skill training. We have only tested SCOD UQ with skills learned via RL.

Reinforcement Learning

To simultaneously learn an actor-critic per skill with RL, the relevant command is:

bash scripts/train/train_agents.sh

When the skills have finished training, copy and rename the desired checkpoints:

python scripts/debug/select_checkpoints.py --clone-name official --clone-dynamics True

These copied checkpoints will be used for planning.

(Optional) Uncertainty Quantification

Training SCOD is only required if the skills are intended to be used with an uncertainty-aware planner.

bash scripts/train/train_scod.sh

Inverse Reinforcement Learning

To instead learn a critic and then an actor per skill with inverse RL, first generate a dataset of demonstrations, then train the critics and actors:

bash scripts/data/generate_primitive_datasets.sh    # generate skill data
bash scripts/train/train_values.sh                  # train skill critics
bash scripts/train/train_policies.sh                # train skill actors

Training Dynamics

Once the skills have been learned, we can train a dynamics model with:

bash scripts/train/train_dynamics.sh

Evaluating Planning

With skills and dynamics models, we have all the essential pieces required to solve long-horizon manipulation problems with STAP.

STAP for Motion Planning

To evaluate the motion planners at specified agent checkpoints:

bash scripts/eval/eval_planners.sh

To evaluate variants of STAP, or test STAP on a subset of the 9 evaluation tasks, minor edits can be made to the above launch file.
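
The easiest way to see which settings can be adjusted is to inspect the launch script itself (a minimal sketch; the specific variable names inside the script are not reproduced here):

# View the configurable section near the top of the evaluation launch script.
sed -n '1,40p' scripts/eval/eval_planners.sh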

STAP for Task and Motion Planning

To evaluate TAMP involving a PDDL task planner and STAP at specified agent checkpoints:

bash scripts/eval/eval_tamp.sh

Baseline: Deep Affordance Foresight

Our main baseline is Deep Affordance Foresight (DAF). DAF trains a new set of skills for each task, in contrast to STAP which trains a set of skills that are used for all downstream tasks. DAF is also evaluated on the task it is trained on, whereas STAP must generalize to each new task it is evaluated on.

To train a DAF model on each of the 9 evaluation tasks:

bash scripts/train/train_baselines.sh

When the models have finished training, evaluate them with:

bash scripts/eval/eval_daf.sh

Citation

Sequencing Task-Agnostic Policies is offered under the MIT License agreement. If you find STAP useful, please consider citing our work:

@article{agia2022taps,
  title={STAP: Sequencing Task-Agnostic Policies},
  author={Agia, Christopher and Migimatsu, Toki and Wu, Jiajun and Bohg, Jeannette},
  journal={arXiv preprint arXiv:2210.12250},
  year={2022}
}
