
zi-ang-cao / 3d-diffusion-policy


This project is forked from yanjieze/3d-diffusion-policy


[arXiv 2024] 3D Diffusion Policy

Home Page: https://3d-diffusion-policy.github.io

License: MIT License

Shell 1.78% Python 98.22%

3d-diffusion-policy's Introduction

Project Page | arXiv | Twitter | Data

Yanjie Ze*, Gu Zhang*, Kangning Zhang, Chenyuan Hu, Muhan Wang, Huazhe Xu


3D Diffusion Policy (DP3) is a universal visual imitation learning algorithm that marries 3D visual representations with diffusion policies, achieving surprising effectiveness in diverse simulated and real-world tasks, including both high-dimensional and low-dimensional control tasks, with a practical inference speed.

📊 Benchmark of DP3

Simulation environments. We provide dexterous manipulation environments and expert policies for Adroit and DexArt in this codebase. The 3D modality generation (depth and point cloud) has been incorporated into these environments.

Real-world robot data is also provided here.

Algorithms. We provide the implementation of the following algorithms:

  • DP3: dp3.yaml
  • Simple DP3: simple_dp3.yaml

Among these, dp3.yaml is the algorithm proposed in our paper, which shows a significant improvement over the baselines. During training, DP3 takes ~10 GB of GPU memory and ~3 hours on an NVIDIA A40 GPU, so it is feasible for most researchers.

simple_dp3.yaml is a simplified version of DP3 that is much faster in training (1-2 hours) and inference (25 FPS), with little performance loss; it is therefore the recommended choice for robotics researchers.

💻 Installation

See INSTALL.md for installation instructions.

See ERROR_CATCH.md for errors I personally encountered during installation and how to resolve them.

📚 Data

You can generate demonstrations yourself using our provided expert policies. Generated demonstrations are saved under $YOUR_REPO_PATH/3D-Diffusion-Policy/data/.

  • Download Adroit RL experts from OneDrive, unzip it, and put the ckpts folder under $YOUR_REPO_PATH/third_party/VRL3/.
  • Download DexArt assets from Google Drive and put the assets folder under $YOUR_REPO_PATH/third_party/dexart-release/.

🛠️ Usage

Scripts for generating demonstrations, training, and evaluation are all provided in the scripts/ folder.

The results are logged with wandb, so you need to run wandb login first to see the results and videos.

For more detailed arguments, please refer to the scripts and the code. Here we provide simple instructions for using the codebase.

  1. Generate demonstrations by gen_demonstration_adroit.sh and gen_demonstration_dexart.sh. See the scripts for details. For example:

    bash scripts/gen_demonstration_adroit.sh hammer

    This will generate demonstrations for the hammer task in the Adroit environment. The data will be saved in the 3D-Diffusion-Policy/data/ folder automatically (a short sketch for inspecting the generated file follows this list).

  2. Train and evaluate a policy with behavior cloning. For example:

    bash scripts/train_policy.sh dp3 adroit_hammer 0112 0 0

    This will train a DP3 policy on the hammer task in the Adroit environment using the point cloud modality. By default we save the checkpoint (this is optional and controlled in the script).

  3. Evaluate a saved policy or use it for inference. For example:

    bash scripts/eval_policy.sh dp3 adroit_hammer 0112 0 0

    This will evaluate the saved DP3 policy you just trained.
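
The generated demonstrations are stored as zarr archives (the same format as the real-robot data described below). To sanity-check what was produced, here is a minimal sketch; the file name is illustrative and depends on the task you generated:

import zarr

# Hypothetical file name -- adjust it to whatever the generation script wrote for your task.
root = zarr.open("3D-Diffusion-Policy/data/adroit_hammer_expert.zarr", mode="r")
print(root.tree())  # lists the stored arrays and their shapes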

🤖 Real Robot

Hardware Setup

  1. Franka Robot
  2. Allegro Hand
  3. L515 Realsense Camera
  4. Mounted connection base [link] (connects the Franka arm with the Allegro hand)
  5. Mounted fingertip [link]

Software

  1. Ubuntu 20.04.01 (tested)
  2. Franka Interface Control
  3. Frankx (High-Level Motion Library for the Franka Emika Robot)
  4. Allegro Hand Controller - Noetic

Every collected real robot demonstration (episode length: T) is a dictionary:

  1. "point_cloud": Array of shape (T, Np, 6), where Np is the number of points and 6 denotes [x, y, z, r, g, b]
  2. "image": Array of shape (T, H, W, 3)
  3. "depth": Array of shape (T, H, W)
  4. "agent_pos": Array of shape (T, Nd), where Nd is the action dimension of the robot agent, i.e. 22 for our dexhand tasks (6d end-effector position + 16d joint position)
  5. "action": Array of shape (T, Nd), delta action of the robot agent
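
For concreteness, here is a minimal sketch of one episode with these keys; the sizes T, Np, Nd, H, and W below are illustrative only:

import numpy as np

T, Np, Nd, H, W = 100, 1024, 22, 480, 640  # illustrative sizes
episode = {
    "point_cloud": np.zeros((T, Np, 6), dtype=np.float32),  # [x, y, z, r, g, b]
    "image": np.zeros((T, H, W, 3), dtype=np.uint8),
    "depth": np.zeros((T, H, W), dtype=np.float32),
    "agent_pos": np.zeros((T, Nd), dtype=np.float32),  # 6d end-effector position + 16d joint position
    "action": np.zeros((T, Nd), dtype=np.float32),  # delta actions
}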

For training and evaluation, you should process the point clouds (cropping using a bounding box and FPS downsampling) as described in the paper. We also provide an example script (here).
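
The provided script is the reference; purely as an illustration of the two operations, here is a plain-NumPy sketch (the function name, bounding box, and number of points are placeholders):

import numpy as np

def crop_and_downsample(points, bbox_min, bbox_max, num_points=1024):
    """Crop a (N, 6) point cloud to an axis-aligned bounding box, then
    downsample it to num_points with farthest point sampling (FPS).
    Assumes at least num_points points remain after cropping."""
    xyz = points[:, :3]
    mask = np.all((xyz >= bbox_min) & (xyz <= bbox_max), axis=1)
    points = points[mask]
    xyz = points[:, :3]

    # Greedy FPS: repeatedly pick the point farthest from the already-selected set.
    selected = np.zeros(num_points, dtype=np.int64)
    dists = np.full(len(points), np.inf)
    for i in range(1, num_points):
        dists = np.minimum(dists, np.linalg.norm(xyz - xyz[selected[i - 1]], axis=1))
        selected[i] = int(np.argmax(dists))
    return points[selected]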

You can try using our provided real world data to train the policy.

  1. Download the real robot data and put it under the 3D-Diffusion-Policy/data/ folder, e.g. 3D-Diffusion-Policy/data/realdex_drill.zarr. Please keep the path the same as 'zarr_path' in the task's yaml file.
  2. Train the policy. For example:
  bash scripts/train_policy.sh dp3 realdex_drill 0112 0 0

🔍 Visualizer

We provide a simple visualizer for point clouds, which is convenient for debugging on headless machines. You can install it with:

cd visualizer
pip install -e .

Then you can visualize a point cloud with:

import visualizer
your_pointcloud = ... # your point cloud data, numpy array with shape (N, 3) or (N, 6)
visualizer.visualize_pointcloud(your_pointcloud)

This will show the point cloud in a web browser.
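
For example, to view one frame of the provided real-robot data: the snippet below assumes the point clouds live under a data/point_cloud array inside the zarr file; check the actual layout with print(root.tree()) if your data is organized differently.

import zarr
import visualizer

root = zarr.open("3D-Diffusion-Policy/data/realdex_drill.zarr", mode="r")
cloud = root["data"]["point_cloud"][0]  # first frame, assumed shape (Np, 6)
visualizer.visualize_pointcloud(cloud)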

🦾 Run On Your Own Tasks

A key strength of DP3 is its universality, so you can easily run DP3 on your own tasks. What you need to do is make this codebase support your task in our format. Here are the steps:

  1. Write the environment wrapper for your task. You need to write a wrapper for your environment that makes the environment interface easy to use. See 3D-Diffusion-Policy/diffusion_policy_3d/env/adroit for an example (a minimal wrapper sketch also follows this list).

  2. Add the environment runner for your task. See 3D-Diffusion-Policy/diffusion_policy_3d/env_runner/ for examples.

  3. Prepare expert data for your task. The script third_party/VRL3/src/gen_demonstration.py is a good example of how to generate demonstrations in our format. Basically, the expert data is a sequence of saved state-action pairs.

  4. Add the dataset which loads your data. See 3D-Diffusion-Policy/diffusion_policy_3d/dataset/ for examples.

  5. Add the config file in 3D-Diffusion-Policy/diffusion_policy_3d/configs/task. There are many examples in that folder.

  6. Train and evaluate DP3 on your task. See 3D-Diffusion-Policy/scripts/train_policy.sh for examples.
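
As a starting point for step 1, a hypothetical gym-style wrapper is sketched below. The class and helper names are placeholders, and the observation keys mirror the real-robot data format described above:

import gym

class MyTaskEnv(gym.Env):
    """Hypothetical wrapper around your own simulator. The names here are
    placeholders; the observation keys follow the data format above."""

    def __init__(self, sim, num_points=1024):
        self.sim = sim                # your underlying simulator or robot interface
        self.num_points = num_points  # points kept after cropping + FPS

    def _get_obs(self):
        return {
            "point_cloud": self._get_point_cloud(),  # (num_points, 3) or (num_points, 6)
            "agent_pos": self._get_robot_state(),    # (Nd,) proprioceptive state
        }

    def reset(self):
        self.sim.reset()
        return self._get_obs()

    def step(self, action):
        reward, done, info = self.sim.apply(action)  # placeholder simulator call
        return self._get_obs(), reward, done, info

    def _get_point_cloud(self):
        raise NotImplementedError  # render depth, back-project, crop, FPS-downsample

    def _get_robot_state(self):
        raise NotImplementedError  # e.g. end-effector pose + joint positions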

🏷️ License

This repository is released under the MIT license. See LICENSE for additional details.

😺 Acknowledgement

Our code is generally built upon: Diffusion Policy, DexMV, DexArt, VRL3, DAPG, DexDeform, RL3D, GNFactor, H-InDex, MetaWorld, BEE, Bi-DexHands, HORA. We thank all these authors for their nicely open-sourced code and their great contributions to the community.

Contact Yanjie Ze if you have any questions or suggestions.

📝 Citation

If you find our work useful, please consider citing:

@article{Ze2024DP3,
  title={3D Diffusion Policy},
  author={Yanjie Ze and Gu Zhang and Kangning Zhang and Chenyuan Hu and Muhan Wang and Huazhe Xu},
  journal={arXiv preprint arXiv:2403.03954},
  year={2024}
}

3d-diffusion-policy's People

Contributors

yanjieze, blakery-star
