
humanoid-bench's Introduction

HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation

Paper | Website

We present HumanoidBench, a simulated humanoid robot benchmark consisting of 15 whole-body manipulation tasks and 12 locomotion tasks. This repo contains the code for the environments and training.


Directories

Structure of the repository:

  • data: Weights of the low-level skill policies
  • dreamerv3: Training code for DreamerV3
  • humanoid_bench: Core benchmark code
    • assets: Simulation assets
    • envs: Environment files
    • mjx: MuJoCo MJX training code
  • jaxrl_m: Training code for SAC
  • ppo: Training code for PPO
  • tdmpc2: Training code for TD-MPC2

Installation

# Install humanoid benchmark
pip install -e .

# Install jaxrl
pip install -e jaxrl_m
pip install ml_collections flax distrax tf-keras

# Install dreamer
pip install -e dreamerv3
pip install ipdb wandb moviepy imageio opencv-python ruamel.yaml rich cloudpickle tensorflow tensorflow_probability dm-sonnet optax plotly msgpack zmq colored matplotlib

# Install td-mpc2
pip install -e tdmpc2
pip install torch torchvision torchaudio hydra-core pyquaternion tensordict torchrl pandas hydra-submitit-launcher termcolor

# jax GPU version
pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
# Or, jax CPU version
pip install --upgrade "jax[cpu]"

Environments

Main Benchmark Tasks

  • h1hand-walk-v0
  • h1hand-reach-v0
  • h1hand-hurdle-v0
  • h1hand-crawl-v0
  • h1hand-maze-v0
  • h1hand-push-v0
  • h1hand-cabinet-v0
  • h1strong-highbar_hard-v0 # Make hands stronger to be able to hang from the high bar
  • h1hand-door-v0
  • h1hand-truck-v0
  • h1hand-cube-v0
  • h1hand-bookshelf_simple-v0
  • h1hand-bookshelf_hard-v0
  • h1hand-basketball-v0
  • h1hand-window-v0
  • h1hand-spoon-v0
  • h1hand-kitchen-v0
  • h1hand-package-v0
  • h1hand-powerlift-v0
  • h1hand-room-v0
  • h1hand-stand-v0
  • h1hand-run-v0
  • h1hand-sit_simple-v0
  • h1hand-sit_hard-v0
  • h1hand-balance_simple-v0
  • h1hand-balance_hard-v0
  • h1hand-stair-v0
  • h1hand-slide-v0
  • h1hand-pole-v0
  • h1hand-insert_normal-v0
  • h1hand-insert_small-v0

Test Environments with Random Actions

python -m humanoid_bench.test_env --env h1hand-walk-v0
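
The same kind of rollout can also be reproduced directly in Python. A minimal sketch, assuming the Gymnasium-style API and the environment registration performed by importing humanoid_bench:

import gymnasium as gym
import humanoid_bench  # importing the package registers the environments

env = gym.make("h1hand-walk-v0")
obs, info = env.reset(seed=0)
for _ in range(1000):
    action = env.action_space.sample()  # random action
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()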

Test Environments with Hierarchical Policy and Random Actions

# Define paths to the pre-trained low-level policy and observation normalization statistics
export POLICY_PATH="data/reach_two_hands/torch_model.pt"
export MEAN_PATH="data/reach_two_hands/mean.npy"
export VAR_PATH="data/reach_two_hands/var.npy"

# Test the environment
python -m humanoid_bench.test_env --env h1hand-push-v0 --policy_path ${POLICY_PATH} --mean_path ${MEAN_PATH} --var_path ${VAR_PATH} --policy_type "reach_double_relative"

Test Low-Level Reaching Policy (trained with MJX, tested with classical MuJoCo)

# One-hand reaching
python -m humanoid_bench.mjx.mjx_test --with_full_model 

# Two-hand reaching
python -m humanoid_bench.mjx.mjx_test --with_full_model --task=reach_two_hands --folder=./data/reach_two_hands

Change Observations

By default, the environment returns a privileged state of the environment (e.g., robot state + environment state). To get proprioceptive, visual, and tactile sensing, set obs_wrapper=True and select the required sensors accordingly, e.g., sensors="proprio,image,tactile". When using tactile sensing, make sure to use h1touch in place of h1hand. Full test command:

python -m humanoid_bench.test_env --env h1touch-stand-v0 --obs_wrapper True --sensors "proprio,image,tactile"
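
With sensors selected, the observation should be a dictionary keyed by sensor name. A minimal inspection sketch, assuming the obs_wrapper and sensors flags are also accepted as gym.make keyword arguments (as the test_env command suggests):

import gymnasium as gym
import humanoid_bench

# Assumption: the wrapper flags are forwarded as environment kwargs.
env = gym.make("h1touch-stand-v0", obs_wrapper=True, sensors="proprio,image,tactile")
obs, info = env.reset()
for key, value in obs.items():  # expected keys: "proprio", "image", "tactile"
    print(key, getattr(value, "shape", type(value)))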

Other Environments

In addition to the main benchmark tasks listed above, you can run the following environments, which feature the robot without hands:

  • h1-walk-v0
  • h1-reach-v0
  • h1-hurdle-v0
  • h1-crawl-v0
  • h1-maze-v0
  • h1-push-v0
  • h1-highbar_simple-v0
  • h1-door-v0
  • h1-truck-v0
  • h1-basketball-v0
  • h1-package-v0
  • h1-stand-v0
  • h1-run-v0
  • h1-sit_simple-v0
  • h1-sit_hard-v0
  • h1-balance_simple-v0
  • h1-balance_hard-v0
  • h1-stair-v0
  • h1-slide-v0
  • h1-pole-v0

Training

# Define TASK
export TASK="h1hand-sit_simple-v0"

# Train TD-MPC2
python -m tdmpc2.train disable_wandb=False wandb_entity=[WANDB_ENTITY] exp_name=tdmpc task=humanoid_${TASK} seed=0

# Train DreamerV3
python -m embodied.agents.dreamerv3.train --configs humanoid_benchmark --run.wandb True --run.wandb_entity [WANDB_ENTITY] --method dreamer --logdir logs --task humanoid_${TASK} --seed 0

# Train SAC
python ./jaxrl_m/examples/mujoco/run_mujoco_sac.py --env_name ${TASK} --wandb_entity [WANDB_ENTITY] --max_steps 5000000 --seed 0

# Train PPO (not using MJX)
python ./ppo/run_sb3_ppo.py --env_name ${TASK} --wandb_entity [WANDB_ENTITY] --seed 0

Training Hierarchical Policies

# Define TASK
export TASK="h1hand-push-v0"

# Define paths to the pre-trained low-level policy and observation normalization statistics
export POLICY_PATH="data/reach_one_hand/torch_model.pt"
export MEAN_PATH="data/reach_one_hand/mean.npy"
export VAR_PATH="data/reach_one_hand/var.npy"

# Train TD-MPC2 with pre-trained low-level policy
python -m tdmpc2.train disable_wandb=False wandb_entity=[WANDB_ENTITY] exp_name=tdmpc task=humanoid_${TASK} seed=0 policy_path=${POLICY_PATH} mean_path=${MEAN_PATH} var_path=${VAR_PATH} policy_type="reach_single"

# Train DreamerV3 with pre-trained low-level policy
python -m embodied.agents.dreamerv3.train --configs humanoid_benchmark --run.wandb True --run.wandb_entity [WANDB_ENTITY] --method dreamer_${TASK}_hierarchical --logdir logs --env.humanoid.policy_path ${POLICY_PATH} --env.humanoid.mean_path ${MEAN_PATH} --env.humanoid.var_path ${VAR_PATH} --env.humanoid.policy_type="reach_single" --task humanoid_${TASK} --seed 0

Paper Training Curves

Please find here JSON files containing all of the training curves, so that comparing against our baselines will not require re-running them in the future.

The JSON files follow this key structure: task -> method -> seed_X -> (million_steps or return). For example, to access the return sequence for one seed of the SAC run on the walk task, query the JSON data as data['walk']['SAC']['seed_0']['return'].
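
In code, loading the curves and extracting one seed's learning curve might look like the following sketch (the file name curves.json is a placeholder for the released file):

import json

# Placeholder file name; substitute the released json file.
with open("curves.json") as f:
    data = json.load(f)

# Key structure: task -> method -> seed_X -> (million_steps or return)
steps = data["walk"]["SAC"]["seed_0"]["million_steps"]
returns = data["walk"]["SAC"]["seed_0"]["return"]
print(steps[:3], returns[:3])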

Citation

If you find HumanoidBench useful for your research, please cite this work:

@article{sferrazza2024humanoidbench,
    title={HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation},
    author={Carmelo Sferrazza and Dun-Ming Huang and Xingyu Lin and Youngwoon Lee and Pieter Abbeel},
    journal={arXiv Preprint arxiv:2403.10506},
    year={2024}
}

References

This codebase contains some files adapted from other sources.


humanoid-bench's Issues

Two questions about visualization effects

Hello, and first of all, thank you for your valuable contribution to the humanoid robotics community!

Regarding the experimental results on the website and in the paper, I have two questions. We can notice that tasks such as 'basketball', 'stair', and 'highbar' in the paper fall far short of the dashed line (i.e., the quantitative metric you provide to judge the success of each task), yet the videos on the website show 'basketball', 'stair', and 'highbar' performed well.

So the first question I would like to ask is: were the videos on the webpage obtained by visualizing policies trained with one of the four SOTA reinforcement learning methods? The second question is how the quantitative threshold indicated by the dashed line was obtained.

Thank you for taking the time to check out my question and I look forward to hearing from you!

Providing policy parameters or learning curves

Hi, first of all thank you very much for such a relevant contribution to the community.

I am interested in building on top of this work, and thus wanted to ask if it would be possible to get access to:

  • the checkpoints of all the trained policies presented in the paper. At the moment, only the one-hand and two-hand reaching policy checkpoints are available.
  • the learning curves used to provide the plots in the paper.

Having access to these would make it easier for researchers to build on top of this benchmark, as at the moment one has to spend significant computational resources retraining all the policies in the paper just to have a baseline to build on. It would also be useful to hear from you the expected wall-clock times for training these policies on a desktop GPU.

Thanks!

Large RAM usage when using PPO for training

When training with PPO and multiple num_envs, RAM usage explodes: for 12 envs I need around 35 GB of RAM, and for 8 envs about 25 GB. I have this issue on multiple independent installs of humanoid-bench on different machines. I also tested TD3 from stable-baselines3, and the same issue occurred.

I assume it is an issue with the environments, as I didn't experience it with other stable-baselines3 projects. I haven't checked how the other implemented algorithms behave.

Do you experience the same RAM usage with this project?

Resource not found

Thanks for your awesome work!

I'm able to test the environments.
But when I try to change observations, I encounter this error: ValueError: Error: resource not found via provider or OS filesystem: '/home/kh/Documents/humanoid-bench/humanoid_bench/assets/envs/../shadow_hand_submodel/assets/forearm_0.obj'

I checked the path, which is correct, and the .obj file exists.

Any clues would be appreciated. Thanks!

typing-extensions version issue

Hello, while installing the environment I ran into the following problem:
torch 2.2.1 requires typing-extensions>=4.8.0, but you have typing-extensions 4.5.0 which is incompatible.
and

tensorflow 2.13.1 requires typing-extensions<4.6.0,>=3.6.6, but you have typing-extensions 4.10.0 which is incompatible.
tensorflow-probability 0.21.0 requires typing-extensions<4.6.0, but you have typing-extensions 4.10.0 which is incompatible.

So I'd like to know which PyTorch version is known to work, thank you!

Visualization artifact for `test_env`

When I run python -m humanoid_bench.test_env --env h1hand-walk-v0, the viewer shows a black screen after a few epochs. The issue can be fixed by commenting out the env created for offscreen rendering.
I'm reporting this issue for future debugging.


Basic information:
OS: Ubuntu 22.04
Graphics card: NVIDIA GeForce RTX 4090 (545.29.06)
