
A collection of robotics environments geared towards benchmarking multi-task and meta reinforcement learning

Home Page: https://metaworld.farama.org/

License: MIT License

meta-rl multi-task benchmark-environments mujoco


Meta-World


The current version of Meta-World is a work in progress. If you find any bugs or errors, please open an issue.

Meta-World is an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks. We aim to provide task distributions that are sufficiently broad to evaluate meta-RL algorithms' generalization ability to new behaviors.

For more background information, please refer to our website and the accompanying conference publication, which provides baseline results for 8 state-of-the-art meta- and multi-task RL algorithms.


Join the Community

Metaworld is now maintained by the Farama Foundation! You can interact with our community and the new developers in our Discord server

Maintenance Status

The current roadmap for Meta-World can be found here

Installation

To install everything, run:

pip install git+https://github.com/Farama-Foundation/Metaworld.git@master#egg=metaworld

Alternatively, you can clone the repository and install an editable version locally:

git clone https://github.com/Farama-Foundation/Metaworld.git
cd Metaworld
pip install -e .

For users attempting to reproduce the results found in the Meta-World paper, please use this command:

pip install git+https://github.com/Farama-Foundation/Metaworld.git@04be337a12305e393c0caf0cbf5ec7755c7c8feb

Using the benchmark

Here is a list of benchmark environments for meta-RL (ML*) and multi-task RL (MT*):

  • ML1 is a meta-RL benchmark environment which tests few-shot adaptation to goal variation within a single task. You can choose to test variation within any of the 50 tasks for this benchmark.
  • ML10 is a meta-RL benchmark which tests few-shot adaptation to new tasks. It comprises 10 meta-train tasks and 3 test tasks.
  • ML45 is a meta-RL benchmark which tests few-shot adaptation to new tasks. It comprises 45 meta-train tasks and 5 test tasks.
  • MT1, MT10, and MT50 are multi-task RL benchmark environments for learning a multi-task policy that performs 1, 10, and 50 training tasks, respectively. MT1 is similar to ML1 in that you can choose to test variation within any of the 50 tasks for this benchmark. In the original Meta-World experiments, we augment MT10 and MT50 environment observations with a one-hot vector which identifies the task. We don't enforce how users utilize task one-hot vectors; however, one solution is to use a Gym wrapper such as this one (a minimal sketch appears below).
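
For reference, here is a minimal sketch of such a one-hot observation wrapper. It is not part of Meta-World: the class name OneHotTaskWrapper and its arguments are illustrative, and it assumes the classic Gym API used in the examples below.

import gym
import numpy as np

class OneHotTaskWrapper(gym.ObservationWrapper):
    """Append a fixed one-hot task id to every observation."""

    def __init__(self, env, task_index, num_tasks):
        super().__init__(env)
        self._one_hot = np.zeros(num_tasks, dtype=np.float64)
        self._one_hot[task_index] = 1.0
        low = np.concatenate([env.observation_space.low, np.zeros(num_tasks)])
        high = np.concatenate([env.observation_space.high, np.ones(num_tasks)])
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float64)

    def observation(self, obs):
        # Concatenate the raw observation with this environment's one-hot task id.
        return np.concatenate([obs, self._one_hot])

# Hypothetical usage: give each MT10 training environment its own index.
# envs = [OneHotTaskWrapper(cls(), i, 10)
#         for i, (name, cls) in enumerate(mt10.train_classes.items())]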

Basics

We provide a Benchmark API that allows constructing environments following the gymnasium.Env interface.

To use a Benchmark, first construct it (this samples the tasks allowed for one run of an algorithm on the benchmark). Then, construct at least one instance of each environment listed in benchmark.train_classes and benchmark.test_classes. For each of those environments, assign a task using env.set_task(task), drawing from benchmark.train_tasks and benchmark.test_tasks, respectively. Tasks can only be assigned to environments which have a key in benchmark.train_classes or benchmark.test_classes matching task.env_name. Please see the sections Running ML1 or MT1 and Running a benchmark for more details.

You may wish to access only the individual environments used in the Metaworld benchmark for your research. See the Accessing Single Goal Environments section for more details.

Seeding a Benchmark Instance

For the purposes of reproducibility, it may be important to seed your benchmark instance. For example, for the ML1 benchmark with the 'pick-place-v2' environment, you can do so in the following way:

import metaworld

SEED = 0  # some seed number here
benchmark = metaworld.ML1('pick-place-v2', seed=SEED)
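
As a quick illustration (an assumption about how seeding behaves, not a documented guarantee), two benchmark instances constructed with the same seed should sample identical task lists:

benchmark_a = metaworld.ML1('pick-place-v2', seed=SEED)
benchmark_b = metaworld.ML1('pick-place-v2', seed=SEED)
assert benchmark_a.train_tasks == benchmark_b.train_tasks  # same sampled goals
assert benchmark_a.test_tasks == benchmark_b.test_tasks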

Running ML1 or MT1

import metaworld
import random

print(metaworld.ML1.ENV_NAMES)  # Check out the available environments

ml1 = metaworld.ML1('pick-place-v2') # Construct the benchmark, sampling tasks

env = ml1.train_classes['pick-place-v2']()  # Create an environment with task `pick-place`
task = random.choice(ml1.train_tasks)
env.set_task(task)  # Set task

obs = env.reset()  # Reset environment
a = env.action_space.sample()  # Sample an action
obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action

MT1 can be run the same way, except that it does not contain any test_tasks. A minimal MT1 example follows.
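
For completeness, here is a minimal MT1 sketch mirroring the ML1 code above; it assumes the same 'pick-place-v2' environment:

import metaworld
import random

mt1 = metaworld.MT1('pick-place-v2')  # Construct the benchmark, sampling tasks

env = mt1.train_classes['pick-place-v2']()  # Create an environment with task `pick-place`
task = random.choice(mt1.train_tasks)
env.set_task(task)  # Set task

obs = env.reset()  # Reset environment
a = env.action_space.sample()  # Sample an action
obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action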

Running a benchmark

Create an environment with train tasks (ML10, MT10, ML45, or MT50):

import metaworld
import random

ml10 = metaworld.ML10() # Construct the benchmark, sampling tasks

training_envs = []
for name, env_cls in ml10.train_classes.items():
  env = env_cls()
  task = random.choice([task for task in ml10.train_tasks
                        if task.env_name == name])
  env.set_task(task)
  training_envs.append(env)

for env in training_envs:
  obs = env.reset()  # Reset environment
  a = env.action_space.sample()  # Sample an action
  obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action

Create an environment with test tasks (this only works for ML10 and ML45, since MT10 and MT50 don't have a separate set of test tasks):

import metaworld
import random

ml10 = metaworld.ML10() # Construct the benchmark, sampling tasks

testing_envs = []
for name, env_cls in ml10.test_classes.items():
  env = env_cls()
  task = random.choice([task for task in ml10.test_tasks
                        if task.env_name == name])
  env.set_task(task)
  testing_envs.append(env)

for env in testing_envs:
  obs = env.reset()  # Reset environment
  a = env.action_space.sample()  # Sample an action
  obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action

Accessing Single Goal Environments

You may wish to access only the individual environments used in the Meta-World benchmark for your research. We provide constructors for creating environments where the goal has been hidden (by zeroing out the goal in the observation) and environments where the goal is observable. They are called GoalHidden and GoalObservable environments, respectively.

You can access them in the following way:

from metaworld.envs import (ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE,
                            ALL_V2_ENVIRONMENTS_GOAL_HIDDEN)
                            # these are ordered dicts where the key : value
                            # is env_name : env_constructor

import numpy as np

door_open_goal_observable_cls = ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE["door-open-v2-goal-observable"]
door_open_goal_hidden_cls = ALL_V2_ENVIRONMENTS_GOAL_HIDDEN["door-open-v2-goal-hidden"]

env = door_open_goal_hidden_cls()
env.reset()  # Reset environment
a = env.action_space.sample()  # Sample an action
obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action
assert (obs[-3:] == np.zeros(3)).all() # goal will be zeroed out because env is GoalHidden

# You can choose to initialize the random seed of the environment.
# The state of your rng will remain unaffected after the environment is constructed.
env1 = door_open_goal_observable_cls(seed=5)
env2 = door_open_goal_observable_cls(seed=5)

env1.reset()  # Reset environment
env2.reset()
a1 = env1.action_space.sample()  # Sample an action
a2 = env2.action_space.sample()
next_obs1, _, _, _ = env1.step(a1)  # Step the environment with the sampled random action

next_obs2, _, _, _ = env2.step(a2)
assert (next_obs1[-3:] == next_obs2[-3:]).all() # 2 envs initialized with the same seed will have the same goal
assert not (next_obs2[-3:] == np.zeros(3)).all()   # The envs are goal observable, meaning the goal is not zeroed out

env3 = door_open_goal_observable_cls(seed=10)  # Construct an environment with a different seed
env1.reset()  # Reset environment
env3.reset()
a1 = env1.action_space.sample()  # Sample an action
a3 = env3.action_space.sample()
next_obs1, _, _, _ = env1.step(a1)  # Step the environment with the sampled random action
next_obs3, _, _, _ = env3.step(a3)

assert not (next_obs1[-3:] == next_obs3[-3:]).all() # 2 envs initialized with different seeds will have different goals
assert not (next_obs1[-3:] == np.zeros(3)).all()   # The envs are goal observable, meaning the goal is not zeroed out

Citing Meta-World

If you use Meta-World for academic research, please cite our CoRL 2019 paper using the following BibTeX entry.

@inproceedings{yu2019meta,
  title={Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning},
  author={Tianhe Yu and Deirdre Quillen and Zhanpeng He and Ryan Julian and Karol Hausman and Chelsea Finn and Sergey Levine},
  booktitle={Conference on Robot Learning (CoRL)},
  year={2019},
  eprint={1910.10897},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/1910.10897}
}

Accompanying Baselines

If you're looking for implementations of the baseline algorithms used in the Meta-World conference publication, please look at our sister repository, Garage.

Note that these aren't the exact same baselines that were used in the original conference publication; however, they are faithful to the original baselines.

Become a Contributor

We welcome all contributions to Meta-World. Please refer to the contributor's guide for how to prepare your contributions.

Acknowledgements

Meta-World is a work by Tianhe Yu (Stanford University), Deirdre Quillen (UC Berkeley), Zhanpeng He (Columbia University), Ryan Julian (University of Southern California), Karol Hausman (Google AI), Chelsea Finn (Stanford University) and Sergey Levine (UC Berkeley).

The code for Meta-World was originally based on multiworld, which is developed by Vitchyr H. Pong, Murtaza Dalal, Ashvin Nair, Shikhar Bahl, Steven Lin, Soroush Nasiriany, Kristian Hartikainen and Coline Devin. The Meta-World authors are grateful for their efforts in providing such a great framework as the foundation of our work. We would also like to thank Russell Mendonca for his work on the reward functions for some of the environments.

metaworld's People

Contributors

activatedgeek, adibellathur, ahtsan, anair13, aphi, avnishn, benedictflorance, brandontrabucco, cdevin, devinluo27, elliottower, evangelos-ch, gitanshu, hartikainen, haydenshively, jkterry1, krzentner, matri550x, mgoulao, naeioi, pseudo-rnd-thoughts, reginald-mclean, ryanjulian, seubill, shikharbahl, snasiriany, stevenlin1111, tianheyu927, vitchyr, zhanpenghe


metaworld's Issues

Observations are incorrectly padded with zeros

In MultiClassMultiTaskEnv, a use case that routinely has to be dealt with is multiple environments having observation spaces of different sizes. In ML45/MT50 the observation space is (6,) for some environments and (9,) for others. The observation spaces of shape (6,) are supposed to be padded with zeros in order to match the size-(9,) spaces.

The issue is that, where this is currently done, if a static_task_id (one-hot id) is appended to the observation, it is appended before any zeros are padded onto the observation. This leads to the one-hot id being shifted to the wrong position inside the returned observation, depending on the observation_space shape of the underlying environment.

Regarding the done signal at max_path_length

Hello!

I read #45 and I understand why you decided to remove it, so sorry for referencing this issue again. However, I believe that, as @krzentner said, there are many cases in which it is helpful to have this functionality.

I also understand the decision that whoever wants to change this will need to implement it themselves. But if there is a lot of interest in having this functionality, would it be possible to add it as an option when creating the environment?

In the same style as the arguments sample_all, sample_goals, etc., there could be an argument that toggles whether done=True is sent when max_path_length is reached. Personally this is not an issue for me, as I have already implemented this functionality in my code as you suggest, so it would be more of a future enhancement if there is demand for it.

Would love to hear your thoughts on it!

A small mistake (or not) in the observation space for the multi-class multi-task environment.

It looks like the observation-space definition of the multi-class multi-task environment is kind of confusing (I am not sure whether it's a mistake or not).

For observation spaces where obs_type is with_goal_id or with_goal_and_id, the current code uses goal_id_low for high in Box, and goal_id_high for low in Box.

elif self._obs_type == 'with_goal_id' and self._fully_discretized:
    goal_id_low = np.zeros(shape=(self._n_discrete_goals,))
    goal_id_high = np.ones(shape=(self._n_discrete_goals,))
    return Box(
        high=np.concatenate([plain_high, goal_id_low,]),
        low=np.concatenate([plain_low, goal_id_high,]))
elif self._obs_type == 'with_goal_and_id' and self._fully_discretized:
    goal_id_low = np.zeros(shape=(self._n_discrete_goals,))
    goal_id_high = np.ones(shape=(self._n_discrete_goals,))
    return Box(
        high=np.concatenate([plain_high, goal_id_low, goal_high]),
        low=np.concatenate([plain_low, goal_id_high, goal_low]))

Custom Camera Views

Hello,

Thank you for the excellent code.

I'm trying to use the setup to collect simulated RGB data for a project, but I haven't been able to find a way to set a custom camera angle while rendering. All I can see are some predefined modes like topview. Is there any way I can set up custom camera views for the tasks?

Thanks!

ML10/45 Goal Variation

Hello,

First, thank you for providing us this benchmark and for helping us to understand its details.

I have two questions about ML10/45 and paper results.

  1. So, in these benchmarks, is there goal variation across episodes?
    As an example, let's say I sample the reach-v1 task in ML10. In the official benchmark, should I expect a constant goal across all episodes, or can the goal change after each reset?

  2. In the paper, specifically in Figure 6 ("Single Task Success Rate"), is the goal constant for each task? Or is there goal variation during learning as well?

Thanks!

Gym versions

Is there a reason the gym version is pinned to precisely 0.12.1?

In general, what gym versions are compatible?

Pre-trained policies for each env

I emailed this, but it also seems appropriate to post here. We want to do some research that builds on top of your excellent code. I am guessing this GIF figure was made using trained policies for each env. Any chance you could provide the trained policies for each individual env to aid research that uses them? The code used to generate GIFs of a rollout would also be nice, if you can share it.

Feature request: meta-learning baselines

Hi! Thank you for your work!

It would be great to have some baselines that allow for a fast start with Metaworld (e.g., the code used for the paper), or at least a mention of some repositories that use Metaworld.

ML1 Tasks for Constant Goals

Currently, we are trying to use specific environments in ML1 to keep the goal constant per task in a MAML setting (where env.reset() means that initial positions change but the goal stays constant).

However, we are not clear on what a task means in the ML1 setting. Based on the code for one of the environments we are trying to run, it seems like calling self.set_task will update self.goal. However, when the environment is reset, self._state_goal is initially self.goal but is then assigned a randomly generated goal plus a concatenation of initial reacher arm positions, which also appears to be random. When self.random_init is False, it works as intended, but the starting states are constant.

We are wondering if there is a way to define a task using the Metaworld API such that, for a given task, the goal position is held constant but the initial observation changes when env.reset() is called.

Remove unused code

I noticed that there is more code that needs to be removed. For example, this environment was not used in our benchmarking. Also, some scripts under the scripts folder are outdated.

We need to do another round of auditing for this.

Vectorizing Envs over Many Workers Results in Memory Overflow

Currently, I'm using RLlib for running metaworld envs, where each worker runs many vectorized instances of the environment.

I tried running MAML/ProMP with 40 workers and 20 envs/worker on one of the meta-envs (Push). RLlib can train on this for a couple of iterations before crashing due to memory overflow (it can't allocate more memory). I'm not sure what exactly the issue is, but do you have any leads on what it could be? I was thinking that it might be a memory leak, but trying this with a lower number of workers resulted in worse training overall but, most importantly, no crashing.

Inconsistency between GIF on github and implementation

https://github.com/rlworkgroup/metaworld/blob/b556580204bf704dabcd0b5e558850db7a4335c4/metaworld/envs/mujoco/env_dict.py#L158-L211

I'm currently working on a project and we are using the ML45 GIF from here as the standard for our task suite. I've spotted some tasks in the GIF that are not in the dictionary above. Our goal is to include stack, unstack and open_box, which appear in the GIF, in our task suite.

I also found these environments in the dictionary that are not in the GIF: hand_insert, assembly_pug, pick_out_of_hole, shelf_remove, and sweep_tool. Additionally, shelf_remove and sweep_tool are imported in the file but are not in the dictionary.

One last thing I would like to point out is that the environments pick_place and pick_place_wsg exist but are not used in the ML45 dictionary.

I'm just hoping that I don't miss the environments of interest in my task suite (stack, unstack, open_box).

Static Task IDs for environments

For training scenarios where task one-hot ids are fed to the policy, static task IDs are needed across tasks so that the same id is used during training and evaluation. An example is multi-task RL algorithms whose policy inputs include task one-hot ids (a minimal sketch of one way to pin such ids follows).
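
One minimal way to pin such ids (hypothetical helper code, not part of Meta-World) is to enumerate the benchmark's environment names once and reuse that mapping for both training and evaluation:

import metaworld
import numpy as np

mt10 = metaworld.MT10()
# Fix one ordering of environment names and reuse it everywhere.
TASK_IDS = {name: i for i, name in enumerate(sorted(mt10.train_classes))}

def one_hot_id(name):
    vec = np.zeros(len(TASK_IDS))
    vec[TASK_IDS[name]] = 1.0
    return vec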

Some questions regarding to the paper

First thank you for the awesome simulation environment! And here are some of my questions:

  1. Can you share the computational resources used to run the experiments, and the corresponding running time for each of the tasks?

  2. For the bottom three tasks of Figure 6, i.e., "push with wall", "disassemble nut" and "turn dial", the success rates are low. Are these tasks too hard to solve?

  3. Can you give the standard definition of "structural similarities"? In other words, how do you measure the similarity between tasks?

  4. Can you elaborate on the reason why multi-head SAC performs so much better than SAC with a single head?

  5. It is surprising that PPO performs so much worse than TRPO in MT50, which seems to contradict the results from model-free RL. Do you have a good explanation for this?

  6. The results in Table 1 and Figure 8 seem to be inconsistent. For example, in ML45, Table 1 says PEARL performs the best on meta-test. However, it is the worst according to Figure 8.

  7. Can you provide some evidence for the claim that "special care needs to be taken to ensure that the task cannot be inferred directly from the image, else meta-learning algorithms will memorize the training tasks rather than learning to adapt"?

Question regarding reward function and observation space

Hi!

I have a few questions about the reward function and observation spaces that I hope you can help with.

  1. The reward function for some environments (e.g. reach) is not consistent between the paper and the code. Should I modify the code so they are consistent, or should I just use the one in the code? The one in the code makes more sense for the reaching task.

  2. What observation space was used in the ML10 and ML45 evaluations? Earlier, the paper mentions that the observation is 9-dimensional, consisting of the hand position and two task-related positions. For ML1, the goal position is hidden (it defines the task), so the observation space is 6-dimensional. I'm wondering if ML10 and ML45 are the same (goal position hidden), or whether they can observe the true goal?

Thank you so much in advance!

Setting environment to a reference environment's state

Here is a particular gym-registered version of a Metaworld environment that I am using:

register(
    id='MW-Reach-v0',
    entry_point='metaworld.envs.mujoco.sawyer_xyz.sawyer_reach_push_pick_place:SawyerReachPushPickPlaceEnv',
    kwargs=dict(task_type='reach', random_init=False)
)

In words, I'm trying to run CEM using ground truth dynamics for the reach task using fixed goal. The way I do this is by creating n copies of the environment and stepping on the copies. To make sure each copy starts at the current observation, I use the following code:

env = gym.make('MW-Reach-v0')
# ... rollout env to some state using standard `.step()`
copies = [gym.make('MW-Reach-v0') for _ in range(n)]
cur_state = env.get_env_state()
for i in range(n):
    copies[i].set_env_state(cur_state)

It does reach the success state every time. Is this the correct approach?

Further, now I would like to use random_init=True for randomized goals but the same approach doesn't work. Is it enough to just run .set_goal_(cur_env.goal) on each copy as below?

cur_state = env.get_env_state()
for i in range(n):
    copies[i].set_goal_(cur_env.goal)
    copies[i].set_env_state(cur_state)

It could potentially just be a hyperparameter issue, but I just wanted to make sure I have the interaction with the environments correct. Thanks!

Cannot use gym.make with external code without error

I am trying to get this package integrated with my own. I figured I could just add "from metaworld.envs.mujoco import *" at the top of a file and do gym.make to create a metaworld env with a particular name, since the init of that package registers all these envs, but that results in this error:

...
  self.env = gym.make(self.config.robot.task.name)
File "/cvgl2/u/andreyk/projects/manipxonomy/env/lib/python3.5/site-packages/gym/envs/registration.py", line 183, in make
  return registry.make(id, **kwargs)
File "/cvgl2/u/andreyk/projects/manipxonomy/env/lib/python3.5/site-packages/gym/envs/registration.py", line 125, in make
  env = spec.make(**kwargs)
File "/cvgl2/u/andreyk/projects/manipxonomy/env/lib/python3.5/site-packages/gym/envs/registration.py", line 89, in make
  env = cls(**_kwargs)
File "/cvgl2/u/andreyk/projects/manipxonomy/metaworld/metaworld/envs/mujoco/sawyer_xyz/sawyer_pick_and_place.py", line 43, in __init__
  **kwargs
TypeError: __init__() got multiple values for keyword argument 'hand_high'

That actually makes total sense, since in registrations the kwargs are specified as such:

    register(
        id='SawyerPickupEnv-v0',
        entry_point='metaworld.envs.mujoco.sawyer_xyz'
                    '.sawyer_pick_and_place:SawyerPickAndPlaceEnv',
        tags={
            'git-commit-hash': '30f23f7',
            'author': 'steven',
        },
        kwargs=dict(
            hand_low=(-0.1, 0.55, 0.05),
            hand_high=(0.0, 0.65, 0.2),
            action_scale=0.02,
            hide_goal_markers=True,
            num_goals_presampled=1000,
        )
)

And then these parameters are also set in the constructor:

class SawyerPickAndPlaceEnv(SawyerXYZEnv):
    def __init__(
            self,
            obj_low=None,
            obj_high=None,
            random_init=False,
            tasks = [{'goal': np.array([0.1, 0.8, 0.2]),  'obj_init_pos':np.array([0, 0.6, 0.02]), 'obj_init_angle': 0.3}], 
            goal_low=None,
            goal_high=None,
            hand_init_pos = (0, 0.6, 0.2),
            liftThresh = 0.04,
            rewMode = 'orig',
            rotMode='rotz',#'fixed',
            **kwargs
    ):
        self.quick_init(locals())
        hand_low=(-0.5, 0.40, 0.05)
        hand_high=(0.5, 1, 0.5)
        obj_low=(-0.5, 0.40, 0.05)
        obj_high=(0.5, 1, 0.5)
        SawyerXYZEnv.__init__(
            self,
            frame_skip=5,
            action_scale=1./100,
            hand_low=hand_low,
            hand_high=hand_high,
            model_name=self.model_name,
            **kwargs
)

It appears the kwargs are just outdated; should they just be removed? I'm happy to submit a quick PR with that change; it seems to fix gym.make.

AttributeError: 'MjViewer' object has no attribute 'finish'

The rendering works fine until we call .close() on the environment.

Minimal code to reproduce the error

from metaworld.envs.mujoco.sawyer_xyz.sawyer_reach_push_pick_place import SawyerReachPushPickPlaceEnv

if __name__ == "__main__":
  env = SawyerReachPushPickPlaceEnv(task_type='push', random_init=True)

  obs = env.reset()
  done = False
  while not done:
    env.render()
    next_obs, reward, done, info = env.step(env.action_space.sample())

  env.close()

Stack trace

Creating window glfw
Traceback (most recent call last):
  File "try.py", line 13, in <module>
    env.close()
  File "/miniconda3/lib/python3.7/site-packages/metaworld/envs/mujoco/mujoco_env.py", line 135, in close
    self.viewer.finish()
AttributeError: 'MjViewer' object has no attribute 'finish'

Gym Version: 0.12.1
Mujoco-py Version: 2.0.2.8

Scripted Policies for ML10/MT10 Environments

ML10/MT10 environments:

  • reach
  • push
  • pick-place
  • door-open
  • drawer-close
  • button-press-topdown
  • peg-insert-side
  • window-open
  • sweep
  • basketball
  • drawer-open
  • door-close
  • shelf-place
  • sweep-into
  • lever-pull

Questions regarding the simulator

Hello,

Awesome work! I have some questions about how the meta world simulator takes care of certain cases.

  • How does meta-world simulate grasping? Is the object assumed to be grasped if near the arm and the grip action is taken? Is there any simulation for contact or friction while grasping? Some info about the assumptions made would be very useful.

  • As far as I understand, the action space is of size 4 (3 for the location of the arm and 1 for gripper). Am I correct in my observation? Does the orientation of the arm have no impact on the environment? Some info about the inverse kinematics used could be very useful here. Also, if the pose of the robotic arm matters, how can one control it?

Best regards,
Ankit

Possible BUG report

Hi all, thanks for this amazing repo for meta-RL; it's really helpful 👍. However, I have encountered some problems recently, and it seems there are a few bugs in the code.

I'm trying to modify the goal of tasks sampled from pick-place-v1, but it fails. I tried looking at the source code to fix this problem:

In sawyer_xyz/base.py, the set_goal_() method deals with the attribute self.goal, while in sawyer_pick_and_place.py, it seems the task computes rewards based on self._state_goal.

I'm confused about this. Are they the same? Or is it just a bug of inconsistent names? I hope someone can help me with this. Thanks a lot!

BTW, it would be much better if there were documentation for the repo in the future 😊!

Derek

Adding support for the Franka Panda robot

Hi,

Thanks a lot for this great open-source benchmark. Great work!

I was wondering whether there are any (hopefully short-term) plans to add support for the Franka Panda robot as part of the benchmark. This robot is becoming pretty common and standard in many libraries.

Thanks,
Gal

All environments produce observations outside of observation space.

The following is a minimal working example which shows that all of the environments produce observations outside of their observation space. All it does is iterate over each environment from ML1, sample and set a task for the given environment, then take random actions in the environment and test whether or not the observations are inside the observation space, and at which indices (if any) an observation lies outside of the bounds of the observation space. You will get different results depending on the value of TIMESTEPS_PER_ENV, but setting this value to 1000 should yield violating observations for most environments. This is an issue, say, for RL implementations like RLlib which expect observations to be inside the observation space, and makes the environment incompatible with such libraries. This might be related to issue #31, though that issue only points out incorrect observation space boundaries regarding the goal coordinates, and the script below should point out that there are violations in other dimensions as well.

import numpy as np
from metaworld.benchmarks import ML1

TIMESTEPS_PER_ENV = 1000

def main():

    # Iterate over environment names.
    for env_name in ML1.available_tasks():

        # Create environment.
        env = ML1.get_train_tasks(env_name)
        tasks = env.sample_tasks(1)
        env.set_task(tasks[0])

        # Get boundaries of observation space and initial observation.
        low = env.observation_space.low
        high = env.observation_space.high
        obs = env.reset()

        # Create list of indices of observation space whose bounds are violated.
        broken_indices = []

        # Run environment.
        for _ in range(TIMESTEPS_PER_ENV):

            # Test if observation is outside observation space.
            if np.any(np.logical_or(obs < low, obs > high)):
                current_indices = np.argwhere(np.logical_or(obs < low, obs > high))
                current_indices = current_indices.reshape((-1,)).tolist()
                for current_index in current_indices:
                    if current_index not in broken_indices:
                        broken_indices.append(current_index)
    
            # Sample action and perform environment step.
            a = env.action_space.sample()
            obs, reward, done, info = env.step(a)

        # Print out which indices of observation space were violated.
        broken_indices = sorted(broken_indices)
        print("%s broken indices: %r" % (env_name, broken_indices))

if __name__ == "__main__":
    main()

Bug(or not) about drawer-close-v1 environment.

When training my multi-task policy for the Multi-Task 10 benchmark, I find that the drawer-close-v1 task always succeeds at the first timestep of the episode.

I checked the implementation of the drawer-close and drawer-open environments, and I noticed that they use the same goal by default and the same condition to judge whether the task succeeded. I'm not sure whether this is correct; would you mind taking a look at it?

Benchmarks must be constructed before task names can be accessed

Currently in order to access all of the task names that belong to a benchmark's train/test set, the benchmark and all of its environments must first be constructed.

This is wasteful in the case that one needs to construct a benchmark's environments from the task API. The proper behavior would be that one can access the names without constructing all of the underlying environments.

How to run unit tests?

I have metaworld installed locally with anaconda3 - how are the unit tests typically run from the command line?

Mujoco 2.0.0

Does metaworld require mujoco 2.0.0? Or is there a way I can get it to work with the 1.5.x versions?

Remote rendering issue with env.render(mode='rgb_array') and env.get_image()

This is an issue I typically have with Mujoco-based simulations when running remotely. Basically rendering is problematic, even in rgb_array mode where we want to access the image frames.

Is there a workaround for this issue? The OpenAI gym envs have the same issue, and as far as I know only DMSuite somehow bypasses this, letting me render in rgb_array mode.

The errors you get are the following:

  • env.render(mode='rgb_array')
GLFW error (code %d): %s 65542 b'EGL: Failed to get EGL display: Success'
Creating window glfw
X Error of failed request:  255
  Major opcode of failed request:  155 (GLX)
  Minor opcode of failed request:  5 (X_GLXMakeCurrent)
  Serial number of failed request:  136
  Current serial number in output stream:  137
  • env.get_image(width=84, height=84)
ERROR: GLEW initalization error: Missing GL version

Missing Environments

If you try running scripts/demo_sawyer.py, many of the imports don't work because of missing environments, such as:
from metaworld.envs.mujoco.sawyer_xyz.sawyer_stack import SawyerStackEnv

Making Environments Serializable

This is an issue we encountered while trying to duplicate multiple ML1 environments per worker, and hopefully someone can help us resolve this bug because it’s a blocker with our codebase.

In a meta-learning setup, each meta-batch makes use of several workers in parallel, each of which rolls out episodes from a sampled task (=setting of environment parameters). In our codebase, we use pickle to serialize environments for each worker (in order to ensure that environment/task parameters are constant for a worker).

set_task in meta-world takes a task index and obtains the goal via indexing into self.discrete_goals. After some debugging, it turns out that after pickling environments, self.discrete_goals, which is a list of 50 goal positions, is different from the value from before pickling. This is with self.random_init=False.

We are wondering if there is a recommended way to make self.discrete_goals deterministic before and after pickling an ML1 environment. (Relatedly, we would benefit from clarification on issue #24, which details what constitutes a task in ML1.) Your help is greatly appreciated!

As a working example:

envs_list = []
env = ML1.get_train_tasks('pick-place-v1')
envs_list.append(env)
env_pickle = pickle.dumps(env) 
while len(envs_list) < self.num_envs_per_worker:
    envs_list.append(pickle.loads(env_pickle))
print(envs_list[0].active_env.discrete_goals)
print(envs_list[1].active_env.discrete_goals)

Env[0] Discrete Goals: [array([0.05635804, 0.8268249 , 0.26080596], dtype=float32), array([-0.08220328, 0.8992955 , 0.27001566], dtype=float32), array([0.08398727, 0.8188896 , 0.05937913], dtype=float32), array([-0.03422436, 0.82531315, 0.08296145], ... ]

Env[1] Discrete Goals: [array([0.04696276, 0.8596079 , 0.12688547], dtype=float32), array([-0.05456738, 0.8163504 , 0.24694112], dtype=float32), array([-0.09329244, 0.85606927, 0.22053242], dtype=float32), array([-0.00348601, 0.81342274, 0.28464478], dtype=float32), ... ]

Paper and results

Hi,

Thanks for open sourcing this.
I was wondering when the paper for Meta-World will be out. If the paper is not ready yet and will be released in the far future, is there any plan to release your evaluation results on these tasks for the 6 meta-RL and multi-task learning algorithms mentioned on the website?

Parametric variations for multi-task training (MT-x)

Hi all. Thanks for putting the benchmark together!

One comment however on multi-task training: why are the MT-x benchmarks constrained to fixed goal locations? Including parametric variations would increase the size of the training distribution and increase the potential for transfer across tasks. As is, the optimal policies for MT-10 and MT-50 would be a discrete mixture of 10 or 50 policies, which might preclude behavior sharing across tasks. If parametric variations were to be included, then goal positions (in addition to task ID) would be provided as input to the policy -- in contrast to the meta-learning benchmarks. Thanks for taking this into consideration!

Unstable simulation while using Metaworld

Hi, I am testing my algorithm on metaworld and I am frequently getting this exception

File "/home/lthpc/anaconda3/envs/MAML/lib/python3.6/site-packages/mujoco_py/builder.py", line 359, in user_warning_raise_exception
raise MujocoException('Got MuJoCo Warning: {}'.format(warn))
mujoco_py.builder.MujocoException: Got MuJoCo Warning: Nan, Inf or huge value in QACC at DOF 0. The simulation is unstable. Time = 0.1250.

I ran into this problem on the ML1 'reaching' and 'pushing' envs, and my code uses the same policy structure as PEARL, so I think my code produces legal actions.

I would greatly appreciate it if you can provide some insight about what's going wrong. Thank you.

Evaluation Protocol

Hello,

I have some questions about the evaluation presented in the paper. Although this is not the best place for questions related to the paper, I think it could help to reproduce some experiments.

  1. How many samples are used to train individual policies to reproduce Figure 6 ("Single Task Success Rate")? If this is different for each task, could you provide at least an average number of samples?

  2. Could you please provide the number of samples used to train the policies in Figure 7 and 8, in order to obtain the presented asymptotic performance?

  3. Have you released the code used for those evaluations? If so, could you provide it? I am referring to the methods themselves, such as SAC/PPO and MAML/PEARL/RL2.

Thank you so much for the support and for providing us this benchmark!

Parallel Environments

Is there a straightforward way to run multiple environments in parallel or do you have to hack it using the wrapper from gym?

All metaworld environments report done=True at max time steps.

Although this can be convenient for on-policy algorithms, it technically makes the environment partially observable. In particular, this is known to significantly increase the sample requirements for off-policy algorithms in many cases.

Since the goal of this benchmark is to compare different meta-learning methods, it would be best to have a version of this benchmark that does not unfairly discriminate against off-policy meta-RL methods (such as PEARL).

Object and goal consistency with random_init=False

Not all tasks seem compatible with the random_init=False mode of the ML1 benchmark. Some tasks require a degree of consistency between the object location and the goal. For example, in sawyer_drawer_close, random_init=True first samples a random object location and then configures the goal location as that same location with an offset of 0.2 (in one of the xyz coordinates). No such consistency is enforced when random_init=False and goals are configured by hand. See the code example below.

import matplotlib.pyplot as plt

from metaworld.benchmarks.ml1 import ML1

def diagnose(env):
  img = env.sim.render(height=480, width=640)
  plt.figure(); plt.imshow(img[::-1])
  print('Goal location:', env._state_goal)
  print('Object location: ', env.obj_init_pos)
  print('Reward:', reward)

env = ML1.get_train_tasks('drawer-close-v1', sample_all=True)
env.active_env.random_init = False
env.active_env.initialize_camera(top_down_camera)
task_a, task_b = env.sample_tasks(2)  

env.set_task(task_a)
obs = env.reset()  # Reset environment
diagnose(env.active_env)

env.set_task(task_b)
obs = env.reset()  # Reset environment
diagnose(env.active_env)

env.set_task(task_a)
obs = env.reset()  # Reset environment
diagnose(env.active_env)

which clearly shows the discrepancy between goal locations a/b and the static object:

Goal location: [0.2470843  0.7214825  0.38136098]
Object location:  [-0.08980742  0.89999998  0.04      ]
Reward: -0.11522688447982872
Goal location: [-0.14852686  0.57654315  0.45528063]
Object location:  [-0.08980742  0.89999998  0.04      ]
Reward: -0.11522688447982872
Goal location: [0.2470843  0.7214825  0.38136098]
Object location:  [-0.08980742  0.89999998  0.04      ]
Reward: -0.11522688447982872

ImageEnv compatibility with benchmark classes

Currently, the benchmark tasks only support state observations/goals. However, a lot of RL researchers would benefit from the ability to get image observations/goals. There's an ImageEnv available, but it's incompatible with the benchmark classes (ML1, etc.). It works with single-task environments, but currently there's no goal setting enabled there.

It would be useful either (a) to have image support for the ML1, etc. benchmarks or (b) to enable goal setting for single-task environments.

Is this currently possible? Maybe I missed something in the codebase.

Some questions about the Metaworld environment.

I'm interested in your Meta-World.
It will be a good benchmark for meta-RL.
I have some questions about this benchmark.

  1. When the environment starts, the Sawyer arm moves to a specific pose during the first K steps (K is about 10).
    It seems like an 'init' function called by MuJoCo.
    I think this will cause problems when the agent does reinforcement learning.
    Is this intended?

  2. The reward gap between some environments is big.
    While the reach env gets a reward of almost 100 right away, the push env gets a small reward (between 0 and 1).
    Is this intended?

Can you manipulate the environment configurations?

Does the package provide access to change the environment configurations? For example, change the object shape or size, change environment dynamics parameters, or change the color or texture of objects?
