
oyster's Introduction

PEARL: Efficient Off-policy Meta-learning via Probabilistic Context Variables

On arXiv: http://arxiv.org/abs/1903.08254

by Kate Rakelly*, Aurick Zhou*, Deirdre Quillen, Chelsea Finn, and Sergey Levine (UC Berkeley)

Deep reinforcement learning algorithms require large amounts of experience to learn an individual task. While in principle meta-reinforcement learning (meta-RL) algorithms enable agents to learn new skills from small amounts of experience, several major challenges preclude their practicality. Current methods rely heavily on on-policy experience, limiting their sample efficiency. They also lack mechanisms to reason about task uncertainty when adapting to new tasks, limiting their effectiveness in sparse reward problems. In this paper, we address these challenges by developing an off-policy meta-RL algorithm that disentangles task inference and control. In our approach, we perform online probabilistic filtering of latent task variables to infer how to solve a new task from small amounts of experience. This probabilistic interpretation enables posterior sampling for structured and efficient exploration. We demonstrate how to integrate these task variables with off-policy RL algorithms to achieve both meta-training and adaptation efficiency. Our method outperforms prior algorithms in sample efficiency by 20-100X as well as in asymptotic performance on several meta-RL benchmarks.

Note 5/22/20: The ant-goal experiment is currently not reproduced correctly. We are aware of the problem and are looking into it. We do not anticipate pushing a fix before the NeurIPS 2020 deadline.

This is the reference implementation of the algorithm; however, some scripts for reproducing a few of the experiments from the paper are missing. This repository is based on rlkit.

We ran our ProMP, MAML-TRPO, and RL2 baselines in the reference ProMP repo and our MAESN comparison in the reference MAESN repo. The results for PEARL as well as all baselines on the six continuous control tasks shown in Figure 3 may be downloaded here.

TODO (where is my tiny fork?)

  • fix RNN encoder version that is currently incorrect!
  • add optional convolutional encoder for learning from images
  • add Walker2D and ablation experiment scripts
  • add jupyter notebook to visualize sparse point robot
  • policy simulation script
  • add working Dockerfile for running experiments

Instructions (just a squeeze of lemon)

Clone this repo with git clone --recurse-submodules.

To run in Docker, place your MuJoCo key in the docker directory, then run docker build . -t pearl within that directory to build the Docker image tagged with the name pearl. As an example, you can then run the container interactively with a bash shell with docker run --rm --runtime=nvidia -it -v [PATH_TO_OYSTER]:/root/code pearl:latest /bin/bash. The Dockerfile included in this repo enables GPU support, so you must have a CUDA-10-capable GPU and drivers installed. Disclaimer: I am committed to keeping this Docker image working, not to keeping it minimal. If you have changes that pare it down such that everything still works, please make a pull request and I'm happy to merge it.

To install locally, you will need to first install MuJoCo. For the task distributions in which the reward function varies (Cheetah, Ant, Humanoid), install MuJoCo200. Set LD_LIBRARY_PATH to point to both the MuJoCo binaries ($HOME/.mujoco/mujoco200/bin) as well as the GPU drivers (something like /usr/lib/nvidia-390; you can find your version by running nvidia-smi). For the remaining dependencies, we recommend using miniconda - create our environment with conda env create -f docker/environment.yml. This installation has been tested only on 64-bit Ubuntu 16.04.

For the task distributions where different tasks correspond to different model parameters (Walker and Hopper), MuJoCo131 is required. Simply install it the same way as MuJoCo200. These environments make use of the module rand_param_envs, which is included as a submodule in this repository. Add the module to your Python path: export PYTHONPATH=./rand_param_envs:$PYTHONPATH (check out direnv for handy directory-dependent path management).

Experiments are configured via json configuration files located in ./configs. To reproduce an experiment, run: python launch_experiment.py ./configs/[EXP].json

By default the code will use the GPU - to use CPU instead, set use_gpu=False in the appropriate config file.
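For reference, here is a minimal sketch of how a config could be loaded and tweaked programmatically before launching; the nesting of the use_gpu key (under a util_params sub-dict) is an assumption and may differ from the actual config layout.

    # Hypothetical sketch -- check the JSON files and the default config for the real key layout.
    import json

    with open('./configs/cheetah-vel.json') as f:
        variant = json.load(f)

    # Force CPU training (the README notes use_gpu lives in the config).
    variant.setdefault('util_params', {})['use_gpu'] = False
    print(json.dumps(variant, indent=2))  # inspect the resulting settings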

Output files will be written to ./output/[ENV]/[EXP NAME] where the experiment name is uniquely generated based on the date. The file progress.csv contains statistics logged over the course of training. We recommend viskit for visualizing learning curves: https://github.com/vitchyr/viskit
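If you prefer not to use viskit, a learning curve can also be plotted directly from progress.csv. The sketch below is a minimal example; the column names (taken from the issue discussion further down, e.g. AverageReturn_all_test_tasks) are assumptions and should be checked against your own CSV header.

    # Minimal sketch -- verify the column names against the progress.csv header first.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv('./output/cheetah-vel/<exp name>/progress.csv')
    plt.plot(df['Number of env steps total'], df['AverageReturn_all_test_tasks'])
    plt.xlabel('environment steps collected during meta-training')
    plt.ylabel('average return on test tasks')
    plt.show()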

Network weights are also snapshotted during training. To evaluate a learned policy after training has concluded, run sim_policy.py. This script will run a given policy across a set of evaluation tasks and optionally generate a video of these trajectories. Rendering is offline and the video is saved to the experiment folder.
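As a rough illustration (not the actual sim_policy.py interface), a snapshot can be inspected with plain PyTorch; the file name below is hypothetical, and the get_action signature follows the PEARLAgent code quoted in the issues further down.

    # Hedged sketch: assumes the snapshot is a torch-serialized dict of network weights.
    import torch

    snapshot = torch.load('./output/cheetah-vel/<exp name>/params.pth', map_location='cpu')
    print(snapshot.keys())  # see which networks were saved at this epoch

    # Once the agent's networks are restored from the snapshot (see sim_policy.py),
    # a deterministic action for a single observation can be obtained with:
    # action, _ = agent.get_action(obs, deterministic=True)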


Communication (slurp!)

If you spot a bug or have a problem running the code, please open an issue.

Please direct other correspondence to Kate Rakelly: [email protected]

oyster's People

Contributors

azhou42, katerakelly


oyster's Issues

Why not include data collected with z from the posterior in the encoder buffer?

Thanks for your great work and code! I noticed that during training for most environments, only data collected with z sampled from the prior is added to the encoder buffer --- num_steps_posterior is set to zero for these environments. What is the reasoning behind this decision? Why not include data collected with z from the posterior in the encoder buffer?

Temperature coefficient not found in SAC

Thanks for your great work and code!
I have a few questions.

First, I notice that the temperature coefficient α is not used during SAC training, so the implementation is not identical to the SAC algorithm. Why is that?

Second, why is policy_loss = policy_loss + policy_reg_loss? What do these terms mean?
    # Regularization terms added to the SAC policy loss in the rlkit-based implementation:
    mean_reg_loss = self.policy_mean_reg_weight * (policy_mean**2).mean()    # penalize large pre-tanh action means
    std_reg_loss = self.policy_std_reg_weight * (policy_log_std**2).mean()   # penalize extreme log-stds
    pre_tanh_value = policy_outputs[-1]
    pre_activation_reg_loss = self.policy_pre_activation_weight * (
        (pre_tanh_value**2).sum(dim=1).mean()                                # penalize saturating the tanh squash
    )
    policy_reg_loss = mean_reg_loss + std_reg_loss + pre_activation_reg_loss
    policy_loss = policy_loss + policy_reg_loss

Third, in rlkit.core.rl_algorithm at lines 228 and 422,
context = self.sample_context(self.task_idx)
is called. Where is the function sample_context defined?

Finally, if we apply the temperature auto-adjustment trick from SAC (arXiv:1812.05905) to PEARL, would PEARL perform better?
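For context, the temperature auto-adjustment from the SAC follow-up paper (arXiv:1812.05905) tunes α by gradient descent against a target entropy. A self-contained sketch of that update (illustrative only, not code from this repo; the batch and action dimension are placeholders):

    # Sketch of SAC's automatic temperature tuning; values are placeholders.
    import torch

    action_dim = 6                        # e.g. a HalfCheetah-sized action space
    log_pi = torch.randn(256, 1)          # stand-in for the policy's log-probs on a batch
    target_entropy = -float(action_dim)   # common heuristic: -|A|

    log_alpha = torch.zeros(1, requires_grad=True)
    alpha_optimizer = torch.optim.Adam([log_alpha], lr=3e-4)

    alpha_loss = -(log_alpha * (log_pi + target_entropy).detach()).mean()
    alpha_optimizer.zero_grad()
    alpha_loss.backward()
    alpha_optimizer.step()
    alpha = log_alpha.exp()  # used as the entropy coefficient in the actor and critic losses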

Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/Packages.gz Hash Sum mismatch

When I ran docker build . -t pearl, I encountered the following error:

Sending build context to Docker daemon   16.9kB
Step 1/35 : ARG UBUNTU_VERSION=16.04
Step 2/35 : ARG ARCH=
Step 3/35 : ARG CUDA=10.0
Step 4/35 : FROM nvidia/cudagl${ARCH:+-$ARCH}:${CUDA}-base-ubuntu${UBUNTU_VERSION} as base
 ---> aefe6bc1bd55
Step 5/35 : ARG UBUNTU_VERSION
 ---> Running in 8ff78f771535
Removing intermediate container 8ff78f771535
 ---> 2d3cc7aa0958
Step 6/35 : ARG ARCH
 ---> Running in 1d052e107b4d
Removing intermediate container 1d052e107b4d
 ---> 965a1505bcd0
Step 7/35 : ARG CUDA
 ---> Running in cc7a6683e854
Removing intermediate container cc7a6683e854
 ---> be73a6761aa8
Step 8/35 : ARG CUDNN=7.6.5.32-1
 ---> Running in b401fc5c4195
Removing intermediate container b401fc5c4195
 ---> 04c797989ccf
Step 9/35 : SHELL ["/bin/bash", "-c"]
 ---> Running in 88555693b155
Removing intermediate container 88555693b155
 ---> 45eb95bbd2b6
Step 10/35 : ENV DEBIAN_FRONTEND="noninteractive"
 ---> Running in 7071bd6699cd
Removing intermediate container 7071bd6699cd
 ---> 29ed4b7b1712
Step 11/35 : ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
 ---> Running in 6f352d395713
Removing intermediate container 6f352d395713
 ---> e59089b1a9b9
Step 12/35 : ENV PATH /opt/conda/bin:$PATH
 ---> Running in 30fcdb3ca87c
Removing intermediate container 30fcdb3ca87c
 ---> 207b7c6899e3
Step 13/35 : RUN apt-get update --fix-missing && apt-get install -y wget bzip2 ca-certificates     libglib2.0-0 libxext6 libsm6 libxrender1     git mercurial subversion
 ---> Running in bb1ca9a5864e
Get:1 http://archive.ubuntu.com/ubuntu xenial InRelease [247 kB]
Get:2 http://security.ubuntu.com/ubuntu xenial-security InRelease [109 kB]
Get:3 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages [1197 kB]
Get:4 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [109 kB]
Ign:5 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  InRelease
Get:6 http://archive.ubuntu.com/ubuntu xenial-backports InRelease [107 kB]
Get:7 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages [1558 kB]
Get:8 http://security.ubuntu.com/ubuntu xenial-security/main i386 Packages [890 kB]
Get:9 http://archive.ubuntu.com/ubuntu xenial/main i386 Packages [1552 kB]
Ign:10 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64  InRelease
Get:11 http://security.ubuntu.com/ubuntu xenial-security/restricted amd64 Packages [12.7 kB]
Get:12 http://security.ubuntu.com/ubuntu xenial-security/restricted i386 Packages [12.7 kB]
Get:13 http://security.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [651 kB]
Get:14 http://archive.ubuntu.com/ubuntu xenial/restricted amd64 Packages [14.1 kB]
Get:15 http://archive.ubuntu.com/ubuntu xenial/restricted i386 Packages [14.5 kB]
Get:16 http://security.ubuntu.com/ubuntu xenial-security/universe i386 Packages [556 kB]
Get:17 http://archive.ubuntu.com/ubuntu xenial/universe amd64 Packages [9827 kB]
Get:18 http://security.ubuntu.com/ubuntu xenial-security/multiverse amd64 Packages [6680 B]
Get:19 http://security.ubuntu.com/ubuntu xenial-security/multiverse i386 Packages [6848 B]
Get:20 http://archive.ubuntu.com/ubuntu xenial/universe i386 Packages [9804 kB]
Get:21 http://archive.ubuntu.com/ubuntu xenial/multiverse amd64 Packages [176 kB]
Get:22 http://archive.ubuntu.com/ubuntu xenial/multiverse i386 Packages [172 kB]
Get:23 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [1556 kB]
Get:24 http://archive.ubuntu.com/ubuntu xenial-updates/main i386 Packages [1225 kB]
Get:25 http://archive.ubuntu.com/ubuntu xenial-updates/restricted amd64 Packages [13.1 kB]
Get:26 http://archive.ubuntu.com/ubuntu xenial-updates/restricted i386 Packages [13.1 kB]
Get:27 http://archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages [1052 kB]
Get:28 http://archive.ubuntu.com/ubuntu xenial-updates/universe i386 Packages [947 kB]
Get:29 http://archive.ubuntu.com/ubuntu xenial-updates/multiverse amd64 Packages [19.7 kB]
Get:30 http://archive.ubuntu.com/ubuntu xenial-updates/multiverse i386 Packages [18.5 kB]
Get:31 http://archive.ubuntu.com/ubuntu xenial-backports/main amd64 Packages [7942 B]
Get:32 http://archive.ubuntu.com/ubuntu xenial-backports/main i386 Packages [7942 B]
Get:33 http://archive.ubuntu.com/ubuntu xenial-backports/universe amd64 Packages [9084 B]
Get:34 http://archive.ubuntu.com/ubuntu xenial-backports/universe i386 Packages [8735 B]
Get:35 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  Release [169 B]
Get:36 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64  Release [169 B]
Get:37 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  Release.gpg [169 B]
Get:38 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64  Release.gpg [169 B]
Ign:39 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  Packages
Get:40 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64  Packages [95.1 kB]
Err:40 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64  Packages
  Hash Sum mismatch
Get:39 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  Packages [327 kB]
Err:39 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  Packages
  Hash Sum mismatch
Fetched 31.9 MB in 3min 29s (152 kB/s)
Reading package lists...
E: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/Packages.gz  Hash Sum mismatch
E: Failed to fetch https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/Packages.gz  Hash Sum mismatch
E: Some index files failed to download. They have been ignored, or old ones used instead.
The command '/bin/bash -c apt-get update --fix-missing && apt-get install -y wget bzip2 ca-certificates     libglib2.0-0 libxext6 libsm6 libxrender1     git mercurial subversion' returned a non-zero code: 100

May I ask how to solve it? Thank you~

Cannot train with GPU for RTX2080

I installed it as instructed. My machine is Ubuntu 16.04, RTX2080Ti.
After I created the environment with conda env create -f environment.yml, the project worked correctly when using the CPU.
For the GPU, I noticed that PyTorch points to CUDA 9.0.176 (check torch.version.cuda), even though I installed CUDA 10.2 on the machine (shown by nvidia-smi) and added the CUDA path to LD_LIBRARY_PATH.
It seems the RTX 2080 Ti is a Turing GPU, which needs CUDA 10 to work. It probably works well on other machines though.
Can you take a look at this?

Discrete action_space

Hi katerakelly, thank you very much for sharing the code for your paper. I think your approach is very promising.

Now, I am trying to apply your method to my application, which has a discrete action space. Therefore, I may need to adjust some of your interfaces. I have already made some changes to your class NormalizedBoxEnv() in wrappers.py so that it can handle a discrete action space, and I am planning to revise your SAC. So my question is: can you give some suggestions on how to revise your SAC? Is there anything I need to be careful of?

Also, could you please tell me how I can generate rollouts before adaptation during meta-testing, just to show the improvement?

Loading and predicting

First of all, great paper.

I'm trying to visualize the results after training, in other words run MuJoCo to see how the trained policy performs.
I figured out that to predict, I can use a trained PEARLAgent's get_action(self, obs, deterministic=False), but I couldn't figure out how to save/load.

  • The output dir contains *.pth files; are they the state from the most recent epoch?
  • It would be nice to have an sklearn-like wrapper interface just to simplify things; the fit method could be discarded.

Thanks.

A potential inplace operation bug in pytorch

It seems you used PyTorch 1.0.1 in this project. PyTorch has included stricter in-place operation detection since 1.5, which makes your code

def get_action(self, obs, deterministic=False):
        ''' sample action from the policy, conditioned on the task embedding '''
        z = self.z
        obs = ptu.from_numpy(obs[None])
        in_ = torch.cat([obs, z], dim=1)
        return self.policy.get_action(in_, deterministic=deterministic)

at https://github.com/katerakelly/oyster/blob/master/rlkit/torch/sac/agent.py throw an in-place operation error, since the latent vector z is not detached before action generation. This lets the policy loss update z multiple times within one optimization step, which may cause errors.

I think that might be a bug in your code and could affect the reliability of your final results.

Could you check on that?
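For illustration, here is a hedged sketch of the workaround the report implies (detaching the latent before it enters the policy input); whether this is the appropriate fix for the repo is an assumption and would need to be verified against training behavior.

    # Hypothetical workaround: detach the task latent during action selection so that
    # data collection does not build an autograd graph through z (PyTorch >= 1.5).
    def get_action(self, obs, deterministic=False):
        ''' sample action from the policy, conditioned on the task embedding '''
        z = self.z.detach()
        obs = ptu.from_numpy(obs[None])
        in_ = torch.cat([obs, z], dim=1)
        return self.policy.get_action(in_, deterministic=deterministic)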

About the input of context encoder

PEARL is an instructive paper.
However, when comparing the paper to the source code, I found one question about the input of the context encoder. In the paper, c = {s, a, r, s'}, but in the source code, c = {s, a, r}.
For such a design, how should one analyze the impact of introducing s'?
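To make the difference concrete, here is a minimal sketch of how a context batch could be assembled with and without the next observation; make_context and the use_next_obs flag are illustrative names, not the repo's actual API.

    # Illustrative only: paper describes c = (s, a, r, s'); the default code path uses (s, a, r).
    import torch

    def make_context(obs, act, rew, next_obs, use_next_obs=False):
        # each argument: tensor of shape (num_transitions, dim)
        pieces = [obs, act, rew]
        if use_next_obs:
            pieces.append(next_obs)
        return torch.cat(pieces, dim=-1)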

Walker2d Rand Params Environment

Thank you for your inspirational work and for open-sourcing the code. I have a question regarding the walker random-params environment. The walker cannot actually walk even when the reward is 800; this is the case for all the algorithms that use this environment. When doing a comparative analysis, are we supposed to only compare the reward, or is the reward scaled for this environment?

ResolvePackageNotFound building Docker image

Hi, I'm trying to build a Docker image on an Ubuntu virtual machine. During the execution of RUN conda env update -f /tmp/environment.yml && conda clean --all -y, the build fails. The error message is pasted below:

Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed

ResolvePackageNotFound: 
  - readline==6.2=2
  - sqlite==3.13.0=0
  - icu==54.1=0
  - python==3.5.2=0
  - libiconv==1.14=0
  - libxml2==2.9.4=0
  - mako==1.0.6=py35_0
  - pyqt==5.6.0=py35_2
  - freetype==2.5.5=2
  - numba==0.35.0=np111py35_0
  - qt==5.6.2=5
  - matplotlib==2.0.2=np111py35_0
  - python-dateutil==2.6.1=py35_0
  - path.py==10.3.1=py35_0
  - fontconfig==2.12.1=3
  - tk==8.5.18=0
  - joblib==0.9.4=py35_0

My virtual machine is running Ubuntu 18.04 LTS.

Error when configuring recurrent encoder

I ran the cheetah-vel experiment, changing only the recurrent encoder setting to True, and got this error:

collecting initial pool of data for train and eval
Traceback (most recent call last):
File "launch_experiment.py", line 144, in
File "/mnt/thangdn3/miniconda3/envs/pearl/lib/python3.5/site-packages/click/core.py", line 764, in call
return self.main(*args, **kwargs)
File "/mnt/thangdn3/miniconda3/envs/pearl/lib/python3.5/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/mnt/thangdn3/miniconda3/envs/pearl/lib/python3.5/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/mnt/thangdn3/miniconda3/envs/pearl/lib/python3.5/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/mnt/thangdn3/miniconda3/envs/pearl/lib/python3.5/contextlib.py", line 77, in exit
self.gen.throw(type, value, traceback)
File "/mnt/thangdn3/miniconda3/envs/pearl/lib/python3.5/site-packages/click/core.py", line 105, in augment_usage_errors
yield
File "/mnt/thangdn3/miniconda3/envs/pearl/lib/python3.5/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "launch_experiment.py", line 141, in main
if name == "main":
File "launch_experiment.py", line 114, in experiment
def deep_update_dict(fr, to):
File "/home/thangdn3/investigate_PEARL/rlkit/core/rl_algorithm.py", line 192, in train
self.try_to_eval(it)
File "/home/thangdn3/investigate_PEARL/rlkit/core/rl_algorithm.py", line 236, in _try_to_eval
self.evaluate(epoch)
File "/home/thangdn3/investigate_PEARL/rlkit/core/rl_algorithm.py", line 423, in evaluate
self.agent.infer_posterior(context)
File "/home/thangdn3/investigate_PEARL/rlkit/torch/sac/agent.py", line 125, in infer_posterior
params = self.context_encoder(context)
File "/mnt/thangdn3/miniconda3/envs/pearl/lib/python3.5/site-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/home/thangdn3/investigate_PEARL/rlkit/torch/networks.py", line 168, in forward
out, (hn, cn) = self.lstm(out, (self.hidden, torch.zeros(self.hidden.size()).to(ptu.device)))
File "/mnt/thangdn3/miniconda3/envs/pearl/lib/python3.5/site-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/mnt/thangdn3/miniconda3/envs/pearl/lib/python3.5/site-packages/torch/nn/modules/rnn.py", line 556, in forward
self.check_forward_args(input, hx, batch_sizes)
File "/mnt/thangdn3/miniconda3/envs/pearl/lib/python3.5/site-packages/torch/nn/modules/rnn.py", line 512, in check_forward_args
'Expected hidden[0] size {}, got {}')
File "/mnt/thangdn3/miniconda3/envs/pearl/lib/python3.5/site-packages/torch/nn/modules/rnn.py", line 176, in check_hidden_size
raise RuntimeError(msg.format(expected_hidden_size, tuple(hx.size())))
RuntimeError: Expected hidden[0] size (1, 1, 200), got (1, 16, 200)

The code does not seem to work with mujoco-200

I've installed mujoco-200 following the instructions, and I ran "python launch_experiment.py ./configs/cheetah-dir.json". It should work, since the README says "For the task distributions in which the reward function varies (Cheetah, Ant, Humanoid), install MuJoCo200.", but it still reports missing MuJoCo131 dependencies. In fact, I think the code contains a mandatory check on the MuJoCo version. Maybe the README is inconsistent with the actual situation.

Computer got stuck when running 'humanoid-dir'

When I ran the 'humanoid-dir' experiment, the computer froze after more than 200 iterations. This happened once before; I thought it was an accident, so I cut the power and force-rebooted. Then I found that the graphics driver was broken, so I reinstalled it and tried again. The computer froze again.
What might be the cause?
My computer has 16GB memory, CPU 8700K, GPU 2080.

Benchmark Results

I got exactly the same results for PEARL as shown in the paper.
Could you provide the benchmark results for ProMP, MAML, and RL2?

About algorithm of PEARL

Dear author,
In your paper you mentioned that 'To achieve both meta-training efficiency and rapid adaptation, we propose an approach that integrates online inference of probabilistic context variables with existing off-policy RL algorithms'. So I wonder: can I replace SAC with other algorithms such as DQN? If not, what is the reason?
Many thanks if you can reply to my questions :)

Installation issue (dependency conflict and mujoco-py problem)

I tried to install this project on my computer: Ubuntu 16.04, with MuJoCo 150 + 131 installed correctly (with key; used normally in other projects). I followed the instructions to install the dependencies via conda env create -f environment.yml. I ran into two issues:

  1. jupyter-console==6.0.0 and ipython=6.5.0 have conflicting requirements for prompt-toolkit==1.0.15 (so I commented out the jupyter-console requirement, which solved the problem).

  2. I struggled a bit with MuJoCo (first time) given the very brief instructions for that part. After I installed MuJoCo 150 and mujoco-py 1.50.1.68, python launch_experiment.py required MuJoCo 131. So I installed MuJoCo 131, where I hit all sorts of errors that could only be resolved with mujoco-py 0.5.7, but then your code ran incorrectly with all sorts of new errors.
     The solution was to install MuJoCo 150 normally and AFTER that change MUJOCO_PY_MJPRO_PATH to the 131 directory.

Maybe you want to include more detailed instructions in the README for others.

Is mujoco 1.3.1 really necessary?

Hi there, I am not sure whether the repo rand_param_envs explicitly changes anything about gym or mujoco. I can now run the code by simply changing the imports from:

from rand_param_envs.gym import utils

to

from gym import utils (my gym is the latest one)

and so on, and the code is running smoothly.

So may I ask: did you change anything about gym, mujoco_py, or MujocoEnv relative to the original code? If not, then I think it's OK for me to use the official versions of gym and mujoco (so that I don't need to install mujoco131).

Thanks!

RL2 benchmark

Dear @katerakelly and @azhou42,

Thank you for providing your code for your exciting paper!

You are stating:

We ran our ProMP, MAML-TRPO, and RL2 baselines in the reference ProMP repo

But I cannot find the RL2 baselines.
Can you please give me a hint where to look?

I really appreciate any help you can provide.

ENV[ ] registration

Thank you very much for releasing the code!
When I try to run the 'ant-dir' experiment, a problem occurs as shown below:
It seems that 'ant-dir' is not registered in the ENVS[ ] dict; only 'cheetah-vel', 'sparse-point-robot', and 'point-robot' are.
So, how should I register all of the envs in oyster/rlkit/envs into the ENVS[ ] dict?

### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ###

python launch_experiment.py ./configs/ant-dir.json

print ENVS[ ]: {'cheetah-vel': <class 'rlkit.envs.half_cheetah_vel.HalfCheetahVelEnv'>, 'sparse-point-robot': <class 'rlkit.envs.point_robot.SparsePointEnv'>, 'point-robot': <class 'rlkit.envs.point_robot.PointEnv'>}

Traceback (most recent call last):
File "launch_experiment.py", line 144, in
main()
File "/home/ljy/anaconda3/envs/pearl/lib/python3.5/site-packages/click/core.py", line 764, in call
return self.main(*args, **kwargs)
File "/home/ljy/anaconda3/envs/pearl/lib/python3.5/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/ljy/anaconda3/envs/pearl/lib/python3.5/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ljy/anaconda3/envs/pearl/lib/python3.5/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "launch_experiment.py", line 141, in main
experiment(variant)
File "launch_experiment.py", line 29, in experiment
env = NormalizedBoxEnv(ENVS[variant['env_name']] (**variant['env_params']))
KeyError: 'ant-dir'

Where to change seed

Thanks for sharing the repo!

I was wondering where I can change the experiment seed?
I can't seem to find a seed in launch_experiment.py, the default config, or the env-specific config files.

Issue with Reproducing Results

Hello, great work on your paper! I am trying to reproduce the results but am having some issues for some environments, most notably for humanoid-dir and walker-rand-params.

I'm running experiments on a machine with Ubuntu 18.04.4 LTS and a TITAN RTX. I also tried running on Ubuntu 16.04 (without GPU) and am getting similar results.

I have plots for humanoid-dir and walker-rand-params. The blue curves are the results you uploaded to Dropbox; the orange curves are what I produced (I haven't run them for that long, but they look very different from what you had). I've tested from a clean clone of this repo and am using the same configs.

Do you have any idea why this is happening? Thanks in advance!

[Two screenshots of the learning-curve plots are attached.]

Default Parameters Paper <-> Repository

Hello,
First, thank you for your research! I am currently trying (and struggling a little) to reproduce your results on the HalfCheetahVel environment. I noticed some differences between the algorithm in your paper (https://arxiv.org/pdf/1903.08254.pdf, page 5) and the default settings in this repository, i.e.

  • the KL loss is scaled by .1 in the default settings
  • the next observation is not included in the sampled context by default, whereas in the paper I got the impression the next observation would always be included in the context.

I would highly appreciate it if you could comment on the effect of those differences. Is my assumption correct that your results were produced with the default parameters from the repo?

Thank you in advance!

Questions about result figures

Hi, I have a question about the result figures in Section 6 comparing the six benchmarks. The caption says "Test-task performance vs. samples collected during meta-training". Is this AverageReturn_all_test_tasks vs. Number of env steps total in progress.csv? I plotted these, but they do not always match.

Also, how do you draw the figures with the light-colored shading? Does the shading represent the variance? If so, does it refer to the variance of the average return across 3 different random seeds?

Also, I trained ant-goal. My best return is -440, which still has a big gap from your result (-200), but the other five environments are comparable. What might be the reason?

Thank you in advance!

