jannerm / mbpo
Code for the paper "When to Trust Your Model: Model-Based Policy Optimization"
Home Page: https://jannerm.github.io/mbpo-www/
License: MIT License
I'd like to specify my own log directory for the Ray logs, but mbpo run_local doesn't seem to support a --log-dir flag. It looks like this is set manually in the config files right now, but I'd like to be able to set the directory easily in the launch command. Any suggestions?
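In the meantime, here is a minimal sketch of the workaround I'm using: parse a --log-dir flag myself and override the value from the config. The 'log_dir' key mirrors the example configs; the flag name and parser here are my own, not an existing mbpo option.

```python
import argparse

# Hedged sketch: override the config's log_dir from the command line.
# '--log-dir' is a flag I invented for illustration, not a real mbpo option.
parser = argparse.ArgumentParser()
parser.add_argument('--log-dir', default=None)
args = parser.parse_args(['--log-dir', '/tmp/ray_results'])

params = {'log_dir': '~/ray_mbpo/'}  # value normally hard-coded in the config
if args.log_dir is not None:
    params['log_dir'] = args.log_dir  # CLI value wins over the config default
```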
Dear author,
Could you explain why some values in the observation are ignored in the Ant and Humanoid environments, compared to the original environments from OpenAI Gym?
https://github.com/JannerM/mbpo/blob/master/mbpo/env/ant.py
https://github.com/JannerM/mbpo/blob/master/mbpo/env/humanoid.py
Thank you so much!
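For anyone else comparing: my understanding (an assumption from reading those files, not confirmed by the authors) is that the truncated envs simply drop the tail of Gym's observation vector, which for Ant-v2 is the 84 flattened external contact forces (cfrc_ext). A toy sketch:

```python
# Toy sketch of the truncation as I understand it (assumption, not repo code).
# Gym Ant-v2 observations have 111 entries:
#   13 qpos + 14 qvel + 84 flattened external contact forces (cfrc_ext).
# The truncated env appears to keep only the first 27 entries.
full_obs = list(range(111))      # stand-in for a real Ant-v2 observation
truncated_obs = full_obs[:27]    # qpos + qvel only, contact forces dropped

assert len(truncated_obs) == 27
```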
Thank you very much for making this code publicly available.
I have added the continuous mountain car environment by following the steps mentioned.
MBPO solves mountain car if I keep the real ratio at 1, i.e. optimizing the policy on D_env (the environment dataset).
When the real ratio is 0.05, it fails to solve the task even after tuning all the hyperparameters.
Could you please add a solution for continuous mountain car, or tell me why it is failing?
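For reference, this is how I understand real_ratio = 0.05 to be used during the SAC updates (a sketch under that assumption; the pool contents below are dummies):

```python
import random

# Sketch: compose a training batch from 5% real and 95% model-generated
# transitions, which is how I understand real_ratio = 0.05 to be applied.
real_ratio = 0.05
batch_size = 256

env_pool = [('env', i) for i in range(1000)]        # dummy real transitions
model_pool = [('model', i) for i in range(10000)]   # dummy model rollouts

n_real = int(batch_size * real_ratio)               # 12 real samples
batch = (random.sample(env_pool, n_real)
         + random.sample(model_pool, batch_size - n_real))
```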
How do you run mbpo/examples/development/main.py from the command line using 'python main.py', rather than:
mbpo run_local examples.development --config=examples.config.halfcheetah.0 --gpus=1 --trial-gpus=1
Here's what I've changed in the main.py file, but it doesn't work:
def main(argv=None):
    """Run ExperimentRunner locally on ray.

    To run this example on cloud (e.g. gce/ec2), use the setup scripts:
    'softlearning launch_example_{gce,ec2} examples.development <options>'.

    Run 'softlearning launch_example_{gce,ec2} --help' for further
    instructions.
    """
    # __package__ should be `development.main`
    run_example_local('examples.development.main', argv, local_mode=True)

if __name__ == '__main__':
    main(argv=['--config=examples.config.halfcheetah.0', '--gpus=1', '--trial-gpus=1'])
I get this error:
Traceback (most recent call last):
File "/home/jack/repos/mbpo/examples/development/main.py", line 255, in <module>
main(argv = ['--config=examples.config.halfcheetah.0', '--gpus=1', '--trial-gpus=1'])
File "/home/jack/repos/mbpo/examples/development/main.py", line 248, in main
run_example_local('examples.development.main', argv, local_mode=True)
File "/home/jack/repos/mbpo/examples/instrument.py", line 205, in run_example_local
example_args = example_module.get_parser().parse_args(example_argv)
AttributeError: module 'examples.development.main' has no attribute 'get_parser'
I'd like to run it this way for debugging purposes
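For anyone hitting the same error: the traceback suggests run_example_local looks up get_parser on whatever module path it is given, and get_parser lives in examples/development/__init__.py rather than in main.py. A minimal, self-contained demonstration of that lookup (the modules below are fakes, not the repo's):

```python
import types

# Fake stand-ins for the real modules, only to show why the lookup fails.
package = types.ModuleType('examples.development')
package.get_parser = lambda: 'parser'  # get_parser is defined in __init__.py
main_module = types.ModuleType('examples.development.main')  # no get_parser

assert hasattr(package, 'get_parser')
assert not hasattr(main_module, 'get_parser')
```

So passing 'examples.development' instead of 'examples.development.main' to run_example_local may resolve the AttributeError, though I haven't verified this against the repo.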
First off, great work and thank you for making this publicly available.
I have been experimenting with your code, as I want to adapt it to my own custom environment. For this purpose I adopted a slightly larger NN architecture for world modelling, and noticed that part of the code seems to leak memory. For instance, the save_state method constantly adds nodes to the TensorFlow graph which never get removed by the garbage collector. The same thing occurs, to a lesser degree, elsewhere. To test this claim, I simply counted the number of nodes and the memory usage before and after calling self._set_state().
I used len(self.sess.graph._nodes_by_name.keys())
to count the number of nodes within the TF graph and resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
to measure RAM usage (in kB). In a typical run, each call to this method increases RAM consumption by around 100 MB. When training for many epochs, this leads to OOM errors, as one would expect.
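For reproducibility, this is roughly how I measured it (the allocation below stands in for the self._set_state() call; note ru_maxrss is kilobytes on Linux, bytes on macOS):

```python
import resource

def ram_kb():
    # Peak resident set size of this process (kB on Linux).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = ram_kb()
# In the real code, call self._set_state() here; simulate an allocation:
blob = [bytearray(1024) for _ in range(10_000)]  # ~10 MB stand-in
after = ram_kb()

assert after >= before  # ru_maxrss is a high-water mark, never decreases
```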
In the end, my question is twofold:
I want to add a continual-action experiment; how can I implement it?
I was trying to call the Python file directly. I found it in softlearning/scripts/console_scripts.py.
I ran the HalfCheetah env using MBPO (SAC + 3 dynamics NNs), and my training loss increases along with the reward.
I don't have the intuition to interpret this.
Why does the training loss of model-based policy optimization increase?
I can share wandb logs.
Thanks
When I follow the instructions and create the environment via the 'gpu-env.yml' declaration,
the command fails during the pip installation phase.
First, the following step seems to be omitted, but I think it is necessary:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/<username>/.mujoco/mjpro150/bin
(where <username> is your username).
But even after that, I get a "Pip failed" error, and it seems to stem from the following:
"fatal error: mjmodel.h: No such file or directory".
Can you relate to this problem? How do I solve it?
Alex
Failed to build mujoco-py
ERROR: flask 1.1.1 has requirement Werkzeug>=0.15, but you'll have werkzeug 0.14.1 which is incompatible.
ERROR: awscli 1.16.67 has requirement PyYAML<=3.13,>=3.10, but you'll have pyyaml 4.2b4 which is incompatible.
Hi,
This is really nice work.
I've faced some issues related to TensorFlow and CUDA, and I'm not that good with TensorFlow; I'm a PyTorch guy.
So I've decided to make a PyTorch implementation of MBPO, and I'm trying to understand your code.
From my understanding, taking AntTruncatedObs-v2 as a working example, the PyTorch pseudocode is:
Total epochs = 1000
Epoch steps = 1000
Exploration epochs = 10
01. Initialize networks [Model, SAC]
02. Initialize training with [10 exploration epochs (random) = 10 x 1000 environment steps]
03. For n in [Total epochs - Exploration epochs = 990 epochs]:
04.     For i in [1000 epoch steps]:
05.         If i % [250 model training freq] == 0:
06.             For g in [how many model gradient steps???]:
07.                 Sample a [256-size batch] from Env_pool
08.                 Train the Model network
09.             Sample a [100k-size batch] from Env_pool
10.             Set rollout_length
11.             Reallocate Model_pool [???]
12.             Rollout the Model for rollout_length steps, and add the rollouts to Model_pool
13.         Sample an [action a] from the policy, take an env step, and add it to Env_pool
14.         For g in [20 SAC gradient steps]:
15.             Sample a [256-size batch] from [5% Env_pool, 95% Model_pool]
16.             Train the Actor-Critic networks
17.     Evaluate the policy
Is that right?
My questions are about lines 06 and 11:
06: You're using some wall-clock time budget to train the model; in terms of gradient steps, how many are there?
11: When you reallocate the Model_pool, you set the [Model_pool size] to the number of [model steps per epoch].
But isn't that a really huge training set for the SAC updates? Are you discarding all model steps from previous epochs?
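Regarding 11, my current reading of the pool sizing (an assumption inferred from the config parameter names, not verified against the code) is that the model pool retains only the last model_retain_epochs epochs' worth of rollouts, so older model data does eventually get discarded:

```python
# Sketch of how I understand the model pool size to be computed
# (assumption from the config names, not the repo's verified code).
rollout_batch_size = 100_000   # rollouts started per model-training event
epoch_length = 1000
model_train_freq = 250
rollout_length = 1             # grows with the rollout schedule
model_retain_epochs = 1

rollouts_per_epoch = rollout_batch_size * epoch_length / model_train_freq
model_steps_per_epoch = int(rollout_length * rollouts_per_epoch)
model_pool_size = model_retain_epochs * model_steps_per_epoch
```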
Sorry for this very big issue..
Best wishes and kind regards.
Rami Ahmed
It seems mujoco-py no longer has checkpoint 435d2143abe04fe4648c6c0c1b848bf1fc06b73b. Changing this line to mujoco-py==1.50.1.68 can solve the problem.
How do I run the code? This command doesn't seem to work:
mbpo run_local examples.development --config=examples.config.halfcheetah.0 \
    --checkpoint-frequency=1000 --gpus=1 --trial-gpus=1
Thanks!
Hi there, I'm trying to use the Pusher2d-ImageReach-v0 env by creating a pusher2d/0.py config:
params = {
    'type': 'MBPO',
    'universe': 'gym',
    'domain': 'Pusher2d',
    'task': 'ImageReach-v0',
    'log_dir': '~/ray_mbpo/',
    'exp_name': 'defaults',
    'kwargs': {
        'epoch_length': 1000,
        'train_every_n_steps': 1,
        'n_train_repeat': 20,
        'eval_render_mode': None,
        'eval_n_episodes': 1,
        'eval_deterministic': True,
        'discount': 0.99,
        'tau': 5e-3,
        'reward_scale': 1.0,
        'model_train_freq': 1000,
        'model_retain_epochs': 5,
        'rollout_batch_size': 100e3,
        'deterministic': False,
        'num_networks': 7,
        'num_elites': 5,
        'real_ratio': 0.05,
        'target_entropy': -2,
        'max_model_t': None,
        'rollout_schedule': [20, 300, 1, 20],
        'hidden_dim': 400,
    }
}
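(In case it matters for debugging: my reading of rollout_schedule = [20, 300, 1, 20] is a linear ramp of the model rollout length from 1 to 20 between epochs 20 and 300. This is an assumption from the config format, not verified against the code.)

```python
def rollout_length(epoch, schedule=(20, 300, 1, 20)):
    # [min_epoch, max_epoch, min_length, max_length]: linearly interpolate
    # the rollout length over epochs, clipped to the endpoints (assumption).
    min_epoch, max_epoch, min_len, max_len = schedule
    frac = (epoch - min_epoch) / (max_epoch - min_epoch)
    frac = min(max(frac, 0.0), 1.0)
    return int(min_len + frac * (max_len - min_len))
```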
However, I got this error:
Traceback (most recent call last):
File "/home/lab/anaconda3/envs/mbpo/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 389, in _process_events
result = self.trial_executor.fetch_result(trial)
File "/home/lab/anaconda3/envs/mbpo/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 252, in fetch_result
result = ray.get(trial_future[0])
File "/home/lab/anaconda3/envs/mbpo/lib/python3.6/site-packages/ray/worker.py", line 2288, in get
raise value
ray.exceptions.RayTaskError: ray_ExperimentRunner:train() (pid=8159, host=lab-server)
File "/home/lab/anaconda3/envs/mbpo/lib/python3.6/site-packages/ray/tune/trainable.py", line 150, in train
result = self._train()
File "/home/lab/Github/TendonTrack/ipk_mbpo/examples/development/main.py", line 98, in _train
self._build()
File "/home/lab/Github/TendonTrack/ipk_mbpo/examples/development/main.py", line 48, in _build
get_environment_from_params(environment_params['training']))
File "/home/lab/Github/TendonTrack/ipk_mbpo/softlearning/environments/utils.py", line 34, in get_environment_from_params
return get_environment(universe, domain, task, environment_kwargs)
File "/home/lab/Github/TendonTrack/ipk_mbpo/softlearning/environments/utils.py", line 24, in get_environment
env = ADAPTERS[universe](domain, task, **environment_params)
File "/home/lab/Github/TendonTrack/ipk_mbpo/softlearning/environments/adapters/gym_adapter.py", line 68, in __init__
env = gym.envs.make(env_id, **kwargs)
File "/home/lab/anaconda3/envs/mbpo/lib/python3.6/site-packages/gym/envs/registration.py", line 156, in make
return registry.make(id, **kwargs)
File "/home/lab/anaconda3/envs/mbpo/lib/python3.6/site-packages/gym/envs/registration.py", line 101, in make
env = spec.make(**kwargs)
File "/home/lab/anaconda3/envs/mbpo/lib/python3.6/site-packages/gym/envs/registration.py", line 73, in make
env = cls(**_kwargs)
File "/home/lab/Github/TendonTrack/ipk_mbpo/softlearning/environments/gym/mujoco/image_pusher_2d.py", line 57, in __init__
super(ImageForkReacher2dEnv, self).__init__(*args, **kwargs)
File "/home/lab/Github/TendonTrack/ipk_mbpo/softlearning/environments/gym/mujoco/image_pusher_2d.py", line 11, in __init__
Pusher2dEnv.__init__(self, *args, **kwargs)
File "/home/lab/Github/TendonTrack/ipk_mbpo/softlearning/environments/gym/mujoco/pusher_2d.py", line 58, in __init__
MujocoEnv.__init__(self, model_path=self.MODEL_PATH, frame_skip=5)
File "/home/lab/anaconda3/envs/mbpo/lib/python3.6/site-packages/gym/envs/mujoco/mujoco_env.py", line 45, in __init__
raise IOError("File %s does not exist" % fullpath)
OSError: File /home/lab/Github/TendonTrack/ipk_mbpo/models/pusher_2d.xml does not exist
Is there actually no model for it in your repository?
Furthermore, the reason I want to test this environment is to figure out the preprocessing method for image observations. Do you have any suggestions about that?
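Regarding preprocessing, the generic approach I'd try first (my own sketch, not this repo's method): grayscale the image by averaging channels and normalize pixel values to [0, 1].

```python
# Generic image-observation preprocessing sketch (not this repo's method):
# average the color channels and rescale uint8 pixel values to [0, 1].
def preprocess(image_rows):
    # image_rows: H x W x 3 nested lists with values in [0, 255].
    return [[sum(px) / 3.0 / 255.0 for px in row] for row in image_rows]

obs = [[[255, 255, 255], [0, 0, 0]]]   # a 1x2 RGB "image"
processed = preprocess(obs)

assert processed == [[1.0, 0.0]]
```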
I am trying to install your package for a student and I am running into the following problem:
@(mbpo) Singularity> mbpo run_local examples.development --config=examples.config.halfcheetah.0
Traceback (most recent call last):
File "/usr/local/bin/miniconda/envs/mbpo/bin/mbpo", line 11, in <module>
load_entry_point('mbpo==0.0.1', 'console_scripts', 'mbpo')()
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/softlearning/scripts/console_scripts.py", line 202, in main
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/softlearning/scripts/console_scripts.py", line 71, in run_example_local_cmd
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/examples/instrument.py", line 205, in run_example_local
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/examples/development/__init__.py", line 35, in get_parser
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/examples/utils.py", line 8, in <module>
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/softlearning/algorithms/__init__.py", line 1, in <module>
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/softlearning/algorithms/sql.py", line 8, in <module>
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/softlearning/algorithms/rl_algorithm.py", line 12, in <module>
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/softlearning/samplers/__init__.py", line 4, in <module>
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/softlearning/samplers/remote_sampler.py", line 10, in <module>
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/softlearning/samplers/utils.py", line 5, in <module>
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/softlearning/replay_pools/__init__.py", line 4, in <module>
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/softlearning/replay_pools/trajectory_replay_pool.py", line 8, in <module>
ModuleNotFoundError: No module named 'softlearning.utils'
I have mujoco and mujoco_py installed within a Singularity container. I installed your package after modifying tensorflow-gpu to tensorflow in the environment/requirements.txt file. Any suggestions?
I met this problem while running pip install -e viskit:
(mbpo) dingcheng@dingcheng-GP63-Leopard-8RE:~/Documents/mbpo$ pip install -e viskit
ERROR: File "setup.py" not found. Directory cannot be installed in editable mode: /home/dingcheng/Documents/mbpo/viskit
Any idea why that would happen? Thanks
Hi, Thanks for sharing the novel idea and great work!
I'm wondering if the code for the Model Generalization in Practice section is available? That section would help me understand the model learning part of the algorithm. :-)
Hi,
Thanks for releasing the code!
I'd like to run MBPO on Walker2d. It threw an error about the argument model_pool_size
(https://github.com/JannerM/mbpo/blob/master/examples/config/walker2d/0.py#L25), which seems to be deprecated (https://github.com/JannerM/mbpo/blob/master/mbpo/algorithms/mbpo.py#L107).
Hi, may I ask a question on the original paper here?
I am confused by the following step in Lemma B.3.
From Lemma B.1 it is straightforward to bound the TV distance of the joint distributions, but I don't see how the corresponding bound follows for the marginal distributions.
I have trouble figuring this out; could you please shed some light on it?
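For reference, the step I suspect is being used (a standard fact, stated here as my guess at the missing link): marginalization cannot increase total variation distance. Writing the joints as p(s, a), q(s, a) with marginals p(s), q(s):

```latex
D_{TV}\big(p(s),\, q(s)\big)
  = \tfrac{1}{2} \sum_s \Big| \sum_a \big(p(s,a) - q(s,a)\big) \Big|
  \le \tfrac{1}{2} \sum_{s,a} \big| p(s,a) - q(s,a) \big|
  = D_{TV}\big(p(s,a),\, q(s,a)\big),
```

where the inequality is the triangle inequality applied inside each sum over a.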
Thank you very much!
Hi -- this dir has modified gym ~v2 environments for the ant/humanoid, with a modified observation space and early termination, while this file calls into the parameterized gym v3 environments with no early termination or time-based rewards and, if I understand correctly, also inherits the original observation space of the v3 environments. These two files seem to be in conflict with each other, as the mbpo/env environments don't have the same parameters as examples/development/base.py. Can you clarify which envs/observation spaces/rewards you use for training and report in the paper?
My current assumption is that the base.py code is the latest, that the mbpo/env code is outdated, and that you are using the v3 envs with their default observation space, no early termination/alive bonus, and are training on and directly reporting the reward from these environments (rather than re-running the evaluation in the default v3 environments with time bonus and early termination). Is this correct?
Hi !!
Thank you for sharing your great research and code.
I have a question about the MuJoCo environments in your experiments.
In the paper, your method was compared with PETS, which needs the states to calculate the reward offline, although your method does not need those states.
I checked your code and your environments, but you did not use the states that are needed to calculate the rewards offline. How did you run PETS, which assumes that we can observe the states needed to calculate the rewards offline? (E.g., using get_body_com("torso") to calculate the reward in the Ant env.)
Thank you for your help.
Hi,
When I try to run the main example using mbpo run_local examples.development --config=examples.config.halfcheetah.0 --gpus=1 --trial-gpus=1, I have an issue with the classes that inherit from Serializable (GymAdapter, SoftlearningEnv, ImagePusher2dEnv, ...). The problem is that the _Serializable__initialize method is not defined:
2021-05-03 18:03:36,442 ERROR trial_runner.py:426 -- Error processing event.
Traceback (most recent call last):
File "/home/ricardo/Documentos/venv_mbpo/env-py3.6/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 389, in _process_events
result = self.trial_executor.fetch_result(trial)
File "/home/ricardo/Documentos/venv_mbpo/env-py3.6/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 252, in fetch_result
result = ray.get(trial_future[0])
File "/home/ricardo/Documentos/venv_mbpo/env-py3.6/lib/python3.6/site-packages/ray/worker.py", line 2288, in get
raise value
ray.exceptions.RayTaskError: ray_ExperimentRunner:train() (pid=4162, host=ricardo-VivoBook-ASUS-Laptop-X505ZA-X505ZA)
File "/home/ricardo/Documentos/venv_mbpo/env-py3.6/lib/python3.6/site-packages/ray/tune/trainable.py", line 150, in train
result = self._train()
File "/home/ricardo/Documentos/mbpo/examples/development/main.py", line 85, in _train
self._build()
File "/home/ricardo/Documentos/mbpo/examples/development/main.py", line 46, in _build
get_environment_from_params(environment_params['training']))
File "/home/ricardo/Documentos/mbpo/softlearning/environments/utils.py", line 28, in get_environment_from_params
return get_environment(universe, domain, task, environment_kwargs)
File "/home/ricardo/Documentos/mbpo/softlearning/environments/utils.py", line 18, in get_environment
env = ADAPTERS[universe](domain, task, **environment_params)
File "/home/ricardo/Documentos/mbpo/softlearning/environments/adapters/gym_adapter.py", line 60, in __init__
self._Serializable__initialize(locals())
AttributeError: 'GymAdapter' object has no attribute '_Serializable__initialize'
Is it possible that I am using the wrong version of Serializable?
Thanks.