jannerm / mbpo
Code for the paper "When to Trust Your Model: Model-Based Policy Optimization"
Home Page: https://jannerm.github.io/mbpo-www/
License: MIT License
I'd like to specify my own log directory for the Ray logs, but mbpo run_local doesn't seem to support a --log-dir flag. It looks like this is set manually in the config files right now, but I'd like to be able to set the directory easily in the launch command. Any suggestions?
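In the meantime, here is a minimal sketch of the workaround I'm using: parse a --log-dir flag myself and override the value from the config. The 'log_dir' key mirrors the example configs; the flag name and parser here are my own, not an existing mbpo option.

```python
import argparse

# Hedged sketch: override the config's log_dir from the command line.
# '--log-dir' is a flag I invented for illustration, not a real mbpo option.
parser = argparse.ArgumentParser()
parser.add_argument('--log-dir', default=None)
args = parser.parse_args(['--log-dir', '/tmp/ray_results'])

params = {'log_dir': '~/ray_mbpo/'}  # value normally hard-coded in the config
if args.log_dir is not None:
    params['log_dir'] = args.log_dir  # CLI value wins over the config default
```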
Dear author,
Could you explain why some values in the observation are ignored in the Ant and Humanoid environments, compared to the original environments from OpenAI Gym?
https://github.com/JannerM/mbpo/blob/master/mbpo/env/ant.py
https://github.com/JannerM/mbpo/blob/master/mbpo/env/humanoid.py
Thank you so much!
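For anyone else comparing: my understanding (an assumption from reading those files, not confirmed by the authors) is that the truncated envs simply drop the tail of Gym's observation vector, which for Ant-v2 is the 84 flattened external contact forces (cfrc_ext). A toy sketch:

```python
# Toy sketch of the truncation as I understand it (assumption, not repo code).
# Gym Ant-v2 observations have 111 entries:
#   13 qpos + 14 qvel + 84 flattened external contact forces (cfrc_ext).
# The truncated env appears to keep only the first 27 entries.
full_obs = list(range(111))      # stand-in for a real Ant-v2 observation
truncated_obs = full_obs[:27]    # qpos + qvel only, contact forces dropped

assert len(truncated_obs) == 27
```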
Thank you very much for making this code publicly available.
I have added the continuous mountain car environment by following the steps mentioned.
MBPO solves mountain car if I keep the real ratio at 1, i.e. optimizing the policy on D_env (the environment dataset).
When the real ratio is 0.05, it fails to solve the task even after tuning all the hyperparameters.
Could you please add a solution for continuous mountain car, or tell me why it is failing?
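For reference, this is how I understand real_ratio = 0.05 to be used during the SAC updates (a sketch under that assumption; the pool contents below are dummies):

```python
import random

# Sketch: compose a training batch from 5% real and 95% model-generated
# transitions, which is how I understand real_ratio = 0.05 to be applied.
real_ratio = 0.05
batch_size = 256

env_pool = [('env', i) for i in range(1000)]        # dummy real transitions
model_pool = [('model', i) for i in range(10000)]   # dummy model rollouts

n_real = int(batch_size * real_ratio)               # 12 real samples
batch = (random.sample(env_pool, n_real)
         + random.sample(model_pool, batch_size - n_real))
```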
How do you run mbpo/examples/development/main.py from the command line using 'python main.py', rather than:
mbpo run_local examples.development --config=examples.config.halfcheetah.0 --gpus=1 --trial-gpus=1
Here's what I've changed in the main.py file, but it doesn't work:
def main(argv=None):
    """Run ExperimentRunner locally on ray.

    To run this example on cloud (e.g. gce/ec2), use the setup scripts:
    'softlearning launch_example_{gce,ec2} examples.development <options>'.

    Run 'softlearning launch_example_{gce,ec2} --help' for further
    instructions.
    """
    # __package__ should be `development.main`
    run_example_local('examples.development.main', argv, local_mode=True)

if __name__ == '__main__':
    main(argv=['--config=examples.config.halfcheetah.0', '--gpus=1', '--trial-gpus=1'])
I get this error:
Traceback (most recent call last):
File "/home/jack/repos/mbpo/examples/development/main.py", line 255, in <module>
main(argv = ['--config=examples.config.halfcheetah.0', '--gpus=1', '--trial-gpus=1'])
File "/home/jack/repos/mbpo/examples/development/main.py", line 248, in main
run_example_local('examples.development.main', argv, local_mode=True)
File "/home/jack/repos/mbpo/examples/instrument.py", line 205, in run_example_local
example_args = example_module.get_parser().parse_args(example_argv)
AttributeError: module 'examples.development.main' has no attribute 'get_parser'
I'd like to run it this way for debugging purposes
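For anyone hitting the same error: the traceback suggests run_example_local looks up get_parser on whatever module path it is given, and get_parser lives in examples/development/__init__.py rather than in main.py. A minimal, self-contained demonstration of that lookup (the modules below are fakes, not the repo's):

```python
import types

# Fake stand-ins for the real modules, only to show why the lookup fails.
package = types.ModuleType('examples.development')
package.get_parser = lambda: 'parser'  # get_parser is defined in __init__.py
main_module = types.ModuleType('examples.development.main')  # no get_parser

assert hasattr(package, 'get_parser')
assert not hasattr(main_module, 'get_parser')
```

So passing 'examples.development' instead of 'examples.development.main' to run_example_local may resolve the AttributeError, though I haven't verified this against the repo.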
First off, great work and thank you for making this publicly available.
I have been experimenting with your code, as I want to adapt it to my own custom environment. For this purpose I adopted a slightly larger NN architecture for world modelling, and noticed that part of the code seems to leak memory. For instance, the save_state method constantly adds nodes to the TensorFlow graph which never get removed by the garbage collector. The same thing occurs, to a lesser degree, elsewhere. To test this claim, I simply counted the number of nodes and the memory usage before and after calling self._set_state().
I used len(self.sess.graph._nodes_by_name.keys())
to count the number of nodes within the TF graph and resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
to measure RAM usage (in kB). In a typical run, each call to this method increases RAM consumption by around 100 MB. When training for many epochs, this leads to OOM errors, as one would expect.
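For reproducibility, this is roughly how I measured it (the allocation below stands in for the self._set_state() call; note ru_maxrss is kilobytes on Linux, bytes on macOS):

```python
import resource

def ram_kb():
    # Peak resident set size of this process (kB on Linux).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = ram_kb()
# In the real code, call self._set_state() here; simulate an allocation:
blob = [bytearray(1024) for _ in range(10_000)]  # ~10 MB stand-in
after = ram_kb()

assert after >= before  # ru_maxrss is a high-water mark, never decreases
```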
In the end, my question is twofold:
I want to add a continual-action experiment; how can I implement it?
I was trying to call the Python file directly. I found it in softlearning/scripts/console_scripts.py.
I ran the HalfCheetah env using MBPO (SAC + 3 dynamics NNs), and my training loss increases along with the reward.
I don't have the intuition to interpret this.
Why does the training loss of model-based policy optimization increase?
I can share wandb logs.
Thanks
When I follow the instructions and create the environment via the 'gpu-env.yml' declaration,
the command fails during the pip installation phase.
First, the following step seems to be omitted, but I think it is necessary:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/<username>/.mujoco/mjpro150/bin
(where <username> is your username).
But even after that, I get a "Pip failed" error, and it seems to stem from the following:
"fatal error: mjmodel.h: No such file or directory".
Can you relate to this problem? How do I solve it?
Alex
Failed to build mujoco-py
ERROR: flask 1.1.1 has requirement Werkzeug>=0.15, but you'll have werkzeug 0.14.1 which is incompatible.
ERROR: awscli 1.16.67 has requirement PyYAML<=3.13,>=3.10, but you'll have pyyaml 4.2b4 which is incompatible.
Hi,
This is really nice work.
I've faced some issues related to TensorFlow and CUDA, and I'm not that good with TensorFlow; I'm a PyTorch guy.
So I've decided to make a PyTorch implementation of MBPO, and I'm trying to understand your code.
From my understanding, taking AntTruncatedObs-v2 as a working example, the PyTorch pseudocode is:
Total epochs = 1000
Epoch steps = 1000
Exploration epochs = 10
01. Initialize networks [Model, SAC]
02. Initialize training with [10 exploration epochs (random) = 10 x 1000 environment steps]
03. For n in [Total epochs - Exploration epochs = 990 epochs]:
04.     For i in [1000 epoch steps]:
05.         If i % [250 model training freq] == 0:
06.             For g in [how many model gradient steps???]:
07.                 Sample a [256-size batch] from Env_pool
08.                 Train the Model network
09.             Sample a [100k-size batch] from Env_pool
10.             Set rollout_length
11.             Reallocate Model_pool [???]
12.             Rollout the Model for rollout_length steps, and add the rollouts to Model_pool
13.         Sample an [action a] from the policy, take an env step, and add it to Env_pool
14.         For g in [20 SAC gradient steps]:
15.             Sample a [256-size batch] from [5% Env_pool, 95% Model_pool]
16.             Train the Actor-Critic networks
17.     Evaluate the policy
Is that right?
My questions are about lines 06 and 11:
06: You're using some wall-clock time budget to train the model; in terms of gradient steps, how many are there?
11: When you reallocate the Model_pool, you set the [Model_pool size] to the number of [model steps per epoch].
But isn't that a really huge training set for the SAC updates? Are you discarding all model steps from previous epochs?
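Regarding 11, my current reading of the pool sizing (an assumption inferred from the config parameter names, not verified against the code) is that the model pool retains only the last model_retain_epochs epochs' worth of rollouts, so older model data does eventually get discarded:

```python
# Sketch of how I understand the model pool size to be computed
# (assumption from the config names, not the repo's verified code).
rollout_batch_size = 100_000   # rollouts started per model-training event
epoch_length = 1000
model_train_freq = 250
rollout_length = 1             # grows with the rollout schedule
model_retain_epochs = 1

rollouts_per_epoch = rollout_batch_size * epoch_length / model_train_freq
model_steps_per_epoch = int(rollout_length * rollouts_per_epoch)
model_pool_size = model_retain_epochs * model_steps_per_epoch
```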
Sorry for this very big issue..
Best wishes and kind regards.
Rami Ahmed
It seems mujoco-py no longer has checkpoint 435d2143abe04fe4648c6c0c1b848bf1fc06b73b. Changing this line to mujoco-py==1.50.1.68 can solve the problem.
How do I run the code? This command doesn't seem to work:
mbpo run_local examples.development --config=examples.config.halfcheetah.0 \
    --checkpoint-frequency=1000 --gpus=1 --trial-gpus=1
Thanks!
Hi there, I'm trying to use the Pusher2d-ImageReach-v0 env by creating a pusher2d/0.py config:
params = {
    'type': 'MBPO',
    'universe': 'gym',
    'domain': 'Pusher2d',
    'task': 'ImageReach-v0',
    'log_dir': '~/ray_mbpo/',
    'exp_name': 'defaults',
    'kwargs': {
        'epoch_length': 1000,
        'train_every_n_steps': 1,
        'n_train_repeat': 20,
        'eval_render_mode': None,
        'eval_n_episodes': 1,
        'eval_deterministic': True,
        'discount': 0.99,
        'tau': 5e-3,
        'reward_scale': 1.0,
        'model_train_freq': 1000,
        'model_retain_epochs': 5,
        'rollout_batch_size': 100e3,
        'deterministic': False,
        'num_networks': 7,
        'num_elites': 5,
        'real_ratio': 0.05,
        'target_entropy': -2,
        'max_model_t': None,
        'rollout_schedule': [20, 300, 1, 20],
        'hidden_dim': 400,
    }
}
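(In case it matters for debugging: my reading of rollout_schedule = [20, 300, 1, 20] is a linear ramp of the model rollout length from 1 to 20 between epochs 20 and 300. This is an assumption from the config format, not verified against the code.)

```python
def rollout_length(epoch, schedule=(20, 300, 1, 20)):
    # [min_epoch, max_epoch, min_length, max_length]: linearly interpolate
    # the rollout length over epochs, clipped to the endpoints (assumption).
    min_epoch, max_epoch, min_len, max_len = schedule
    frac = (epoch - min_epoch) / (max_epoch - min_epoch)
    frac = min(max(frac, 0.0), 1.0)
    return int(min_len + frac * (max_len - min_len))
```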
However, I got this error:
Traceback (most recent call last):
File "/home/lab/anaconda3/envs/mbpo/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 389, in _process_events
result = self.trial_executor.fetch_result(trial)
File "/home/lab/anaconda3/envs/mbpo/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 252, in fetch_result
result = ray.get(trial_future[0])
File "/home/lab/anaconda3/envs/mbpo/lib/python3.6/site-packages/ray/worker.py", line 2288, in get
raise value
ray.exceptions.RayTaskError: ray_ExperimentRunner:train() (pid=8159, host=lab-server)
File "/home/lab/anaconda3/envs/mbpo/lib/python3.6/site-packages/ray/tune/trainable.py", line 150, in train
result = self._train()
File "/home/lab/Github/TendonTrack/ipk_mbpo/examples/development/main.py", line 98, in _train
self._build()
File "/home/lab/Github/TendonTrack/ipk_mbpo/examples/development/main.py", line 48, in _build
get_environment_from_params(environment_params['training']))
File "/home/lab/Github/TendonTrack/ipk_mbpo/softlearning/environments/utils.py", line 34, in get_environment_from_params
return get_environment(universe, domain, task, environment_kwargs)
File "/home/lab/Github/TendonTrack/ipk_mbpo/softlearning/environments/utils.py", line 24, in get_environment
env = ADAPTERS[universe](domain, task, **environment_params)
File "/home/lab/Github/TendonTrack/ipk_mbpo/softlearning/environments/adapters/gym_adapter.py", line 68, in __init__
env = gym.envs.make(env_id, **kwargs)
File "/home/lab/anaconda3/envs/mbpo/lib/python3.6/site-packages/gym/envs/registration.py", line 156, in make
return registry.make(id, **kwargs)
File "/home/lab/anaconda3/envs/mbpo/lib/python3.6/site-packages/gym/envs/registration.py", line 101, in make
env = spec.make(**kwargs)
File "/home/lab/anaconda3/envs/mbpo/lib/python3.6/site-packages/gym/envs/registration.py", line 73, in make
env = cls(**_kwargs)
File "/home/lab/Github/TendonTrack/ipk_mbpo/softlearning/environments/gym/mujoco/image_pusher_2d.py", line 57, in __init__
super(ImageForkReacher2dEnv, self).__init__(*args, **kwargs)
File "/home/lab/Github/TendonTrack/ipk_mbpo/softlearning/environments/gym/mujoco/image_pusher_2d.py", line 11, in __init__
Pusher2dEnv.__init__(self, *args, **kwargs)
File "/home/lab/Github/TendonTrack/ipk_mbpo/softlearning/environments/gym/mujoco/pusher_2d.py", line 58, in __init__
MujocoEnv.__init__(self, model_path=self.MODEL_PATH, frame_skip=5)
File "/home/lab/anaconda3/envs/mbpo/lib/python3.6/site-packages/gym/envs/mujoco/mujoco_env.py", line 45, in __init__
raise IOError("File %s does not exist" % fullpath)
OSError: File /home/lab/Github/TendonTrack/ipk_mbpo/models/pusher_2d.xml does not exist
Is there actually no model for it in your repository?
Furthermore, the reason I want to test this environment is to figure out the preprocessing method for image observations. Do you have any suggestions about that?
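Regarding preprocessing, the generic approach I'd try first (my own sketch, not this repo's method): grayscale the image by averaging channels and normalize pixel values to [0, 1].

```python
# Generic image-observation preprocessing sketch (not this repo's method):
# average the color channels and rescale uint8 pixel values to [0, 1].
def preprocess(image_rows):
    # image_rows: H x W x 3 nested lists with values in [0, 255].
    return [[sum(px) / 3.0 / 255.0 for px in row] for row in image_rows]

obs = [[[255, 255, 255], [0, 0, 0]]]   # a 1x2 RGB "image"
processed = preprocess(obs)

assert processed == [[1.0, 0.0]]
```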
I am trying to install your package for a student and I am running into the following problem:
@(mbpo) Singularity> mbpo run_local examples.development --config=examples.config.halfcheetah.0
Traceback (most recent call last):
File "/usr/local/bin/miniconda/envs/mbpo/bin/mbpo", line 11, in <module>
load_entry_point('mbpo==0.0.1', 'console_scripts', 'mbpo')()
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/softlearning/scripts/console_scripts.py", line 202, in main
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/softlearning/scripts/console_scripts.py", line 71, in run_example_local_cmd
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/examples/instrument.py", line 205, in run_example_local
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/examples/development/__init__.py", line 35, in get_parser
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/examples/utils.py", line 8, in <module>
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/softlearning/algorithms/__init__.py", line 1, in <module>
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/softlearning/algorithms/sql.py", line 8, in <module>
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/softlearning/algorithms/rl_algorithm.py", line 12, in <module>
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/softlearning/samplers/__init__.py", line 4, in <module>
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/softlearning/samplers/remote_sampler.py", line 10, in <module>
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/softlearning/samplers/utils.py", line 5, in <module>
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/softlearning/replay_pools/__init__.py", line 4, in <module>
File "/usr/local/bin/miniconda/envs/mbpo/lib/python3.6/site-packages/mbpo-0.0.1-py3.6.egg/softlearning/replay_pools/trajectory_replay_pool.py", line 8, in <module>
ModuleNotFoundError: No module named 'softlearning.utils'
I have mujoco and mujoco_py installed within a Singularity container. I installed your package after modifying tensorflow-gpu to tensorflow in the environment/requirements.txt file. Any suggestions?
I met this problem while running pip install -e viskit:
(mbpo) dingcheng@dingcheng-GP63-Leopard-8RE:~/Documents/mbpo$ pip install -e viskit
ERROR: File "setup.py" not found. Directory cannot be installed in editable mode: /home/dingcheng/Documents/mbpo/viskit
Any idea why that would happen? Thanks
Hi, Thanks for sharing the novel idea and great work!
I'm wondering if the code for the Model Generalization in Practice section is available? That section would help me understand the model learning part of the algorithm. :-)
Hi,
Thanks for releasing the code!
I'd like to run MBPO on Walker2d. It threw an error about the argument model_pool_size
(https://github.com/JannerM/mbpo/blob/master/examples/config/walker2d/0.py#L25), which seems to be deprecated (https://github.com/JannerM/mbpo/blob/master/mbpo/algorithms/mbpo.py#L107).
Hi, may I ask a question on the original paper here?
I am confused by the following step in Lemma B.3.
From Lemma B.1 it is straightforward to bound the TV distance of the joint distributions, but I don't see how the corresponding bound follows for the marginal distributions.
I have trouble figuring this out; could you please shed some light on it?
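For reference, the step I suspect is being used (a standard fact, stated here as my guess at the missing link): marginalization cannot increase total variation distance. Writing the joints as p(s, a), q(s, a) with marginals p(s), q(s):

```latex
D_{TV}\big(p(s),\, q(s)\big)
  = \tfrac{1}{2} \sum_s \Big| \sum_a \big(p(s,a) - q(s,a)\big) \Big|
  \le \tfrac{1}{2} \sum_{s,a} \big| p(s,a) - q(s,a) \big|
  = D_{TV}\big(p(s,a),\, q(s,a)\big),
```

where the inequality is the triangle inequality applied inside each sum over a.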
Thank you very much!
Hi -- this dir has modified gym ~v2 environments for the ant/humanoid, with a modified observation space and early termination, while this file calls into the parameterized gym v3 environments with no early termination or time-based rewards and, if I understand correctly, also inherits the original observation space of the v3 environments. These two files seem to be in conflict with each other, as the mbpo/env environments don't have the same parameters as examples/development/base.py. Can you clarify which envs/observation spaces/rewards you use for training and report in the paper?
My current assumption is that the base.py code is the latest, that the mbpo/env code is outdated, and that you are using the v3 envs with their default observation space, no early termination/alive bonus, and are training on and directly reporting the reward from these environments (rather than re-running the evaluation in the default v3 environments with time bonus and early termination). Is this correct?
Hi !!
Thank you for sharing your great research and code.
I have a question about the MuJoCo environments in your experiments.
In the paper, your method was compared with PETS, which needs the states to calculate the reward offline, although your method does not need those states.
I checked your code and your environments, but you did not use the states that are needed to calculate the rewards offline. How did you run PETS, which assumes that we can observe the states needed to calculate the rewards offline? (E.g., using get_body_com("torso") to calculate the reward in the Ant env.)
Thank you for your help.
Hi,
When I try to run the main example using mbpo run_local examples.development --config=examples.config.halfcheetah.0 --gpus=1 --trial-gpus=1, I have an issue with the classes that inherit from Serializable (GymAdapter, SoftlearningEnv, ImagePusher2dEnv, ...). The problem is that the _Serializable__initialize method is not defined:
2021-05-03 18:03:36,442 ERROR trial_runner.py:426 -- Error processing event.
Traceback (most recent call last):
File "/home/ricardo/Documentos/venv_mbpo/env-py3.6/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 389, in _process_events
result = self.trial_executor.fetch_result(trial)
File "/home/ricardo/Documentos/venv_mbpo/env-py3.6/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 252, in fetch_result
result = ray.get(trial_future[0])
File "/home/ricardo/Documentos/venv_mbpo/env-py3.6/lib/python3.6/site-packages/ray/worker.py", line 2288, in get
raise value
ray.exceptions.RayTaskError: ray_ExperimentRunner:train() (pid=4162, host=ricardo-VivoBook-ASUS-Laptop-X505ZA-X505ZA)
File "/home/ricardo/Documentos/venv_mbpo/env-py3.6/lib/python3.6/site-packages/ray/tune/trainable.py", line 150, in train
result = self._train()
File "/home/ricardo/Documentos/mbpo/examples/development/main.py", line 85, in _train
self._build()
File "/home/ricardo/Documentos/mbpo/examples/development/main.py", line 46, in _build
get_environment_from_params(environment_params['training']))
File "/home/ricardo/Documentos/mbpo/softlearning/environments/utils.py", line 28, in get_environment_from_params
return get_environment(universe, domain, task, environment_kwargs)
File "/home/ricardo/Documentos/mbpo/softlearning/environments/utils.py", line 18, in get_environment
env = ADAPTERS[universe](domain, task, **environment_params)
File "/home/ricardo/Documentos/mbpo/softlearning/environments/adapters/gym_adapter.py", line 60, in __init__
self._Serializable__initialize(locals())
AttributeError: 'GymAdapter' object has no attribute '_Serializable__initialize'
Is it possible that I am using the wrong version of Serializable?
Thanks.