openai / train-procgen

Code for the paper "Leveraging Procedural Generation to Benchmark Reinforcement Learning"

Home Page: https://openai.com/blog/procgen-benchmark/

License: MIT License


train-procgen's Introduction

Status: Archive (code is provided as-is, no updates expected)

Leveraging Procedural Generation to Benchmark Reinforcement Learning

This is code for training agents for some of the experiments in Leveraging Procedural Generation to Benchmark Reinforcement Learning (see the Citation section below). The code for the environments is in the Procgen Benchmark repo.

We're currently running a competition that uses these environments to measure sample efficiency and generalization in RL; see the competition page to learn more and register.

Supported platforms:

  • macOS 10.14 (Mojave)
  • Ubuntu 16.04

Supported Pythons:

  • 3.7 64-bit

Install

You can get miniconda from https://docs.conda.io/en/latest/miniconda.html if you don't have it, or install the dependencies from environment.yml manually.

git clone https://github.com/openai/train-procgen.git
conda env update --name train-procgen --file train-procgen/environment.yml
conda activate train-procgen
pip install https://github.com/openai/baselines/archive/9ee399f5b20cd70ac0a871927a6cf043b478193f.zip
pip install -e train-procgen

Try it out

Train an agent using PPO on the environment StarPilot:

python -m train_procgen.train --env_name starpilot
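
For reference, here is a minimal sketch of roughly what this entry point does under the hood, assuming the baselines commit pinned in the install step: build a vectorized Procgen environment, wrap it, and hand it to baselines' PPO2. The hyperparameter values shown are illustrative, not necessarily the exact ones the script uses.

from baselines.ppo2 import ppo2
from baselines.common.models import build_impala_cnn
from baselines.common.vec_env import VecExtractDictObs, VecMonitor, VecNormalize
from procgen import ProcgenEnv

# build 64 parallel StarPilot environments over the full level distribution
venv = ProcgenEnv(num_envs=64, env_name="starpilot", num_levels=0,
                  start_level=0, distribution_mode="hard")
venv = VecExtractDictObs(venv, "rgb")     # keep only the RGB observation
venv = VecMonitor(venv=venv, filename=None, keep_buf=100)
venv = VecNormalize(venv=venv, ob=False)  # normalize returns, not observations

conv_fn = lambda x: build_impala_cnn(x, depths=[16, 32, 32])
ppo2.learn(env=venv, network=conv_fn, total_timesteps=25_000_000,  # example budget
           nsteps=256, nminibatches=8, lam=0.95, gamma=0.999,
           noptepochs=3, ent_coef=0.01, lr=5e-4, cliprange=0.2)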

Train an agent using PPO on the environment StarPilot using the easy difficulty:

python -m train_procgen.train --env_name starpilot --distribution_mode easy

Run parallel training using MPI:

mpiexec -np 8 python -m train_procgen.train --env_name starpilot

Train an agent on a fixed set of N levels:

python -m train_procgen.train --env_name starpilot --num_levels N

Train an agent on the same 500 levels used in the paper:

python -m train_procgen.train --env_name starpilot --num_levels 500

Train an agent on a different set of 500 levels:

python -m train_procgen.train --env_name starpilot --num_levels 500 --start_level 1000
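
The two flags map directly onto Procgen's level-seed window. As a hedged illustration using the procgen package, the command above corresponds to sampling from level seeds 1000 through 1499:

from procgen import ProcgenEnv

# levels are drawn from seeds start_level .. start_level + num_levels - 1
venv = ProcgenEnv(num_envs=64, env_name="starpilot",
                  num_levels=500, start_level=1000,
                  distribution_mode="hard")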

Run simultaneous training and testing using MPI; one in every four workers will be a test worker, and the rest will be training workers:

mpiexec -np 8 python -m train_procgen.train --env_name starpilot --num_levels 500 --test_worker_interval 4
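
For intuition, here is a small sketch (assuming mpi4py) of how a rank can be designated a test worker under this modulo rule; it mirrors the behavior described above but is not lifted verbatim from the script:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
test_worker_interval = 4

is_test_worker = False
if test_worker_interval > 0:
    # with 8 ranks and interval 4, ranks 3 and 7 become test workers
    is_test_worker = rank % test_worker_interval == (test_worker_interval - 1)

# test workers evaluate on the full distribution (num_levels=0);
# training workers keep the restricted training set
num_levels = 0 if is_test_worker else 500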

Train an agent using PPO on a level in Jumper that requires hard exploration:

python -m train_procgen.train --env_name jumper --distribution_mode exploration

Train an agent using PPO on a variant of CaveFlyer that requires memory:

python -m train_procgen.train --env_name caveflyer --distribution_mode memory

View training options:

python -m train_procgen.train --help

Reproduce and Visualize Results

Sample efficiency on hard environments (results/hard-all-runN):

mpiexec -np 4 python -m train_procgen.train --env_name ENV_NAME --distribution_mode hard
python -m train_procgen.graph --distribution_mode hard

Sample efficiency on easy environments (results/easy-all-runN):

python -m train_procgen.train --env_name ENV_NAME --distribution_mode easy
python -m train_procgen.graph --distribution_mode easy

Generalization on hard environments using 500 training levels (results/hard-500-runN):

mpiexec -np 8 python -m train_procgen.train --env_name ENV_NAME --num_levels 500 --distribution_mode hard --test_worker_interval 2
python -m train_procgen.graph --distribution_mode hard --restrict_training_set

Generalization on easy environments using 200 training levels (results/easy-200-runN):

mpiexec -np 2 python -m train_procgen.train --env_name ENV_NAME --num_levels 200 --distribution_mode easy --test_worker_interval 2
python -m train_procgen.graph --distribution_mode easy --restrict_training_set
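
To sweep a whole suite rather than one ENV_NAME at a time, one option is a small launcher script; a sketch assuming a subset of environment names (purely illustrative):

import subprocess

ENV_NAMES = ["coinrun", "starpilot", "caveflyer", "dodgeball"]  # subset, for illustration
for env_name in ENV_NAMES:
    subprocess.run([
        "mpiexec", "-np", "2", "python", "-m", "train_procgen.train",
        "--env_name", env_name, "--num_levels", "200",
        "--distribution_mode", "easy", "--test_worker_interval", "2",
    ], check=True)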

Pass --normalize_and_reduce to compute and visualize the mean normalized return with train_procgen.graph.
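
For reference, normalization rescales each environment's return by per-environment constants before averaging; a minimal sketch with placeholder bounds (not the paper's exact values):

# normalized return = (R - R_min) / (R_max - R_min), computed per environment
def normalized_return(ret, r_min, r_max):
    return (ret - r_min) / (r_max - r_min)

returns = {"starpilot": 25.0, "coinrun": 8.5}                # example raw returns
bounds = {"starpilot": (2.5, 64.0), "coinrun": (5.0, 10.0)}  # placeholder (R_min, R_max)

mean_norm = sum(normalized_return(returns[e], *bounds[e]) for e in returns) / len(returns)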

Citation

Please cite using the following bibtex entry:

@article{cobbe2019procgen,
  title={Leveraging Procedural Generation to Benchmark Reinforcement Learning},
  author={Cobbe, Karl and Hesse, Christopher and Hilton, Jacob and Schulman, John},
  journal={arXiv preprint arXiv:1912.01588},
  year={2019}
}

train-procgen's People

Contributors

christopherhesse, kcobbe


train-procgen's Issues

Cannot train with GPU

I am trying to train with tensorflow-gpu==1.14; CUDA and cuDNN are loaded correctly. However, once TensorFlow has finished loading, it gets stuck (screenshot omitted).

After some time, I get this error:

2021-04-15 20:14:04.341576: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "/home/hadi/anaconda3/envs/train/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/hadi/anaconda3/envs/train/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/hadi/anaconda3/envs/train/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(64, 256), b.shape=(256, 15), m=64, n=15, k=256
	 [[{{node ppo2_model/pi_1/MatMul}}]]
	 [[ppo2_model/ArgMax/_443]]
  (1) Internal: Blas GEMM launch failed : a.shape=(64, 256), b.shape=(256, 15), m=64, n=15, k=256
	 [[{{node ppo2_model/pi_1/MatMul}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hadi/anaconda3/envs/train/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/hadi/anaconda3/envs/train/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/hadi/Downloads/train-procgen/train_procgen/train.py", line 109, in <module>
    main()
  File "/home/hadi/Downloads/train-procgen/train_procgen/train.py", line 106, in main
    comm=comm)
  File "/home/hadi/Downloads/train-procgen/train_procgen/train.py", line 75, in train_fn
    max_grad_norm=0.5,
  File "/home/hadi/anaconda3/envs/train/lib/python3.7/site-packages/baselines/ppo2/ppo2.py", line 142, in learn
    obs, returns, masks, actions, values, neglogpacs, states, epinfos = runner.run() #pylint: disable=E0632
  File "/home/hadi/anaconda3/envs/train/lib/python3.7/site-packages/baselines/ppo2/runner.py", line 29, in run
    actions, values, self.states, neglogpacs = self.model.step(self.obs, S=self.states, M=self.dones)
  File "/home/hadi/anaconda3/envs/train/lib/python3.7/site-packages/baselines/common/policies.py", line 93, in step
    a, v, state, neglogp = self._evaluate([self.action, self.vf, self.state, self.neglogp], observation, **extra_feed)
  File "/home/hadi/anaconda3/envs/train/lib/python3.7/site-packages/baselines/common/policies.py", line 75, in _evaluate
    return sess.run(variables, feed_dict)
  File "/home/hadi/anaconda3/envs/train/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/hadi/anaconda3/envs/train/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/hadi/anaconda3/envs/train/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/hadi/anaconda3/envs/train/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(64, 256), b.shape=(256, 15), m=64, n=15, k=256
	 [[node ppo2_model/pi_1/MatMul (defined at /anaconda3/envs/train/lib/python3.7/site-packages/baselines/a2c/utils.py:63) ]]
	 [[ppo2_model/ArgMax/_443]]
  (1) Internal: Blas GEMM launch failed : a.shape=(64, 256), b.shape=(256, 15), m=64, n=15, k=256
	 [[node ppo2_model/pi_1/MatMul (defined at /anaconda3/envs/train/lib/python3.7/site-packages/baselines/a2c/utils.py:63) ]]
0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node ppo2_model/pi_1/MatMul:
 ppo2_model/flatten_1/Reshape (defined at /anaconda3/envs/train/lib/python3.7/site-packages/baselines/common/policies.py:44)	
 ppo2_model/pi/w/read (defined at /anaconda3/envs/train/lib/python3.7/site-packages/baselines/a2c/utils.py:61)

Input Source operations connected to node ppo2_model/pi_1/MatMul:
 ppo2_model/flatten_1/Reshape (defined at /anaconda3/envs/train/lib/python3.7/site-packages/baselines/common/policies.py:44)	
 ppo2_model/pi/w/read (defined at /anaconda3/envs/train/lib/python3.7/site-packages/baselines/a2c/utils.py:61)

Original stack trace for 'ppo2_model/pi_1/MatMul':
  File "/anaconda3/envs/train/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/anaconda3/envs/train/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Downloads/train-procgen/train_procgen/train.py", line 109, in <module>
    main()
  File "/Downloads/train-procgen/train_procgen/train.py", line 106, in main
    comm=comm)
  File "/Downloads/train-procgen/train_procgen/train.py", line 75, in train_fn
    max_grad_norm=0.5,
  File "/anaconda3/envs/train/lib/python3.7/site-packages/baselines/ppo2/ppo2.py", line 109, in learn
    max_grad_norm=max_grad_norm, comm=comm, mpi_rank_weight=mpi_rank_weight)
  File "/anaconda3/envs/train/lib/python3.7/site-packages/baselines/ppo2/model.py", line 37, in __init__
    act_model = policy(nbatch_act, 1, sess)
  File "/anaconda3/envs/train/lib/python3.7/site-packages/baselines/common/policies.py", line 175, in policy_fn
    **extra_tensors
  File "/anaconda3/envs/train/lib/python3.7/site-packages/baselines/common/policies.py", line 49, in __init__
    self.pd, self.pi = self.pdtype.pdfromlatent(latent, init_scale=0.01)
  File "/anaconda3/envs/train/lib/python3.7/site-packages/baselines/common/distributions.py", line 65, in pdfromlatent
    pdparam = _matching_fc(latent_vector, 'pi', self.ncat, init_scale=init_scale, init_bias=init_bias)
  File "/anaconda3/envs/train/lib/python3.7/site-packages/baselines/common/distributions.py", line 355, in _matching_fc
    return fc(tensor, name, size, init_scale=init_scale, init_bias=init_bias)
  File "/anaconda3/envs/train/lib/python3.7/site-packages/baselines/a2c/utils.py", line 63, in fc
    return tf.matmul(x, w)+b
  File "/anaconda3/envs/train/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/anaconda3/envs/train/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py", line 2647, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/anaconda3/envs/train/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 5925, in mat_mul
    name=name)
  File "/anaconda3/envs/train/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/anaconda3/envs/train/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/anaconda3/envs/train/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/anaconda3/envs/train/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

It seems like this is a general issue with TF 1.14, so I am wondering how you had any luck with GPU training here. I am training with this command:
mpiexec --mca opal_cuda_support 1 -np 2 python -m train_procgen.train --env_name starpilot --num_levels 200 --distribution_mode easy --test_worker_interval 2

Scaling training when drawing unique assets each episode

I'm calling a custom version of images_load in resources.cpp each episode, which draws a unique pattern on some assets based on the level seed. My concern is that, during training, the sprites variable (a map of shared pointers) may change as other envs (say with num_envs=64) call images_load asynchronously while one env is in the middle of drawing its assets, causing that episode to draw assets it didn't intend to.

Is this possible? How might I prove or disprove it? If it is possible, how can I draw unique assets per episode while still scaling training up to num_envs=64? At the moment, the best solution I can think of is to train with num_envs=1.
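
(A Python sketch of the suspected race and the standard fix: serialize access to the shared cache. The real code is C++, where the analogue would be a std::mutex guarding images_load; the names below are illustrative, not the actual procgen internals.)

import threading

_sprites = {}                     # stands in for the shared sprites map
_sprites_lock = threading.Lock()

def images_load(level_seed):
    # holding the lock makes the per-episode redraw atomic, so a
    # concurrent env can never observe (or overwrite) a half-updated cache
    with _sprites_lock:
        _sprites["player"] = f"pattern-{level_seed}"
        return dict(_sprites)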

MPI error when running simultaneous training and testing

The code runs smoothly when using python -m train_procgen.train but throws an error when using MPI. The error log is as follows:

Logging to /tmp/procgen
creating environment
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 66543 RUNNING AT 2a1b7906b5b2
=   EXIT CODE: 134
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

Can you help me? Thank you.

Plotting the episode reward

Hello,

Thank you so much for releasing the training code of your paper!

I have a small question regarding the logging of the episode reward: in baselines' PPO implementation, it seems that the rewards are computed over a fixed number of steps and the environments are not reset. For some Procgen tasks, such as Maze in easy difficulty mode, this results in a large reward at step 0, which is a little odd from a plotting perspective.

I am wondering whether the reward plots in the paper are produced the same way? If not, would you be willing to share how you plot them? Many thanks!
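
(A sketch of one way to plot the episode reward from the progress.csv that the baselines logger writes; the column names "misc/total_timesteps" and "eprewmean" match the pinned baselines commit but can vary across versions.)

import csv
import matplotlib.pyplot as plt

steps, rewards = [], []
with open("/tmp/procgen/progress.csv") as f:
    for row in csv.DictReader(f):
        if row.get("eprewmean") not in (None, "", "nan"):
            steps.append(float(row["misc/total_timesteps"]))
            rewards.append(float(row["eprewmean"]))

plt.plot(steps, rewards)
plt.xlabel("timesteps")
plt.ylabel("mean episode reward")
plt.show()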

mpich

The docs should mention installing mpich (through either conda or Homebrew) so that the mpi4py install succeeds.

Specifications for training/testing in easy mode

I was wondering how the curves of Figure 13 in the paper were obtained. The paper's Appendix D indicates that there is only one worker for the easy mode, but that would only give us the "Train" results. Do you simply use "mpiexec -np 2" and assign the second worker as a test worker, or is there another approach? Thanks.

Dependency on baselines

The repo depends on baselines, but baselines is not included in environment.yml. I am mainly worried about the TensorFlow version: baselines says it supports up to TF 1.14, while this repo uses TF 1.15. Will that cause problems?

Cannot reproduce results on CoinRun

I run the following command:
python -m train_procgen.train --env_name coinrun --num_levels 500
but I cannot reproduce the results you presented: the final training score is around 5, while the paper reports over 7.
I have attached my progress.csv to this issue. Can you help me out?
coinrun_progress.pdf

Reproducing + validating via graph

If I run python -m train_procgen.train --env_name bossfight --distribution_mode easy, the output is printed as a plain-text table rather than written as .csv. How do I log the output in the same format as the .csv files in results, so that I can graph them for comparison? Or is there a better way to confirm that my training runs yield the same results as in results?

Also, I see in #6 and #11 that there may be some extra parameters that need to be passed to reproduce the results more accurately -- how do I know which I need, if any?
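
(One way to get CSV output is to configure the baselines logger explicitly before training; a sketch, with an example log directory:)

from baselines import logger

# 'csv' in format_strs makes the logger write progress.csv alongside stdout;
# the OPENAI_LOG_FORMAT environment variable offers similar control
logger.configure(dir="./results/easy-all-run1", format_strs=["csv", "stdout"])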

GPU OOM when running on manually specified devices

When running with a manually specified CUDA_VISIBLE_DEVICES variable, the code seems to use much more memory on the first GPU than on the others, leading to an OOM error. It runs without errors when I don't set the GPUs manually.

Can you help me? Thank you.
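
(A common workaround, sketched under the assumption of mpi4py and TensorFlow 1.x: pin each MPI rank to a single GPU before TensorFlow initializes, and enable memory growth so no process grabs a whole GPU up front. The GPU count is a placeholder.)

import os
from mpi4py import MPI

rank = MPI.COMM_WORLD.Get_rank()
num_gpus = 2  # placeholder: number of GPUs visible on this node
os.environ["CUDA_VISIBLE_DEVICES"] = str(rank % num_gpus)

import tensorflow as tf  # import only after CUDA_VISIBLE_DEVICES is set

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)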

Achieving reported training performance

We trained a few agents with the training code provided in the repo. Without changing anything, the mean reward across 500 training levels in StarPilot is around 5.6. If we remove the VecNormalize line in the environment creation, we achieve a mean reward of around 9.2. However, the reported mean reward in the paper is around 12 (Figure 4).

Did you use VecNormalize in the paper?

Training on custom environments + passing extra command line args for training

What's the best way to train this (or any) agent on an environment I created (or edited, such as CoinRun)? It looks like this code draws from procgen==0.9.2.

Also, I have a file that I would like to pass as an argument during training, like this:
python -m train_procgen.train --env_name bossfight --my_file file.txt
The file affects game behavior between episodes, so it would need to be visible to the C++ game code (coinrun.cpp, for example). Would there be any major complications in implementing this as well?

Missing Code For Training on A Fixed Sequence of Levels

According to the blog post, you also ran an ablation with deterministic level sequences. If I'm correct, you set the use_sequential_levels flag to True. Is it possible for you to release the code for that experiment? Or could you please give an example of the environment setup? Thank you.
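
(For reference, the procgen package does expose a use_sequential_levels option; a sketch of the deterministic-sequence setup, though whether the paper's ablation used exactly these settings is not confirmed here:)

import gym

env = gym.make(
    "procgen:procgen-starpilot-v0",
    start_level=0,
    num_levels=500,
    use_sequential_levels=True,  # advance through levels in a fixed order
)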
