
stevenson0421 / openai-baselines


This project is a fork of openai/baselines.


OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

License: MIT License

Python 51.52% HTML 48.45% Dockerfile 0.04%


Baselines

OpenAI Baselines is a set of high-quality implementations of reinforcement learning algorithms.

These algorithms will make it easier for the research community to replicate, refine, and identify new ideas, and will create good baselines to build research on top of. Our DQN implementation and its variants are roughly on par with the scores in published papers. We expect they will be used as a base around which new ideas can be added, and as a tool for comparing a new approach against existing ones.


Prerequisites

Ubuntu

sudo apt-get update && sudo apt-get install cmake libopenmpi-dev python3-dev zlib1g-dev

Mac OS X

Installation of system packages on Mac requires Homebrew. With Homebrew installed, run the following:

brew install cmake openmpi

Virtual environment

For general Python package sanity, it is a good idea to use virtual environments (virtualenvs) to make sure packages from different projects do not interfere with each other. You can install virtualenv (which is itself a pip package) via

pip install virtualenv

Virtualenvs are essentially folders that contain copies of the Python executable and all installed Python packages. To create a virtualenv called venv with Python 3, run:

virtualenv /path/to/venv --python=python3

To activate a virtualenv:

. /path/to/venv/bin/activate

A more thorough tutorial on virtualenvs and their options can be found here

Python versions

The recommended Python version is 3.7.15.

TensorFlow versions

The master branch supports TensorFlow versions 1.4 through 1.14. For TensorFlow 2.0 support, please use the tf2 branch.

Installation

  • Clone the repo and cd into it:

    git clone https://github.com/openai/baselines.git
    cd baselines
  • If you don't have TensorFlow installed already, install your favourite flavor of TensorFlow. In most cases, you may use

    pip install tensorflow-gpu==1.14 # if you have a CUDA-compatible gpu and proper drivers
    conda install cudatoolkit=10.0.130 cudnn=7.6.5 # CUDA/cuDNN versions compatible with TensorFlow 1.14

    or

    pip install tensorflow==1.14

    to install TensorFlow 1.14, which is the latest version of TensorFlow supported by the master branch. Refer to the TensorFlow installation guide for more details.

  • Install the baselines package:

    pip install -e .
    pip install matplotlib pandas gym[atari] filelock
    conda install ffmpeg # ffmpeg is used for recording and playing videos
  • Install the Atari ROMs from here and extract the .rar file. After that, run:

    python -m atari_py.import_roms <path to folder>

Testing the installation

All unit tests in baselines can be run using the pytest runner:

pip install pytest
pytest

Training models

Most of the algorithms in the baselines repo are used as follows:

python -m baselines.run --alg=<name of the algorithm> --env=<environment_id> [additional arguments]

Example 1. PPO with MuJoCo Humanoid

For instance, to train a fully-connected network controlling the MuJoCo humanoid using PPO2 for 20M timesteps:

python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7

Note that for MuJoCo environments the fully-connected network is the default, so we can omit --network=mlp. The hyperparameters for both the network and the learning algorithm can be controlled via the command line, for instance:

python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7 --ent_coef=0.1 --num_hidden=32 --num_layers=3 --value_network=copy

will set the entropy coefficient to 0.1, construct a fully-connected network with 3 layers of 32 hidden units each, and create a separate network for value function estimation (so that its parameters are not shared with the policy network, but the structure is the same).

See the docstrings in common/models.py for a description of the network parameters for each type of model, and the docstring of learn() in baselines/ppo2/ppo2.py for a description of the ppo2 hyperparameters.
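
The same run can also be launched from Python rather than the command line. Below is a minimal sketch, assuming a working MuJoCo setup; the hyperparameter values mirror the command above and are illustrative rather than tuned:

# minimal sketch: calling ppo2.learn() directly; values are illustrative
import gym
from baselines.ppo2 import ppo2
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv

# ppo2 expects a vectorized environment
env = DummyVecEnv([lambda: gym.make('Humanoid-v2')])

model = ppo2.learn(
    network='mlp',             # fully-connected policy, the MuJoCo default
    env=env,
    total_timesteps=int(2e7),
    ent_coef=0.1,              # entropy coefficient
    value_network='copy',      # separate value network with the same structure
    num_layers=3,              # extra kwargs are forwarded to the network builder
    num_hidden=32,
)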

Example 2. DQN on Atari

DQN on Atari is at this point a classic benchmark. To run the baselines implementation of DQN on Atari Pong:

python -m baselines.run --alg=deepq --env=PongNoFrameskip-v4 --num_timesteps=1e6
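
The deepq subpackage can likewise be driven from Python, mirroring the repo's own example scripts. A minimal sketch, assuming gym[atari] is installed; the timestep count is illustrative:

# minimal sketch: calling deepq.learn() from Python; values are illustrative
import gym
from baselines import deepq

env = gym.make('PongNoFrameskip-v4')
env = deepq.wrap_atari_dqn(env)  # standard DQN preprocessing wrappers
model = deepq.learn(env, network='conv_only', total_timesteps=int(1e6))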

Saving, loading and visualizing models

Saving and loading the model

The algorithms' serialization API is not properly unified yet; however, there is a simple method to save and restore trained models. The --load_path and --save_path command-line options load the TensorFlow state from a given path before training and save it after training, respectively. Let's imagine you'd like to train ppo2 on Atari Pong, save the model, and then later visualize what it has learnt.

python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=2e7 --save_path=~/models/pong_20M_ppo2

This should get the mean reward per episode to about 20. To load and visualize the model, we'll do the following: load the model, train it for 0 steps, and then visualize:

python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=0 --load_path=~/models/pong_20M_ppo2 --play
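
The same round trip is available from Python: the model object returned by learn() has a save() method, and learn() accepts a load_path argument. A minimal sketch; the paths are illustrative, and the environment construction mirrors what baselines.run does for Atari:

# minimal sketch of saving and restoring from Python; paths are illustrative
from baselines.ppo2 import ppo2
from baselines.common.cmd_util import make_vec_env
from baselines.common.vec_env.vec_frame_stack import VecFrameStack

# vectorized, frame-stacked Atari env, as built by baselines.run
env = VecFrameStack(make_vec_env('PongNoFrameskip-v4', 'atari', num_env=8, seed=0), 4)

model = ppo2.learn(network='cnn', env=env, total_timesteps=int(2e7))
model.save('pong_20M_ppo2')

# restore later by "training" for 0 steps with load_path, then step the policy
model = ppo2.learn(network='cnn', env=env, total_timesteps=0,
                   load_path='pong_20M_ppo2')
obs = env.reset()
actions, values, state, neglogp = model.step(obs)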

Logging and visualizing learning curves and other training metrics

By default, all summary data, including progress and standard output, is saved to a unique directory inside a temp folder, obtained via a call to Python's tempfile.gettempdir(). The directory can be changed with the --log_path command-line option.

python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=2e7 --save_path=~/models/pong_20M_ppo2 --log_path=~/logs/Pong/

NOTE: Please be aware that the logger will overwrite files of the same name in an existing directory; it is therefore recommended to give folder names a unique timestamp to prevent overwriting logs.

The log directory can also be changed through the $OPENAI_LOGDIR environment variable.

For examples on how to load and display the training data, see here.
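
For reference, the repo ships plotting helpers in baselines.common.plot_util that can read such log directories. A minimal sketch; the log path is illustrative:

# minimal sketch using baselines' plotting helpers; the log path is illustrative
import os.path as osp
from baselines.common import plot_util as pu

results = pu.load_results(osp.expanduser('~/logs/Pong'))  # read monitor/progress files
pu.plot_results(results)                                  # plot reward vs. timesteps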

Trace Code

  • main structure

    baselines/run.py
    main()->train()
        baselines/ppo2/ppo2.py
        learn()
            baselines/ppo2/model.py
            Model()
  • common arguments (defined here)

    1. env: environment ID, default='Reacher-v2'
    2. env_type: type of environment, used when the environment type cannot be automatically determined
    3. seed: RNG seed, default=None
    4. alg: Algorithm, default='ppo2'
    5. num_timesteps: default=1e6
    6. network: policy network type (mlp, cnn, lstm, cnn_lstm, conv_only), default=None
    7. gamestate: game state to load (so far only used in retro games)
    8. num_env: Number of environment copies run in parallel. When not specified, set to the number of CPUs for Atari and to 1 for MuJoCo, default=None
    9. reward_scale: Reward scale factor, default=1.0
    10. save_path: Path to save trained model to, default=None
    11. save_video_interval: Save video every x steps (0 = disabled), default=0
    12. save_video_length: Length of recorded video, default=200
    13. log_path: Directory to save learning curve data, default=None
    14. play: flag for visualization, default=False
  • The default policy network is set to cnn in baselines/run.get_default_network(), while the paper uses a smaller CNN: conv layers [[16, 8, 4], [32, 4, 2]] plus a fully-connected layer of 256 units (see the sketch after this list).

  • POME adds two additional networks: a reward network and a transition network.

  • The main algorithm is implemented in baselines/ppo2/model.py

  • For reference, baselines/deepq/experiments/custom_cartpole.py builds a customized framework for DQN
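
To experiment with the paper's smaller architecture, a custom network can be registered with baselines' model registry and then selected via --network. A minimal sketch, assuming TensorFlow 1.x; the network name paper_cnn is made up for illustration:

# minimal sketch: the paper's CNN ([16, 8, 4], [32, 4, 2] + fc 256);
# the name 'paper_cnn' is illustrative
import numpy as np
import tensorflow as tf
from baselines.common.models import register
from baselines.a2c.utils import conv, fc, conv_to_fc

@register("paper_cnn")
def paper_cnn(**conv_kwargs):
    def network_fn(X):
        h = tf.cast(X, tf.float32) / 255.  # scale pixel inputs to [0, 1]
        h = tf.nn.relu(conv(h, 'c1', nf=16, rf=8, stride=4,
                            init_scale=np.sqrt(2), **conv_kwargs))
        h = tf.nn.relu(conv(h, 'c2', nf=32, rf=4, stride=2,
                            init_scale=np.sqrt(2), **conv_kwargs))
        h = conv_to_fc(h)
        return tf.nn.relu(fc(h, 'fc1', nh=256, init_scale=np.sqrt(2)))
    return network_fn

Once the module defining it is imported by the training script, the network can be selected with --network=paper_cnn.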
