anjum48 / rl-examples

Examples of published reinforcement learning algorithms in recent literature implemented in TensorFlow

License: MIT License

Python 99.14% Shell 0.86%
reinforcement-learning artificial-intelligence tensorflow python openai-gym

rl-examples's Introduction

rl-examples

Examples of published reinforcement learning algorithms in recent literature implemented in TensorFlow. Most of my research is in the continuous domain, and I haven't spent much time testing these algorithms in discrete domains such as Atari.

[Images: PPO LSTM solving BipedalWalker-v2; PPO solving CarRacing-v0]

BipedalWalker-v2 solved using DPPO with an LSTM layer. CarRacing-v0 solved using PPO with a joined actor-critic network.

Algorithms Implemented

Thanks to DeepMind and OpenAI for making their research openly available. Big thanks also to the TensorFlow community.

Algorithm   Paper
DDPG        Continuous control with deep reinforcement learning
A3C         Asynchronous Methods for Deep Reinforcement Learning
PPO         Proximal Policy Optimization Algorithms
DPPO        Emergence of Locomotion Behaviours in Rich Environments
GAE         High-Dimensional Continuous Control Using Generalized Advantage Estimation

  • GAE was used in all algorithms except for DDPG
  • Where possible, I've added an LSTM layer to the policy and value functions. This achieved higher scores in some environments, but it can cause stability issues
  • In some environments, a joint network for the actor & critic performs better (e.g. where CNNs are used). These scripts carry a _joined suffix, e.g. ppo_joined.py
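
To make the GAE note above more concrete, here is a minimal pure-Python sketch of the advantage recursion from the GAE paper. It is an illustration under stated assumptions, not the code used in this repo: the function name compute_gae, its arguments, and the default gamma and lam values are all hypothetical.

```python
def compute_gae(rewards, values, bootstrap_value, gamma=0.99, lam=0.95):
    """Return (advantages, returns) for one trajectory segment.

    Hypothetical helper, not taken from the repo.
    rewards:         list of r_t for t = 0..T-1
    values:          list of V(s_t) for t = 0..T-1
    bootstrap_value: V(s_T); pass 0.0 if the episode ended at step T
    """
    advantages = [0.0] * len(rewards)
    next_value = bootstrap_value
    running = 0.0
    # Walk backwards so each step reuses the discounted sum of later TD errors.
    for t in reversed(range(len(rewards))):
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # A_t = delta_t + gamma * lam * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    # Value-function targets: advantage plus the baseline it was measured against.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

With lam=0 this reduces to the one-step TD error, and with lam=1 it becomes the discounted return minus the value baseline, which is the bias/variance trade-off the paper describes.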

Training

All the Python scripts are standalone (though they share some common functions in utils.py). Run them directly in your IDE, or from a terminal using the -m flag:

rl-examples$ python3 -m ppo.ppo_joined

The models and TensorBoard summaries are saved in the same directory as the script. DPPO has a helper script to launch the worker threads:

rl-examples$ sh dppo/start_dppo.sh

Requirements

  • Python 3.6+
  • OpenAI Gym 0.10.3+
  • TensorFlow 1.11
  • Numpy 1.13+

DPPO was tested on a 16-core machine using CPU only, so the helper script will need to be updated for your particular setup. On my setup, training BipedalWalker on the GPU (GTX 1080) was usually no faster than on the CPU, but CarRacing did get a performance boost from the GPU because it uses CNN layers.

Issues/To-dos

  • Work needed to find the correct parameters for PPO in discrete action spaces for Atari
  • The LSTM batching in A3C is incorrect. Need to fix this (see ppo_lstm.py for the correct implementation)
  • Distributed Proximal Policy Optimisation with the LSTM (dppo_lstm.py) is sometimes a bit unstable, but does work at low learning rates

rl-examples's Issues

Experimental details on CarRacing

Hi, thank you for your great work. This is the best repo I have seen for solving CarRacing.
I assume CarRacing performs best with ppo_lstm_joined.py, but how long did it take to train? Are 10,000 episodes enough? I just want to make sure before training it myself.

RunningStats is not defined

Throughout the PPO examples, 'RunningStats()' is called despite never being defined. This prevents the scripts from running as is, and I'm not sure where it is coming from.
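
For readers hitting the same error: the README says the scripts share common functions in utils.py, so the class may be expected to come from there (an assumption, not verified against the repo). A helper with this name typically tracks a running mean and standard deviation so observations or rewards can be normalised. The sketch below shows that idea for scalars using Welford's online algorithm. The class name ScalarRunningStats and its methods are hypothetical and are not the repo's API.

```python
import math


class ScalarRunningStats:
    """Hypothetical stand-in: running mean/std of a scalar stream (Welford)."""

    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new sample into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def var(self):
        # Population variance; defined as 0.0 until two samples have been seen.
        return self._m2 / self.n if self.n > 1 else 0.0

    @property
    def std(self):
        return math.sqrt(self.var)

    def normalize(self, x, eps=1e-8):
        """Centre and scale x using the statistics gathered so far."""
        return (x - self.mean) / (self.std + eps)
```

A typical use would be to call update() on each incoming value and feed normalize() output to the network, but how the repo's scripts actually use RunningStats is not shown here.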

No utils package

Hi, when I try to run the scripts, there is no utils package. Could you please add it? Thanks.

action_space.low

I found that the bound was defined by action_space.high like this:
self.a_bound = environment.action_space.high

But for CarRacing-v0, the high and low bounds are not symmetric:
action_space.high = [1. 1. 1.]
action_space.low = [-1. 0. 0.]
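
One common way to handle asymmetric bounds like these is to keep the policy output in [-1, 1] and rescale each component into its own [low, high] range before stepping the environment. The sketch below assumes the raw action is already squashed to [-1, 1] (for example by tanh). The function scale_action is hypothetical and does not describe how the repo treats a_bound.

```python
def scale_action(raw, low, high):
    """Map each raw component from [-1, 1] to its own [low_i, high_i] range.

    Hypothetical helper, not taken from the repo.
    """
    scaled = []
    for a, lo, hi in zip(raw, low, high):
        a = max(-1.0, min(1.0, a))                       # guard against overshoot
        scaled.append(lo + 0.5 * (a + 1.0) * (hi - lo))  # affine rescale
    return scaled


# Bounds quoted in the issue above for CarRacing-v0.
low = [-1.0, 0.0, 0.0]
high = [1.0, 1.0, 1.0]
centre = scale_action([0.0, 0.0, 0.0], low, high)
```

With these bounds a raw output of zero maps to the midpoint of each range, so the first component stays centred while the other two sit halfway between their limits.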

distutils.errors.CompileError: command 'gcc' failed with exit status 1

/home/balavivek/anaconda3/bin/python "/home/balavivek/Bala/SEM2/High-Dimensional Continuous Control Using Generalized Advantage Estimation/ppo/ppo.py"
Import error. Trying to rebuild mujoco_py.
running build_ext
building 'mujoco_py.cymj' extension
gcc -pthread -B /home/balavivek/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/home/balavivek/anaconda3/lib/python3.6/site-packages/mujoco_py -I/home/balavivek/.mujoco/mjpro150/include -I/home/balavivek/anaconda3/lib/python3.6/site-packages/numpy/core/include -I/home/balavivek/anaconda3/include/python3.6m -c /home/balavivek/anaconda3/lib/python3.6/site-packages/mujoco_py/cymj.c -o /home/balavivek/anaconda3/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_1.50.1.56_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/balavivek/anaconda3/lib/python3.6/site-packages/mujoco_py/cymj.o -fopenmp -w
gcc -pthread -B /home/balavivek/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/home/balavivek/anaconda3/lib/python3.6/site-packages/mujoco_py -I/home/balavivek/.mujoco/mjpro150/include -I/home/balavivek/anaconda3/lib/python3.6/site-packages/numpy/core/include -I/home/balavivek/anaconda3/include/python3.6m -c /home/balavivek/anaconda3/lib/python3.6/site-packages/mujoco_py/gl/osmesashim.c -o /home/balavivek/anaconda3/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_1.50.1.56_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/balavivek/anaconda3/lib/python3.6/site-packages/mujoco_py/gl/osmesashim.o -fopenmp -w
/home/balavivek/anaconda3/lib/python3.6/site-packages/mujoco_py/gl/osmesashim.c:1:23: fatal error: GL/osmesa.h: No such file or directory
#include <GL/osmesa.h>
^
compilation terminated.
Traceback (most recent call last):
File "/home/balavivek/anaconda3/lib/python3.6/distutils/unixccompiler.py", line 118, in _compile
extra_postargs)
File "/home/balavivek/anaconda3/lib/python3.6/distutils/ccompiler.py", line 909, in spawn
spawn(cmd, dry_run=self.dry_run)
File "/home/balavivek/anaconda3/lib/python3.6/distutils/spawn.py", line 36, in spawn
_spawn_posix(cmd, search_path, dry_run=dry_run)
File "/home/balavivek/anaconda3/lib/python3.6/distutils/spawn.py", line 159, in _spawn_posix
% (cmd, exit_status))
distutils.errors.DistutilsExecError: command 'gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/balavivek/Bala/SEM2/High-Dimensional Continuous Control Using Generalized Advantage Estimation/ppo/ppo.py", line 202, in <module>
env = gym.make(ENVIRONMENT)
File "/home/balavivek/gym/gym/envs/registration.py", line 167, in make
return registry.make(id)
File "/home/balavivek/gym/gym/envs/registration.py", line 119, in make
env = spec.make()
File "/home/balavivek/gym/gym/envs/registration.py", line 85, in make
cls = load(self._entry_point)
File "/home/balavivek/gym/gym/envs/registration.py", line 14, in load
result = entry_point.load(False)
File "/home/balavivek/anaconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2324, in load
return self.resolve()
File "/home/balavivek/anaconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2330, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/home/balavivek/gym/gym/envs/mujoco/__init__.py", line 1, in <module>
from gym.envs.mujoco.mujoco_env import MujocoEnv
File "/home/balavivek/gym/gym/envs/mujoco/mujoco_env.py", line 11, in <module>
import mujoco_py
File "/home/balavivek/anaconda3/lib/python3.6/site-packages/mujoco_py/__init__.py", line 1, in <module>
from mujoco_py.builder import cymj, ignore_mujoco_warnings, functions, MujocoException
File "/home/balavivek/anaconda3/lib/python3.6/site-packages/mujoco_py/builder.py", line 468, in <module>
cymj = load_cython_ext(mjpro_path)
File "/home/balavivek/anaconda3/lib/python3.6/site-packages/mujoco_py/builder.py", line 90, in load_cython_ext
cext_so_path = builder.build()
File "/home/balavivek/anaconda3/lib/python3.6/site-packages/mujoco_py/builder.py", line 202, in build
built_so_file_path = self._build_impl()
File "/home/balavivek/anaconda3/lib/python3.6/site-packages/mujoco_py/builder.py", line 256, in _build_impl
so_file_path = super()._build_impl()
File "/home/balavivek/anaconda3/lib/python3.6/site-packages/mujoco_py/builder.py", line 225, in _build_impl
dist.run_commands()
File "/home/balavivek/anaconda3/lib/python3.6/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/home/balavivek/anaconda3/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/home/balavivek/anaconda3/lib/python3.6/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/home/balavivek/anaconda3/lib/python3.6/distutils/command/build_ext.py", line 339, in run
self.build_extensions()
File "/home/balavivek/anaconda3/lib/python3.6/site-packages/mujoco_py/builder.py", line 125, in build_extensions
build_ext.build_extensions(self)
File "/home/balavivek/anaconda3/lib/python3.6/site-packages/Cython/Distutils/old_build_ext.py", line 194, in build_extensions
self.build_extension(ext)
File "/home/balavivek/anaconda3/lib/python3.6/distutils/command/build_ext.py", line 533, in build_extension
depends=ext.depends)
File "/home/balavivek/anaconda3/lib/python3.6/distutils/ccompiler.py", line 574, in compile
self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
File "/home/balavivek/anaconda3/lib/python3.6/distutils/unixccompiler.py", line 120, in _compile
raise CompileError(msg)
distutils.errors.CompileError: command 'gcc' failed with exit status 1