Benchmark-ChainerRL-library-in-Gym-Environments

Benchmark ChainerRL library in OpenAI Gym Environments

Objectives

  • Benchmarking three RL algorithms: Deep Deterministic Policy Gradient (DDPG), Trust Region Policy Optimization (TRPO), and Proximal Policy Optimization (PPO).

OpenAI Gym Environment

  • OpenAI Gym is an open-source interface to reinforcement learning tasks. The gym library provides an easy-to-use suite of reinforcement learning tasks.

  • OpenAI Gym has several environments. We use the Pendulum (classic control) and Bipedal Walker2D environments.
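The interaction pattern behind that interface can be sketched without installing anything. The `ToyEnv` class below is a hypothetical stand-in, not a Gym environment: it only mimics the classic `reset()` / `step(action)` convention, where `step` returns an observation, a reward, a done flag, and an info dictionary. Its dynamics and reward are made up for illustration.

```python
import random


class ToyEnv:
    """Hypothetical stand-in that mimics the classic Gym reset/step interface."""

    def __init__(self, horizon=5):
        self.horizon = horizon  # number of steps before the episode ends
        self.t = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self.t = 0
        return 0.0

    def step(self, action):
        # Advance one step; the reward penalizes large actions (illustrative only).
        self.t += 1
        observation = float(self.t)
        reward = -abs(action)
        done = self.t >= self.horizon
        return observation, reward, done, {}


def run_random_episode(env, rng):
    """Roll out one episode with uniformly random actions; return the total reward."""
    env.reset()
    total, done = 0.0, False
    while not done:
        action = rng.uniform(-1.0, 1.0)
        _, reward, done, _ = env.step(action)
        total += reward
    return total


episode_return = run_random_episode(ToyEnv(), random.Random(0))
```

With Gym installed, an environment created by `gym.make(...)` would take the place of `ToyEnv`. The exact environment ID and the shape of the `step` return value depend on the Gym version.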

Codes:

Observations

Pendulum

  • States: cosine and sine of the angle between the center and the pendulum, plus the angular velocity.

Bipedal Walker2D

  • 14 observations: hull angle, hull angular velocity, hip joint angle, hip joint speed, knee joint angle, knee joint speed, etc.

Actions

Pendulum

  • Joint effort

Bipedal Walker2D

  • 4 Actions: Hip_1 (Torque / Velocity), Hip_2 (Torque / Velocity), Knee_1 (Torque / Velocity) and Knee_2 (Torque / Velocity)

Reward

Pendulum

[Figure: Pendulum reward function]
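The Pendulum reward is not spelled out in the text. As a sketch, assuming the benchmark uses Gym's standard Pendulum task, the reward in Gym's implementation is the negative of a quadratic cost on the angle, the angular velocity, and the applied torque. The function names below are illustrative, and the coefficients come from Gym's Pendulum source rather than from this document.

```python
import math


def angle_normalize(theta):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return ((theta + math.pi) % (2 * math.pi)) - math.pi


def pendulum_reward(theta, theta_dot, torque):
    """Reward for one step: 0 when upright and still, negative otherwise.

    Assumes Gym's Pendulum cost: theta^2 + 0.1 * theta_dot^2 + 0.001 * torque^2.
    """
    cost = (
        angle_normalize(theta) ** 2
        + 0.1 * theta_dot ** 2
        + 0.001 * torque ** 2
    )
    return -cost
```

Because every term is a squared quantity, the reward is never positive, which matches the negative values reported in the results.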

Bipedal Walker2D

  • The robot earns 300+ points for walking to the far end. If it falls, it receives -100.

Algorithms and Hyperparameters

  • DDPG is a model-free, off-policy actor-critic algorithm that uses deep function approximators and can learn policies in high-dimensional, continuous action spaces. DDPG is based on the deterministic policy gradient (DPG) algorithm. It combines the actor-critic approach with insights from the recent success of Deep Q-Network (DQN).

  • PPO is a policy optimization method that uses multiple epochs of stochastic gradient ascent to perform each policy update.

  • TRPO is a model-free, on-policy optimization method that is effective for optimizing large nonlinear policies such as neural networks.
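To make the PPO description above more concrete, here is a minimal sketch of the clipped surrogate objective that is commonly used with PPO. The document does not say which PPO variant or which hyperparameters were benchmarked, so the clipped form and `epsilon=0.2` are assumptions, and the function names are illustrative rather than ChainerRL's API.

```python
def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """Per-sample clipped surrogate objective.

    ratio:     new_policy_prob / old_policy_prob for the sampled action
    advantage: advantage estimate for that action
    epsilon:   clipping range (assumed value, not taken from the benchmark)
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum removes any incentive to push the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)


def mean_objective(ratios, advantages, epsilon=0.2):
    """Average the clipped objective over a batch; this is the quantity to maximize."""
    terms = [
        ppo_clipped_objective(r, a, epsilon)
        for r, a in zip(ratios, advantages)
    ]
    return sum(terms) / len(terms)
```

The same batch can be reused for several epochs of gradient ascent on this objective, because the clipping keeps the updated policy close to the one that collected the data.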

Results

  • Pendulum

|                | TRPO  | PPO   | DDPG |
|----------------|-------|-------|------|
| Mean Reward    | -1216 | -1252 | -594 |
| Maximum Reward | -986  | -489  | -371 |

[Figure: Pendulum results]

  • Bipedal Walker2D

|                | TRPO | PPO | DDPG |
|----------------|------|-----|------|
| Mean Reward    | 120  | 163 | -96  |
| Maximum Reward | 183  | 262 | -25  |

[Figure: Bipedal Walker2D results]
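The two rows in each table are summary statistics over episode returns. A minimal sketch of that computation follows. The document does not state how many episodes were evaluated or how they were selected, so the helper name is hypothetical and it only illustrates the arithmetic.

```python
from statistics import mean


def summarize_returns(episode_returns):
    """Return the mean and the maximum of a list of per-episode total rewards."""
    return {"mean": mean(episode_returns), "max": max(episode_returns)}
```

For example, `summarize_returns([-3.0, -1.0, -2.0])` gives a mean of -2.0 and a maximum of -1.0. These input values are made up and are not taken from the benchmark.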

Demo

  • Random Actions

Random

TRPO

PPO

DDPG

Discussion

  • DDPG achieves the best reward in Pendulum because it is designed for high-dimensional, continuous action spaces and it uses a replay buffer.

  • PPO and TRPO algorithms achieve the best reward in Bipedal Walker2D.

  • PPO reaches its best reward faster than TRPO because it uses a gradient-based approximation instead of the conjugate gradient algorithm.
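The replay buffer mentioned in the discussion is what lets an off-policy method such as DDPG reuse past transitions. The class below is a minimal sketch of the idea using only the standard library. It is not ChainerRL's own replay buffer, and its names and signature are illustrative.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen drops the oldest transition once capacity is reached.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def append(self, state, action, reward, next_state, done):
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up temporal correlation.
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)
```

On-policy methods such as TRPO and PPO discard their data after each update, so they do not use a buffer of this kind.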

Installing

Install the OpenAI Gym environment

pip3 install gym

Install the ChainerRL library

pip3 install chainerrl
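After installing, a quick way to confirm that both packages are visible to the interpreter is to look them up without importing them. This is a small sketch using the standard library, and the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


# Report which of the two required packages can be found.
status = {name: is_installed(name) for name in ("gym", "chainerrl")}
print(status)
```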

Contributors

montaserfath
