State-of-the-art Model-free Reinforcement Learning Algorithms

PyTorch and Tensorflow 2.0 implementations of state-of-the-art model-free reinforcement learning algorithms on both OpenAI Gym environments and a self-implemented Reacher environment.

Algorithms include Soft Actor-Critic (SAC), Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), Actor-Critic (AC/A2C), Proximal Policy Optimization (PPO), QT-Opt (including Cross-entropy (CE) Method), PointNet, Transporter, Recurrent Policy Gradient, etc.

This repository contains only the PyTorch implementations.

My Tensorflow 2.0 + Tensorlayer 2.0 implementations are maintained separately: one written as tutorials with simple structures, and a baseline implementation with a high-level API that supports a variety of popular environments.

Two versions of Soft Actor-Critic (SAC) are implemented.

SAC Version 1:

sac.py: uses a state-value function.

paper: https://arxiv.org/pdf/1801.01290.pdf
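
In SAC Version 1 a separate state-value network V is learned alongside the Q-networks. The sketch below computes the two regression targets for a single transition with plain Python floats, written with an explicit temperature alpha (the v1 paper folds the temperature into reward scaling). The function names and arguments are hypothetical and are not taken from sac.py; a real implementation would compute the same quantities on batched tensors.

```python
def sac_v1_value_target(q1: float, q2: float, log_prob: float, alpha: float) -> float:
    """Regression target for V(s), using a fresh action a ~ pi(.|s) and its log-probability."""
    # Soft value: the smaller of the two Q estimates minus the entropy-weighted log-probability.
    return min(q1, q2) - alpha * log_prob


def sac_v1_q_target(reward: float, done: bool, next_target_v: float, gamma: float = 0.99) -> float:
    """Regression target for Q(s, a): a one-step backup through the target value network V_bar(s')."""
    # No bootstrapping past a terminal state.
    return reward + gamma * (1.0 - float(done)) * next_target_v


# Toy numbers for a single transition (illustrative, not from the repository).
v_target = sac_v1_value_target(q1=1.5, q2=1.2, log_prob=-0.5, alpha=0.2)
q_target = sac_v1_q_target(reward=1.0, done=False, next_target_v=2.0, gamma=0.9)
print(v_target, q_target)
```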

SAC Version 2:

sac_v2.py: uses a target Q-value function instead of a state-value function.

paper: https://arxiv.org/pdf/1812.05905.pdf
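
Version 2 drops V and bootstraps directly from two target Q-networks. A minimal sketch of the soft Bellman target for one transition, assuming the next action a' is sampled from the current policy and next_log_prob is log pi(a'|s'); the names are hypothetical, not the API of sac_v2.py.

```python
def sac_v2_q_target(
    reward: float,
    done: bool,
    next_q1_target: float,
    next_q2_target: float,
    next_log_prob: float,
    alpha: float,
    gamma: float = 0.99,
) -> float:
    """Soft Bellman target y = r + gamma * (1 - d) * (min(Q1', Q2')(s', a') - alpha * log pi(a'|s'))."""
    # Clipped double-Q over the two target critics, plus the entropy bonus of the sampled a'.
    soft_next_value = min(next_q1_target, next_q2_target) - alpha * next_log_prob
    return reward + gamma * (1.0 - float(done)) * soft_next_value


y = sac_v2_q_target(reward=1.0, done=False, next_q1_target=2.0, next_q2_target=2.5,
                    next_log_prob=-1.0, alpha=0.2, gamma=0.9)
print(y)
```

Both online critics are then regressed toward the same y.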

Deep Deterministic Policy Gradient (DDPG):

ddpg.py: implementation of DDPG.
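
For reference, the two ingredients that distinguish DDPG from vanilla actor-critic are a critic target computed with slowly moving target networks and a Polyak (soft) update of those networks. The sketch below uses plain floats and toy callables in place of neural networks; target_actor, target_critic and the parameter lists are hypothetical stand-ins, not objects from ddpg.py.

```python
from typing import Callable, List


def ddpg_critic_target(
    reward: float,
    done: bool,
    next_state: float,
    target_actor: Callable[[float], float],
    target_critic: Callable[[float, float], float],
    gamma: float = 0.99,
) -> float:
    """y = r + gamma * (1 - d) * Q'(s', mu'(s')) with the deterministic target policy mu'."""
    next_action = target_actor(next_state)
    return reward + gamma * (1.0 - float(done)) * target_critic(next_state, next_action)


def soft_update(target_params: List[float], online_params: List[float], tau: float = 0.005) -> List[float]:
    """Polyak averaging: theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target_params, online_params)]


# Toy linear "networks" just to exercise the formulas.
y = ddpg_critic_target(0.5, False, 2.0, lambda s: -0.5 * s, lambda s, a: s + a, gamma=0.9)
new_target = soft_update([0.0, 1.0], [1.0, 0.0], tau=0.1)
print(y, new_target)
```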

Twin Delayed DDPG (TD3):

td3.py: implementation of TD3.

paper: https://arxiv.org/pdf/1802.09477.pdf
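
TD3 changes the DDPG target in three ways: target policy smoothing (clipped Gaussian noise on the target action), clipped double-Q (the minimum of two target critics), and delayed actor updates. A minimal sketch with scalar actions and toy callables; the default noise values follow those reported in the TD3 paper, while the function names are hypothetical and not taken from td3.py.

```python
import random
from typing import Callable


def td3_target(
    reward: float,
    done: bool,
    next_state: float,
    target_actor: Callable[[float], float],
    target_q1: Callable[[float, float], float],
    target_q2: Callable[[float, float], float],
    gamma: float = 0.99,
    policy_noise: float = 0.2,
    noise_clip: float = 0.5,
    max_action: float = 1.0,
    rng=random,
) -> float:
    # Target policy smoothing: add clipped Gaussian noise, then clip to the action bounds.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, policy_noise)))
    next_action = max(-max_action, min(max_action, target_actor(next_state) + noise))
    # Clipped double-Q: take the smaller of the two target critics.
    next_q = min(target_q1(next_state, next_action), target_q2(next_state, next_action))
    return reward + gamma * (1.0 - float(done)) * next_q


def should_update_actor(step: int, policy_delay: int = 2) -> bool:
    """Delayed policy updates: actor and target networks change once every `policy_delay` critic steps."""
    return step % policy_delay == 0


y = td3_target(0.0, False, 0.0, lambda s: 0.5, lambda s, a: a + 1.0, lambda s, a: a + 2.0,
               gamma=0.9, policy_noise=0.0)
print(y)
```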

Proximal Policy Optimization (PPO): todo

Actor-Critic (AC) / A2C:

ac.py: an extensible AC/A2C implementation that is easy to modify into DDPG and similar algorithms.

This vanilla AC/A2C is designed to be highly extensible and supports continuous and discrete action spaces with either deterministic or stochastic (non-deterministic) policies.
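
The core of A2C is the advantage estimate, which is why the same code serves discrete and continuous actions: only how log pi(a|s) is computed changes. A sketch of bootstrapped discounted returns and the two losses on plain lists, assuming a rollout of rewards and done flags plus the critic's value of the state after the last step; the names are illustrative and not from ac.py.

```python
from typing import List, Tuple


def discounted_returns(rewards: List[float], dones: List[bool], bootstrap_value: float,
                       gamma: float = 0.99) -> List[float]:
    """n-step returns computed backwards, bootstrapping from V(s_T) after the last step."""
    returns: List[float] = []
    running = bootstrap_value
    for r, d in zip(reversed(rewards), reversed(dones)):
        # A done flag means the next state is terminal, so the bootstrap is cut off.
        running = r + gamma * running * (1.0 - float(d))
        returns.append(running)
    returns.reverse()
    return returns


def a2c_losses(log_probs: List[float], values: List[float], returns: List[float]) -> Tuple[float, float]:
    """Actor loss -mean(log pi * A) and critic loss mean(A^2), with advantage A = G - V(s)."""
    advs = [g - v for g, v in zip(returns, values)]
    n = len(advs)
    # With autograd, the advantage would be detached inside the actor loss.
    actor_loss = -sum(lp * a for lp, a in zip(log_probs, advs)) / n
    critic_loss = sum(a * a for a in advs) / n
    return actor_loss, critic_loss


rets = discounted_returns([1.0, 1.0, 1.0], [False, False, True], bootstrap_value=10.0, gamma=0.5)
print(rets, a2c_losses([-1.0, -2.0], [0.0, 1.0], [1.0, 1.0]))
```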

QT-Opt:

Two versions of QT-Opt are implemented, including the Cross-entropy (CE) Method used for action selection.
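
QT-Opt has no separate actor network: it selects the action that maximizes Q(s, a) by running the Cross-entropy Method over actions. A minimal sketch for a 1-D action in [-1, 1]; q_fn stands in for Q(s, .) at a fixed state, and the sample, elite and iteration counts are illustrative values, not those used in this repository.

```python
import random
import statistics
from typing import Callable


def cem_argmax(q_fn: Callable[[float], float], mean: float = 0.0, std: float = 1.0,
               n_samples: int = 64, n_elite: int = 6, n_iters: int = 3,
               low: float = -1.0, high: float = 1.0, rng=random) -> float:
    """Approximately maximize q_fn over [low, high] with the Cross-entropy Method."""
    for _ in range(n_iters):
        # Sample candidate actions from the current Gaussian and clip them to the bounds.
        samples = [min(high, max(low, rng.gauss(mean, std))) for _ in range(n_samples)]
        # Keep the top-scoring candidates and refit the Gaussian to them.
        elites = sorted(samples, key=q_fn, reverse=True)[:n_elite]
        mean = statistics.mean(elites)
        std = statistics.pstdev(elites) + 1e-6  # small floor keeps sampling non-degenerate
    return mean


best = cem_argmax(lambda a: -(a - 0.3) ** 2, n_iters=5, rng=random.Random(0))
print(best)
```

The same routine can serve both acting and the Bellman target, where it approximates max over a' of Q_target(s', a').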

PointNet / Transporter:

PointNet for generating landmarks from images with unsupervised learning is also implemented. The same method serves as a component of Transporter, a state-of-the-art algorithm for image-based reinforcement learning.

original paper: Unsupervised Learning of Object Landmarks through Conditional Image Generation

paper for RL: Unsupervised Learning of Object Keypoints for Perception and Control
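
In the landmark paper, each detector heatmap is turned into a keypoint by a spatial softmax followed by an expected coordinate, and the keypoint is re-rendered as a fixed-width Gaussian map that acts as an information bottleneck. The sketch below does this for a single heatmap in pure Python with coordinates normalized to [-1, 1]; it illustrates the idea under those assumptions and is not the repository's vectorized PyTorch code.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def _coords(n: int) -> List[float]:
    """Pixel coordinates spread evenly over [-1, 1] (requires n >= 2)."""
    return [-1.0 + 2.0 * i / (n - 1) for i in range(n)]


def spatial_soft_argmax(heatmap: Grid) -> Tuple[float, float]:
    """Softmax over all pixels, then the expected (y, x) coordinate."""
    h, w = len(heatmap), len(heatmap[0])
    peak = max(max(row) for row in heatmap)  # subtract the max for numerical stability
    weights = [[math.exp(v - peak) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    ys, xs = _coords(h), _coords(w)
    y = sum(weights[i][j] * ys[i] for i in range(h) for j in range(w)) / total
    x = sum(weights[i][j] * xs[j] for i in range(h) for j in range(w)) / total
    return y, x


def gaussian_heatmap(center: Tuple[float, float], h: int, w: int, std: float = 0.1) -> Grid:
    """Render a keypoint back into an h x w isotropic Gaussian map."""
    cy, cx = center
    ys, xs = _coords(h), _coords(w)
    return [[math.exp(-((ys[i] - cy) ** 2 + (xs[j] - cx) ** 2) / (2 * std ** 2))
             for j in range(w)] for i in range(h)]


heat = [[0.0] * 5 for _ in range(5)]
heat[0][4] = 50.0  # strong response in the top-right pixel
print(spatial_soft_argmax(heat))
```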

Recurrent Policy Gradient:

rdpg.py: DDPG with LSTM policy.

td3_lstm.py: TD3 with LSTM policy.

sac_v2_lstm.py: SAC with LSTM policy.

References:

Memory-based control with recurrent neural networks

Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
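
An LSTM policy carries a hidden state across time steps, so training usually replays ordered sequences rather than independent transitions. One common way to organize this, assumed here rather than read from rdpg.py, td3_lstm.py or sac_v2_lstm.py, is to store whole episodes together with the hidden state they started from. EpisodeReplayBuffer, collect_episode, policy_step and env_step are hypothetical names.

```python
import random
from typing import Any, Callable, List, Tuple

Transition = Tuple[Any, Any, float, Any, bool]  # (obs, action, reward, next_obs, done)


class EpisodeReplayBuffer:
    """Stores whole episodes plus the LSTM state each one started from."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.episodes: List[Tuple[Any, List[Transition]]] = []

    def push(self, initial_hidden: Any, transitions: List[Transition]) -> None:
        if len(self.episodes) >= self.capacity:
            self.episodes.pop(0)  # evict the oldest episode
        self.episodes.append((initial_hidden, list(transitions)))

    def sample(self, batch_size: int, rng=random) -> List[Tuple[Any, List[Transition]]]:
        # Whole sequences are sampled so the LSTM can be unrolled in time order (BPTT).
        return rng.sample(self.episodes, batch_size)

    def __len__(self) -> int:
        return len(self.episodes)


def collect_episode(policy_step: Callable[[Any, Any], Tuple[Any, Any]],
                    env_step: Callable[[Any], Tuple[Any, float, bool]],
                    obs: Any, hidden: Any, max_steps: int) -> Tuple[Any, List[Transition]]:
    """Roll out one episode, threading the recurrent state from step to step."""
    initial_hidden = hidden
    transitions: List[Transition] = []
    for _ in range(max_steps):
        action, hidden = policy_step(obs, hidden)  # the policy also returns its new hidden state
        next_obs, reward, done = env_step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = next_obs
        if done:
            break
    return initial_hidden, transitions
```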

Maximum a Posteriori Policy Optimisation (MPO):

todo

paper: Maximum a Posteriori Policy Optimisation

Advantage-Weighted Regression (AWR):

todo

paper: Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning

Usage:

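python ***.py --train

python ***.py --test

Replace *** with the name of one of the scripts listed above (e.g. sac_v2 for sac_v2.py). Use --train to train the agent and --test to test it.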
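
The scripts are run with either --train or --test. A minimal sketch of how such a mode switch can be parsed with the standard argparse module; this is an assumed pattern, and the scripts' own argument handling may differ.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Train or test a reinforcement learning agent.")
    # The two modes are mutually exclusive: passing both is rejected by argparse.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--train", action="store_true", help="train the agent")
    mode.add_argument("--test", action="store_true", help="test a trained agent")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    if args.train:
        print("training ...")  # a real script would run its training loop here
    elif args.test:
        print("testing ...")  # ... or load saved weights and evaluate them
    else:
        print("pass --train or --test")
```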

Troubleshooting:

If you encounter a "NotImplementedError", it may be caused by an incompatible gym version: the newest release, gym==0.14, does not work. Install gym==0.7 or gym==0.10 with pip install -r requirements.txt.
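
To check which gym version is installed before running a script, a small helper built on the standard importlib.metadata module (Python 3.8+) can compare the installed major.minor version against the two versions named above. The helper names are hypothetical, and the check only looks at major.minor, so it is looser than an exact pip pin.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# (major, minor) versions named in the troubleshooting note.
SUPPORTED_GYM = {(0, 7), (0, 10)}


def major_minor(version: str) -> Optional[Tuple[int, int]]:
    match = re.match(r"(\d+)\.(\d+)", version)
    return (int(match.group(1)), int(match.group(2))) if match else None


def gym_version_ok(version: str) -> bool:
    return major_minor(version) in SUPPORTED_GYM


def check_installed_gym() -> str:
    try:
        installed = metadata.version("gym")
    except metadata.PackageNotFoundError:
        return "gym is not installed; run: pip install -r requirements.txt"
    if gym_version_ok(installed):
        return f"gym {installed} matches a supported version"
    return f"gym {installed} may raise NotImplementedError; install gym==0.7 or gym==0.10"


print(check_installed_gym())
```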

Performance:

SAC for gym Pendulum-v0:

Figure: SAC with an automatically updated entropy coefficient alpha.

Figure: SAC with a fixed entropy coefficient alpha.

The comparison shows that automatically updating alpha helps the agent learn faster.
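
Automatic entropy tuning treats alpha as a learnable parameter that is pushed up when the policy's entropy falls below a target and pushed down when it rises above it. A sketch of one plain gradient step on log alpha, assuming the loss -mean(log_alpha * (log pi + target_entropy)) with log pi held constant (a common implementation choice; optimizers such as Adam are typical in practice) and the target entropy -dim(A) suggested in the SAC v2 paper, which is -1 for Pendulum's 1-D action. The function name is hypothetical.

```python
import math
from typing import List


def update_log_alpha(log_alpha: float, log_probs: List[float], target_entropy: float,
                     lr: float = 3e-4) -> float:
    """One gradient-descent step on J(log_alpha) = -mean(log_alpha * (log_pi + target_entropy))."""
    # dJ/dlog_alpha: the log-probabilities are treated as constants.
    grad = -sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    return log_alpha - lr * grad


target_entropy = -1.0  # -dim(A) for a 1-D action space
too_deterministic = update_log_alpha(0.0, [2.0, 2.0], target_entropy, lr=0.1)   # entropy too low
too_random = update_log_alpha(0.0, [-3.0, -3.0], target_entropy, lr=0.1)        # entropy too high
print(math.exp(too_deterministic), math.exp(too_random))
```

When the sampled actions have high log-probability (a nearly deterministic policy), alpha grows and the entropy bonus encourages more exploration; the opposite happens for a policy that is too random.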

TD3 for gym Pendulum-v0:

Figure: TD3 with a deterministic policy.

Figure: TD3 with a non-deterministic (stochastic) policy.

TD3 with a deterministic policy appears to work slightly better, although the two variants perform broadly similarly.

AC for gym CartPole-v0:

Vanilla AC/A2C, however, does not handle continuous-control tasks such as gym Pendulum-v0 well.

Citation:

To cite this repository:

@misc{rlalgorithms,
  author = {Zihan Ding},
  title = {SOTA-RL-Algorithms},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/quantumiracle/SOTA-RL-Algorithms}},
}
