Deep Reinforcement Learning (DRL) Algorithms with PyTorch

This repository contains PyTorch implementations of deep reinforcement learning algorithms. The repository will soon be updated including the PyBullet environments!

Algorithms Implemented

Deep Q-Network (DQN) _{^{(V. Mnih et al. 2015)}}
Double DQN (DDQN) _{^{(H. Van Hasselt et al. 2015)}}
Advantage Actor Critic (A2C)
Vanilla Policy Gradient (VPG)
Natural Policy Gradient (NPG) _{^{(S. Kakade et al. 2002)}}
Trust Region Policy Optimization (TRPO) _{^{(J. Schulman et al. 2015)}}
Proximal Policy Optimization (PPO) _{^{(J. Schulman et al. 2017)}}
Deep Deterministic Policy Gradient (DDPG) _{^{(T. Lillicrap et al. 2015)}}
Twin Delayed DDPG (TD3) _{^{(S. Fujimoto et al. 2018)}}
Soft Actor-Critic (SAC) _{^{(T. Haarnoja et al. 2018)}}
SAC with automatic entropy adjustment (SAC-AEA) _{^{(T. Haarnoja et al. 2018)}}

Environments Implemented

Classic control environments (CartPole-v1, Pendulum-v0, etc.) _{^{(as described in here)}}
MuJoCo environments (Hopper-v2, HalfCheetah-v2, Ant-v2, Humanoid-v2, etc.) _{^{(as described in here)}}
PyBullet environments (HopperBulletEnv-v0, HalfCheetahBulletEnv-v0, AntBulletEnv-v0, HumanoidDeepMimicWalkBulletEnv-v1 etc.) _{^{(as described in here)}}

Results (MuJoCo, PyBullet)

MuJoCo environments

Hopper-v2

Observation space: 8
Action space: 3

HalfCheetah-v2

Observation space: 17
Action space: 6

Ant-v2

Observation space: 111
Action space: 8

Humanoid-v2

Observation space: 376
Action space: 17

PyBullet environments

HopperBulletEnv-v0

Observation space: 15
Action space: 3

HalfCheetahBulletEnv-v0

Observation space: 26
Action space: 6

AntBulletEnv-v0

Observation space: 28
Action space: 8

HumanoidDeepMimicWalkBulletEnv-v1

Observation space: 197
Action space: 36

Requirements

Usage

The repository's high-level structure is:

├── agents                    
    └── common 
├── results  
    ├── data 
    └── graphs        
└── save_model

1) To train the agents on the environments

To train all the different agents on PyBullet environments, follow these steps:

git clone https://github.com/dongminlee94/deep_rl.git
cd deep_rl
python run_bullet.py

For other environments, change the last line to run_cartpole.py, run_pendulum.py, run_mujoco.py.

If you want to change configurations of the agents, follow this step:

python run_bullet.py \
    --env=HumanoidDeepMimicWalkBulletEnv-v1 \
    --algo=sac-aea \
    --phase=train \
    --render=False \
    --load=None \
    --seed=0 \
    --iterations=200 \
    --steps_per_iter=5000 \
    --max_step=1000 \
    --tensorboard=True \
    --gpu_index=0

2) To watch the learned agents on the above environments

To watch all the learned agents on PyBullet environments, follow these steps:

python run_bullet.py \
    --env=HumanoidDeepMimicWalkBulletEnv-v1 \
    --algo=sac-aea \
    --phase=test \
    --render=True \
    --load=envname_algoname_... \
    --seed=0 \
    --iterations=200 \
    --steps_per_iter=5000 \
    --max_step=1000 \
    --tensorboard=False \
    --gpu_index=0

You should copy the saved model name in save_model/envname_algoname_... and paste the copied name in envname_algoname_.... So the saved model will be load.

neilwth / deep_rl Goto Github PK

deep_rl's Introduction