# Solution to the Second Project in the Udacity Deep Reinforcement Learning Course
This repo contains my solution to the second project in the Udacity Deep Reinforcement Learning Course: Continuous Control. The Unity environment contains 20 robotic arms that all follow the same policy. Their task is to learn to track moving globes with their end effectors.

A reward of +0.1 is given for each timestep that an agent's end effector coincides with its target position. The state space consists of 33 real numbers, corresponding to the position, rotation, velocity, and angular velocity of the two arm links. The action space consists of 4 real numbers, corresponding to the torques applied to each of the two joints.
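To illustrate these shapes, the policy produces a `(20, 4)` array of actions each step, one row per arm. A minimal NumPy sketch (the random policy and the `[-1, 1]` torque clipping range are illustrative assumptions, not the project's trained policy):

```python
import numpy as np

NUM_AGENTS = 20   # parallel arms in the Unity environment
STATE_SIZE = 33   # position, rotation, velocity, angular velocity values
ACTION_SIZE = 4   # torques for the two joints

# Stand-in for a stochastic policy: Gaussian noise around zero torque.
states = np.zeros((NUM_AGENTS, STATE_SIZE))
raw_actions = np.random.randn(NUM_AGENTS, ACTION_SIZE)

# Clip sampled torques into the assumed valid range before sending
# them to the environment.
actions = np.clip(raw_actions, -1.0, 1.0)

print(actions.shape)  # (20, 4)
```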
The trigger for the end of an episode is not documented, but episodes appear to last around 1000 time steps.
At the end of each episode, the total rewards of the 20 agents are averaged into a single per-episode score. The trailing average of these scores over the last 100 episodes must reach 30.0 for the environment to be considered solved.
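The solve criterion above can be sketched as follows (a self-contained illustration, not the project code; `episode_scores` stands for the sequence of per-episode scores already averaged over the 20 agents):

```python
import numpy as np
from collections import deque

def is_solved(episode_scores, window=100, target=30.0):
    """Return True once the trailing `window`-episode average of the
    per-episode (agent-averaged) scores reaches the target."""
    recent = deque(maxlen=window)
    for score in episode_scores:
        recent.append(score)
        # The criterion only applies once a full window has accumulated.
        if len(recent) == window and np.mean(recent) >= target:
            return True
    return False

# Example: a slow start followed by 150 episodes scoring 31.
scores = [1.0] * 50 + [31.0] * 150
print(is_solved(scores))  # True: the last 100 episodes average above 30
```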
To set up your python environment to run the code in this repository, follow the instructions below.
- Download and install Anaconda, if you don't already have it.
- Create (and activate) a new environment with Python 3.6.
  - Linux or Mac:
    ```bash
    conda create --name drlnd python=3.6
    source activate drlnd
    ```
  - Windows:
    ```bash
    conda create --name drlnd python=3.6
    activate drlnd
    ```
- Clone the repository and navigate to the root folder. Then install several dependencies.
  ```bash
  git clone https://github.com/SimonBirrell/ppo-continuous-control-project.git
  cd ppo-continuous-control-project
  pip install .
  ```
- Create an IPython kernel for the `drlnd` environment.
  ```bash
  python -m ikernel install --user --name drlnd --display-name "drlnd"
  ```
- Run the notebook.
  ```bash
  jupyter notebook Navigation.ipynb
  ```
- Before running the code in the notebook, change the kernel to match the `drlnd` environment using the **Kernel** drop-down menu. You should only need to do this the first time.
- Download the Unity environment from one of the links below. Select only the environment that matches your operating system:
  - Linux: click here
  - Mac OSX: click here
  - Windows (32-bit): click here
  - Windows (64-bit): click here

  (For Windows users) Check out this link if you need help determining whether your computer is running a 32-bit or 64-bit version of Windows.
- Place the file in the root directory of this repository and unzip (or decompress) it.
- Run the code (click the top cell, then press Shift-Enter to execute each cell in turn) down to and including the "Train the Policy" cell. This should solve the environment in under 300 episodes. Make sure the Unity window is visible; otherwise the simulation has a tendency to pause itself.
- To see the trained agent in action, run the cell titled "Run the policy" and watch the Unity app window to see how well it performs.