# Solution to the Second Project in the Udacity Deep Reinforcement Learning Course
This repo contains my solution to the second project in the Udacity Deep Reinforcement Learning Course: Continuous Control. The Unity environment contains 20 robotic arms that all follow the same policy. Their task is to learn to track moving globes with their end effectors.

A reward of +0.1 is given for each timestep that an agent's end effector coincides with its target position. The state space consists of 33 real numbers, corresponding to the position, rotation, velocity, and angular velocity of the two arm links. The action space consists of 4 real numbers, corresponding to the torques applied to each of the two joints.
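To illustrate these shapes, the policy produces a `(20, 4)` array of actions each step, one row per arm. A minimal NumPy sketch (the random policy and the `[-1, 1]` torque clipping range are illustrative assumptions, not the project's trained policy):

```python
import numpy as np

NUM_AGENTS = 20   # parallel arms in the Unity environment
STATE_SIZE = 33   # position, rotation, velocity, angular velocity values
ACTION_SIZE = 4   # torques for the two joints

# Stand-in for a stochastic policy: Gaussian noise around zero torque.
states = np.zeros((NUM_AGENTS, STATE_SIZE))
raw_actions = np.random.randn(NUM_AGENTS, ACTION_SIZE)

# Clip sampled torques into the assumed valid range before sending
# them to the environment.
actions = np.clip(raw_actions, -1.0, 1.0)

print(actions.shape)  # (20, 4)
```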
The trigger for the end of an episode is not documented, but episodes appear to last around 1000 time steps.
At the end of each episode, the total rewards of the 20 agents are averaged into a single per-episode score. The trailing average of these scores over the last 100 episodes must reach 30.0 for the environment to be considered solved.
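The solve criterion above can be sketched as follows (a self-contained illustration, not the project code; `episode_scores` stands for the sequence of per-episode scores already averaged over the 20 agents):

```python
import numpy as np
from collections import deque

def is_solved(episode_scores, window=100, target=30.0):
    """Return True once the trailing `window`-episode average of the
    per-episode (agent-averaged) scores reaches the target."""
    recent = deque(maxlen=window)
    for score in episode_scores:
        recent.append(score)
        # The criterion only applies once a full window has accumulated.
        if len(recent) == window and np.mean(recent) >= target:
            return True
    return False

# Example: a slow start followed by 150 episodes scoring 31.
scores = [1.0] * 50 + [31.0] * 150
print(is_solved(scores))  # True: the last 100 episodes average above 30
```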
To set up your python environment to run the code in this repository, follow the instructions below.
- Download and install Anaconda, if you don't already have it.
- Create (and activate) a new environment with Python 3.6.
  - Linux or Mac:
    ```bash
    conda create --name drlnd python=3.6
    source activate drlnd
    ```
  - Windows:
    ```bash
    conda create --name drlnd python=3.6
    activate drlnd
    ```
- Clone the repository and navigate to the root folder. Then install several dependencies.
  ```bash
  git clone https://github.com/SimonBirrell/ppo-continuous-control-project.git
  cd ppo-continuous-control-project
  pip install .
  ```
- Create an IPython kernel for the `drlnd` environment.
  ```bash
  python -m ikernel install --user --name drlnd --display-name "drlnd"
  ```
- Run the notebook.
  ```bash
  jupyter notebook Navigation.ipynb
  ```
- Before running the code in the notebook, change the kernel to match the `drlnd` environment using the **Kernel** drop-down menu. You should only need to do this the first time.
- Download the Unity environment from one of the links below. Select only the environment that matches your operating system:
  - Linux: click here
  - Mac OSX: click here
  - Windows (32-bit): click here
  - Windows (64-bit): click here

  (For Windows users) Check out this link if you need help determining whether your computer is running a 32-bit or 64-bit version of Windows.
- Place the file in the root directory of this repository and unzip (or decompress) it.
- Run the code (click the top cell, then press Shift-Enter to execute each cell in turn) down to and including the "Train the Policy" cell. This should solve the environment in under 300 episodes. Make sure the Unity window is visible; otherwise the simulation has a tendency to pause itself.
- To see the trained agent in action, run the cell titled "Run the policy" and watch the Unity app window to see how well it performs.