Proximal Policy Optimization on a Unity Environment

Solution to the second project of the Udacity Deep Reinforcement Learning Course

Introduction

This repo contains my solution to the second project in the Udacity Deep Reinforcement Learning Course: Continuous Control. The Unity environment contains 20 robotic arms, all controlled by the same policy. Their task is to learn to track moving target globes with their end effectors.

A reward of 0.1 is given for each timestep that an arm's end effector coincides with the target position. For each arm, the state space consists of 33 real numbers, corresponding to the position, rotation, velocities and angular velocities of the two arm links. The action space consists of 4 real numbers, corresponding to the torques applied to the arm's two joints.
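
As a minimal sketch of the per-step bookkeeping these numbers imply (not the code in this repository), the snippet below uses hypothetical helper names and plain Python lists. The placeholder policy assumes each torque lies in [-1, 1], which this README does not state.

```python
import random

NUM_AGENTS = 20         # robotic arms sharing one policy
STATE_SIZE = 33         # per-arm observation length
ACTION_SIZE = 4         # per-arm torque vector length
REWARD_ON_TARGET = 0.1  # reward per timestep on target


def placeholder_actions(num_agents=NUM_AGENTS, action_size=ACTION_SIZE):
    """Hypothetical stand-in for the policy: uniform random torques.

    The [-1, 1] range is an assumption, not taken from this README.
    """
    return [[random.uniform(-1.0, 1.0) for _ in range(action_size)]
            for _ in range(num_agents)]


def step_rewards(on_target):
    """Per-arm reward for one timestep, given a list of on-target flags."""
    return [REWARD_ON_TARGET if hit else 0.0 for hit in on_target]


def add_rewards(scores, rewards):
    """Accumulate one timestep of per-arm rewards into running episode scores."""
    return [s + r for s, r in zip(scores, rewards)]


# Usage: pretend only arm 0 is on target for three consecutive timesteps.
scores = [0.0] * NUM_AGENTS
for _ in range(3):
    scores = add_rewards(scores, step_rewards([i == 0 for i in range(NUM_AGENTS)]))
```

Each arm keeps its own running score, so arm 0 ends with about 0.3 and every other arm with 0.0.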

The trigger for the end of an episode is not documented, but episodes appear to end after roughly 1000 timesteps.
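
Because the episode length is undocumented, an episode loop is safer if it stops on the environment's done flags rather than after a fixed step count. The sketch below illustrates that idea against a hypothetical `StubEnv` whose `reset()`/`step()` interface is an assumption for illustration; it is not the Unity ML-Agents API.

```python
from typing import Callable, List, Tuple

State = List[float]


class StubEnv:
    """Hypothetical stand-in for the Unity environment, for illustration only.

    It ends the episode after `max_steps` steps and gives every even-indexed
    arm the on-target reward of 0.1 at each step.
    """

    def __init__(self, num_agents: int = 20, state_size: int = 33, max_steps: int = 1000):
        self.num_agents = num_agents
        self.state_size = state_size
        self.max_steps = max_steps
        self.t = 0

    def reset(self) -> List[State]:
        self.t = 0
        return [[0.0] * self.state_size for _ in range(self.num_agents)]

    def step(self, actions) -> Tuple[List[State], List[float], List[bool]]:
        self.t += 1
        states = [[0.0] * self.state_size for _ in range(self.num_agents)]
        rewards = [0.1 if i % 2 == 0 else 0.0 for i in range(self.num_agents)]
        dones = [self.t >= self.max_steps] * self.num_agents
        return states, rewards, dones


def run_episode(env, policy: Callable[[List[State]], list]) -> Tuple[List[float], int]:
    """Step until any arm reports done; return per-arm scores and the step count."""
    states = env.reset()
    scores = [0.0] * len(states)
    steps = 0
    while True:
        states, rewards, dones = env.step(policy(states))
        scores = [s + r for s, r in zip(scores, rewards)]
        steps += 1
        if any(dones):
            return scores, steps


# Usage: a do-nothing policy that outputs zero torques for every arm.
env = StubEnv(max_steps=1000)
scores, steps = run_episode(env, lambda states: [[0.0] * 4 for _ in states])
```

With a 1000-step episode, an arm that stayed on target at every step would score about 100, which puts the 30.0 threshold below in context.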

At the end of each episode, the total rewards of all the agents are averaged. This average is tracked from episode to episode, and the environment is considered solved when the trailing average of these per-episode averages over the last 100 episodes exceeds 30.0.
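
The solve criterion can be sketched with a fixed-length `deque`. This is an illustration, not the repository's code: `track_progress` is a hypothetical name, and requiring a full 100-episode window before declaring success is one reading of "the last 100 episodes".

```python
from collections import deque
from statistics import mean

SOLVE_THRESHOLD = 30.0
WINDOW = 100


def track_progress(episode_agent_scores, threshold=SOLVE_THRESHOLD, window=WINDOW):
    """Return the 1-based episode at which the environment counts as solved, or None.

    episode_agent_scores: iterable of per-episode lists, one total reward per agent.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` episode averages
    for episode, agent_scores in enumerate(episode_agent_scores, start=1):
        recent.append(mean(agent_scores))  # average over all agents this episode
        if len(recent) == window and mean(recent) > threshold:
            return episode
    return None


# Usage: every agent scores 35.0 in every episode.
print(track_progress([[35.0] * 20] * 150))  # → 100
```

Note that "exceed" is implemented as a strict `>`, so a trailing average of exactly 30.0 does not count as solved.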

Installation

To set up your python environment to run the code in this repository, follow the instructions below.

  1. Download and install Anaconda, if you don't already have it.

  2. Create (and activate) a new environment with Python 3.6.

    • Linux or Mac:
    conda create --name drlnd python=3.6
    source activate drlnd
    • Windows:
    conda create --name drlnd python=3.6 
    activate drlnd
  3. Clone the repository and navigate to the root folder. Then, install several dependencies.

git clone https://github.com/SimonBirrell/ppo-continuous-control-project.git
cd ppo-continuous-control-project
pip install .
  4. Create an IPython kernel for the drlnd environment.
python -m ipykernel install --user --name drlnd --display-name "drlnd"
  5. Run the notebook.
jupyter notebook Navigation.ipynb
  6. Before running the code in the notebook, change the kernel to match the drlnd environment by using the drop-down Kernel menu. You should only need to do this the first time.

  7. Download the Unity environment from one of the links below. You need only select the environment that matches your operating system:

    (For Windows users) Check out this link if you need help determining whether your computer is running a 32-bit or 64-bit version of the Windows operating system.

  8. Place the file in the root directory of this repository and unzip (or decompress) the file.

  9. Run the code (click the top cell, then press Shift-Enter to execute each cell in turn) down to and including the "Train the Policy" cell. This should solve the environment in under 300 episodes. Keep the Unity window visible; otherwise the simulation tends to pause itself.

  10. To see the trained agent, run the cell titled "Run the policy" and watch the Unity app window to see how well it performs.
