rezacdoobary / udacity---deep-reinforcement-learning---project-1---navigation

This is a solution to the first project of the Udacity deep reinforcement learning course.

Udacity Reinforcement learning Nanodegree project 1 - Navigation

Introduction

This is the first project introduced in the reinforcement learning nanodegree offered by Udacity.

Broadly speaking, the goal of the project is to train an agent, placed in a world of blue and yellow bananas, to maximise the number of yellow bananas collected whilst minimising the number of blue bananas collected.

Project Description

More precisely, the task of the project is to train an agent to collect as many yellow bananas as possible whilst avoiding the blue bananas.

  • The goal of the task is reflected in the reward function: the agent receives +1 for collecting a yellow banana and -1 for collecting a blue banana. This reward is the feedback from the environment.

  • The state space of the agent is 37-dimensional and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction.

  • The agent can take four discrete actions, each with a physical interpretation, namely to move:

    • 0 - forwards
    • 1 - backwards
    • 2 - left
    • 3 - right
  • The task is deemed solved if the agent gets an average score of +13 over 100 consecutive episodes.
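The solved criterion above can be checked with a rolling window over the per-episode scores. The following is a minimal sketch, not the project's own code: the function name `first_solved_episode` and its parameters are hypothetical, and the scores are assumed to be a plain list with one total reward per episode.

```python
from collections import deque

SOLVED_SCORE = 13.0  # average score threshold stated in the task description
WINDOW = 100         # number of consecutive episodes averaged over


def first_solved_episode(scores, threshold=SOLVED_SCORE, window=WINDOW):
    """Return the 1-based episode at which the average over the last
    `window` episodes first reaches `threshold`, or None if it never does.

    Hypothetical helper for illustration only.
    """
    recent = deque(maxlen=window)  # keeps only the most recent `window` scores
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        # Only a full window counts as "100 consecutive episodes".
        if len(recent) == window and sum(recent) / window >= threshold:
            return episode
    return None
```

A training loop could call such a check after every episode and save a checkpoint the first time it returns a value.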

Setup

Code and result structure

There are three components to the solution. The first is the source code itself, which implements the agent, the underlying model and further necessary components. The second is the results folder, which contains the training results of the various models studied. The third is the Jupyter notebook navigation.ipynb, which acts as the interface between the source code and the results folder and also displays the results.

The detailed rundown is as follows:

  • The source code can be found in the folder \src and includes four Python files:

    • agent.py : Contains the complete agent implementation, subject to the underlying model chosen.
    • environment.py : A very rudimentary wrapper for the environment to make it feel a little more like the OpenAI environments.
    • model.py : The model implementations, in this case a neural network.
    • replaybuffer.py : The replay buffer implementation for memory.
  • The results folder contains subfolders named after the model chosen. For example, if the model in question has hidden layers [128,64,128] and a dropout probability of 0.56, and uses dueling DQN but not double DQN, then the folder is named in this precise order: \128_64_128_30_True_False. Each such folder contains:

    • The checkpoints (checkpoint.pth) saved upon successfully solving the task.
    • The corresponding plots of scores against episodes. N.B. The precise models employed are stated at the end and detailed in the attached report.pdf.
  • The Jupyter notebook is the interface layer in which the user decides which architectures and models to use. For each trained model, the scores against episodes are plotted and the results forwarded to the relevant results subfolder. The trained agent can also be played from here to see the solved task at work.
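The naming order for the results subfolders (layer sizes, dropout field, dueling flag, double flag) can be sketched as a small helper. This is an illustration under assumptions: `results_folder_name` is a hypothetical name, not a function from the repository, and the dropout field is passed in as an already formatted tag because the README does not state how the probability is converted into that number.

```python
def results_folder_name(hidden_layers, dropout_tag, dueling, double):
    """Build a results subfolder name in the order described above.

    hidden_layers: sequence of layer sizes, e.g. [128, 64, 128]
    dropout_tag:   value written into the dropout field (assumed pre-formatted)
    dueling:       whether dueling DQN is used
    double:        whether double DQN is used
    """
    parts = [str(size) for size in hidden_layers]
    parts += [str(dropout_tag), str(bool(dueling)), str(bool(double))]
    return "_".join(parts)


# Reproduces the example folder name from the text.
print(results_folder_name([128, 64, 128], 30, True, False))
# prints 128_64_128_30_True_False
```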

Models

  • The models studied in the project include two main improvements to DQN. They are:
    • Double DQN
    • Dueling DQN

We consider these separately and in combination in our analysis.

Below is displayed the playing policy trained with dueling DQN with architecture 64-128-64.
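For reference, the two improvements can be written down in a few lines. The sketch below uses the standard textbook formulations and plain Python lists in place of network outputs; it is not taken from the project's agent.py or model.py, and the function names and the discount factor `gamma` are assumptions made for illustration.

```python
def dueling_q_values(state_value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a').

    state_value: scalar V(s) from the value stream
    advantages:  list of A(s, a), one entry per action, from the advantage stream
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN target for a single transition.

    The online network selects the next action; the target network evaluates it.
    q_online_next, q_target_next: lists of Q-values for the next state.
    """
    if done:
        return reward  # no bootstrapping from a terminal state
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]
```

Decoupling action selection from action evaluation is what distinguishes the double DQN target from the plain DQN target, which takes the maximum of the target network's own Q-values.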
