
Deep Q-Learning from Demonstrations (DQfD)

This repository contains an implementation of the learning algorithm proposed in Deep Q-Learning from Demonstrations (Hester et al. 2018) for solving Atari 2600 video games using a combination of reinforcement learning and imitation learning techniques.

Note: This implementation is part of my Bachelor's thesis Tiefes Q-Lernen mit Demonstrationen (Deep Q-Learning with Demonstrations).

Table of Contents

  • Installation
  • Usage
  • Some Experiments
  • Task List
  • License

Installation

1. Clone the repository

To clone the repository, open your terminal, move to the directory in which you want to store the project, and type

$ git clone https://github.com/felix-kerkhoff/DQfD.git

2. Create a virtual environment

Installing the GPU version of TensorFlow from source, together with the proper NVIDIA driver and CUDA libraries, can be quite tricky. I recommend using Anaconda/Miniconda to create a virtual environment for the necessary packages, as conda will automatically install the matching CUDA libraries. So type

$ conda create --name atari_env

to create an environment called atari_env. If you already have a working TensorFlow 2 installation, you can of course also use venv and pip to create the virtual environment and install the packages.

3. Install the required packages

To install the packages, we first have to activate the environment by typing:

$ conda activate atari_env

Then install the necessary packages specified in requirements.txt by running the following command in the directory of your project:

$ conda install --file requirements.txt -c conda-forge -c powerai

Note:

  • If you want to use pip for the installation, you will need to make the following changes to the requirements.txt file (see the sketch after this list):

    • replace the line atari_py==0.2.6 with atari-py==0.2.6
    • replace the line opencv==4.4.0 with opencv-python==4.4.0
  • To compile the Cython modules, make sure a proper C/C++ compiler is installed. See the Cython documentation for further information.
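
With those changes in place, a pip-based installation might look roughly like this (a sketch, assuming Python's built-in venv on Linux/macOS; the conda route above is the one the project documents):

$ python -m venv atari_env
$ source atari_env/bin/activate
$ pip install -r requirements.txt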

Usage

To check that everything works, I recommend training your first agent on the game Pong, as this game needs the least training time. To do so, just run the following command in your terminal (in the directory of your project):

$ python pong_standard_experiment.py

You should see good results after about 150,000 training steps, which corresponds to about 15 minutes of computation time on my machine. By using n_step = 50 instead of n_step = 10 as the number of steps considered for the n-step loss, you can even speed up the process and get good results after fewer than 100,000 training steps, or about 10 minutes (see the experiment in the next section). Feel free to experiment with all the other parameters and games by changing them in the respective file.
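
The change amounts to editing one parameter in the experiment script; the excerpt below is illustrative only, as the exact variable name and location in pong_standard_experiment.py may differ:

n_step = 50  # number of steps for the n-step loss; the standard experiment uses 10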

Some Experiments

Different numbers n for the n-step loss in the game Pong

With the first experiment, we try to show how the use of multi-step losses can speed up the training process in the game Pong.
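
For reference, the n-step loss regresses Q(s_t, a_t) toward the truncated return with a bootstrapped tail, R_t^(n) = r_t + γ·r_{t+1} + … + γ^{n-1}·r_{t+n-1} + γ^n·max_a Q(s_{t+n}, a) (Hester et al. 2018). A minimal sketch of computing this target, not the repository's actual code:

import numpy as np

def n_step_target(rewards, bootstrap_q, gamma=0.99, n=10):
    """n-step return target: discounted sum of n rewards plus a bootstrapped tail.

    rewards:     array of rewards r_t, ..., r_{t+n-1} observed after taking a_t
    bootstrap_q: max_a Q(s_{t+n}, a), evaluated with the target network
    """
    discounts = gamma ** np.arange(n)
    return float(np.sum(discounts * rewards[:n]) + gamma**n * bootstrap_q)

Larger n propagates reward information backwards through more states per update, which is the effect this experiment measures.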

Ablations in the game Enduro

In the next experiment, we investigate the influence of the individual components of n-step Prioritized Dueling Double Deep Q-Learning, using the game Enduro as an example. We do this by leaving out exactly one of the components at a time while keeping all other parameters unchanged.
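
Conceptually, each ablation toggles exactly one feature while holding the rest fixed. The flag names below are hypothetical, chosen for illustration rather than taken from enduro_standard_experiment.py:

# baseline: full n-step Prioritized Dueling Double Deep Q-Learning
baseline = {
    "n_step": 10,         # multi-step loss
    "prioritized": True,  # prioritized experience replay
    "dueling": True,      # dueling network architecture
    "double": True,       # double Q-learning targets
}

# one run per ablation: switch off a single component
ablations = {
    key: {**baseline, key: value}
    for key, value in [("n_step", 1), ("prioritized", False),
                       ("dueling", False), ("double", False)]
}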

Using demonstrations to learn Montezuma's Revenge

Due to its very sparse rewards and the need for long-term planning, Montezuma's Revenge is known to be one of the most difficult Atari 2600 games for deep reinforcement learning agents, and most of them fail at it. Using human demonstrations can help to overcome this issue:
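
The key ingredient DQfD adds here is a large-margin supervised loss on demonstration transitions, J_E(Q) = max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E), where a_E is the demonstrator's action and l(a_E, a) is a positive margin for a ≠ a_E (Hester et al. 2018). A minimal sketch of this term, not the repository's actual implementation:

import numpy as np

def large_margin_loss(q_values, expert_action, margin=0.8):
    """Large-margin loss for one demonstration state.

    q_values:      Q(s, a) for all actions, shape (n_actions,)
    expert_action: index of the demonstrator's action a_E
    margin:        l(a_E, a) for a != a_E; zero for a == a_E
    """
    margins = np.full_like(q_values, margin, dtype=float)
    margins[expert_action] = 0.0
    return float(np.max(q_values + margins) - q_values[expert_action])

This loss pushes the values of non-expert actions at least a margin below the expert action's value, which grounds the Q-function in states the agent would rarely reach on its own.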

Note:

  • The figures show the number of steps (i.e. the number of decisions made by the agent) on the x-axis and the scores achieved during the training process on the y-axis. The learning curves are smoothed using a moving average over intervals of 50 episodes, and the shaded areas correspond to the standard deviation within these intervals (a short smoothing sketch follows this list).
  • The learning curves were produced using the parameters specified in the files pong_standard_experiment.py, enduro_standard_experiment.py and montezuma_demo_experiment.py (except for the ones varied in the experiments, such as n_step in the first experiment).
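
The smoothing described above amounts to the following post-processing (a sketch, assuming the per-episode scores are available as a NumPy array):

import numpy as np

def smooth(scores, window=50):
    """Moving average and standard deviation over a sliding window of episodes."""
    means, stds = [], []
    for i in range(len(scores)):
        chunk = scores[max(0, i - window + 1):i + 1]
        means.append(chunk.mean())
        stds.append(chunk.std())
    return np.array(means), np.array(stds)  # plot means; shade means +/- stds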

Task List

License

This project is licensed under the terms of the MIT license.

Copyright (c) 2020 Felix Kerkhoff

