dgriff777 / a3c_continuous

A continuous action space version of A3C LSTM in pytorch plus A3G design

License: Apache License 2.0

Python 100.00%
a3c a3c-lstm pytorch openai-gym pytorch-a3c a3c-gpu a3g

a3c_continuous's Introduction

*Update: A major update brings large training performance gains and makes the code work with the latest versions of the PyTorch and gym libraries. With the updated code it is now possible to train a successful model that averages 300+ on the BipedalWalkerHardcore-v3 env in just 20-40 minutes using only a CPU!

  • A3G: a new GPU/CPU architecture of A3C for substantially accelerated training! A3G benefits training speed most with larger models, e.g. when raw pixels are used as observations, as in Atari environments.

RL A3C Pytorch Continuous

A3C LSTM playing BipedalWalkerHardcore-v2

This repository contains my PyTorch implementation of Asynchronous Advantage Actor-Critic (A3C), a reinforcement learning algorithm from Google DeepMind's paper "Asynchronous Methods for Deep Reinforcement Learning."

A3G!!

A new implementation of A3C that uses the GPU to speed up training, which we can call A3G. Unlike other versions that try to use a GPU with the A3C algorithm, in A3G each agent has its own network maintained on the GPU while the shared model stays on the CPU. Agent models are quickly converted to CPU to update the shared model, which keeps updates frequent and fast: with Hogwild training, updates to the shared model are made asynchronously and without locks. This method greatly increases training speed, as can be seen in my rl_a3c_pytorch repo, where models that used to take days to train can be trained in as little as 10 minutes for some Atari games!
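The gradient hand-off described above can be sketched roughly as follows. This is a minimal single-process illustration, not the repository's actual code: the helper name `copy_grads_to_shared`, the tiny `nn.Linear` model, and the SGD optimizer are all assumptions chosen for brevity, and the sketch falls back to CPU when no GPU is present. In a real run, many worker processes would perform this step concurrently against the same shared model without locks.

```python
import torch
import torch.nn as nn


def copy_grads_to_shared(local_model, shared_model):
    """Hypothetical helper: move a worker's gradients onto the CPU shared model."""
    for local_p, shared_p in zip(local_model.parameters(), shared_model.parameters()):
        if local_p.grad is not None:
            # Gradients computed on the worker's device are converted to CPU tensors.
            shared_p.grad = local_p.grad.detach().to("cpu")


torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The shared model lives on the CPU, in shared memory, so that several
# worker processes could update it (Hogwild style, without locks).
shared_model = nn.Linear(4, 2)
shared_model.share_memory()
optimizer = torch.optim.SGD(shared_model.parameters(), lr=0.1)

# Each worker keeps its own copy of the network on its own device.
local_model = nn.Linear(4, 2).to(device)
local_model.load_state_dict(shared_model.state_dict())

before = [p.detach().clone() for p in shared_model.parameters()]

# One worker step: compute a loss locally, then push gradients to the shared model.
x = torch.randn(8, 4, device=device)
loss = local_model(x).pow(2).mean()
local_model.zero_grad()
loss.backward()
copy_grads_to_shared(local_model, shared_model)
optimizer.step()

changed = any(
    not torch.equal(b, p.detach()) for b, p in zip(before, shared_model.parameters())
)
```

After the step, the worker would typically reload the shared weights with `load_state_dict` before collecting its next batch of experience.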

A3C LSTM

This is the continuous-domain version of my other A3C repo. Here I show that A3C can solve not only BipedalWalker-v3 but also the much harder BipedalWalkerHardcore-v3. "Solved" means training a model capable of averaging a reward over 300 for 100 consecutive episodes.
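The "solved" criterion can be checked with a rolling average over episode rewards. The sketch below is only an illustration: the function name `first_solved_episode` is hypothetical, and it assumes "over 300" means strictly greater than 300.

```python
from collections import deque


def first_solved_episode(rewards, threshold=300.0, window=100):
    """Return the 1-based episode at which the rolling average over the last
    `window` episodes first exceeds `threshold`, or None if it never does."""
    recent = deque(maxlen=window)
    total = 0.0
    for episode, reward in enumerate(rewards, start=1):
        if len(recent) == window:
            total -= recent[0]  # this value is evicted by the append below
        recent.append(reward)
        total += reward
        if len(recent) == window and total / window > threshold:
            return episode
    return None
```

For example, `first_solved_episode(episode_rewards)` could be called on the list of per-episode rewards collected during an evaluation run.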

Requirements

  • Python 3.7+
  • openai gym<=0.26.0
  • Pytorch
  • spdlog (a much faster logging library than the standard Python logging library)
  • setproctitle

Training

When training a model, it is important to limit the number of worker processes to the number of CPU cores available. Too many processes (i.e. more than one process per available CPU core) will actually hurt training speed and effectiveness.

On a 2014 MacBook Pro laptop, training typically takes less than 5 minutes to converge to a winning solution. To train an agent in the BipedalWalker-v3 environment with 6 worker processes:
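One way to follow this advice is to cap the requested worker count at the number of cores the machine reports. The helper below is a hypothetical sketch, not part of the repository; `os.cpu_count()` reports logical cores, so the right cap on a given machine may be lower.

```python
import os


def suggested_workers(requested, available_cores=None):
    """Clamp the requested worker count to between 1 and the number of CPU cores."""
    if available_cores is None:
        # os.cpu_count() can return None on some platforms; fall back to 1.
        available_cores = os.cpu_count() or 1
    return max(1, min(requested, available_cores))


# Example: the value printed here could be passed to --workers.
print(suggested_workers(18))
```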

python main.py --env BipedalWalker-v3 --optimizer Adam --shared-optimizer --workers 6 --amsgrad -sws -m3c -tl

Graph showing training of a BipedalWalker-v3 agent with the above command on a MacBook Pro. Train a successful model in 10 minutes on your laptop!

To tail training log for above command use the following command:

tail -f logs/BipedalWalker-v3_log

BipedalWalkerHardcore-v3 is a much harder environment than the normal BipedalWalker. Training a successful model that can achieve a 300+ average reward on a 100-episode test typically takes 20-40 minutes. To train an agent in the BipedalWalkerHardcore-v3 environment with 18 worker processes:

python main.py --env BipedalWalkerHardcore-v3 --optimizer Adam --shared-optimizer --workers 18 --amsgrad -sws -m3c

To tail training log for above command use the following command:

tail -f logs/BipedalWalkerHardcore-v3_log

Press Ctrl+C to end the training session properly.

A3C LSTM playing BipedalWalkerHardcore-v3

Evaluation

To run a 100-episode gym evaluation with a trained model:

python gym_eval.py --env BipedalWalkerHardcore-v3 --num-episodes 100

Project Reference

a3c_continuous's Issues

How to save the best policy

Hi @dgriff777, I am new to reinforcement learning. I saved a 'best model' with your code (main.py in a3c_continuous), but when I use this "best policy" for evaluation, its reward fluctuates violently.

I guess the saved policy is not truly the "best policy" but only the best in one particular episode during training. Is there an alternative way to save a stable and truly "best" policy?
Thank you in advance!

How to understand test.py?

Hi @dgriff777, first of all, thanks for your code. I am studying your code at https://github.com/dgriff777/a3c_continuous. While reading test.py, I don't understand what this file does. In addition, why is the test function put into a Process first? I mean the line "p = mp.Process(target=test, args=(-1, args, shared_model))". Could you give some guidance? Thanks

possible memory leak

Hello,

Firstly, thanks for open-sourcing your code.

I noticed that memory consumption per CPU core keeps increasing. Did you observe something similar?

I use Python 3.6 with 8 workers on Ubuntu 16.04.
No GPU is enabled.

Thanks!

Issue when training agent

Hello !

Thanks for the code.

I have an issue when running the command "python main.py --workers 6 --env BipedalWalker-v2 --save-max True --model MLP --stack-frames 1" :

Ghostyn@DESKTOP-BBU0957:~/a3c_continuous$ python main.py --workers 6 --env BipedalWalker-v2 --save-max True --model MLP --stack-frames 1
Traceback (most recent call last):
  File "main.py", line 12, in <module>
    from gym.configuration import undo_logger_setup
ModuleNotFoundError: No module named 'gym.configuration'

Can you help ?

Adrien

setproctitle missing from requirements section

Hi, thanks for the work. I've noticed that the requirements section does not mention the setproctitle package. Maybe it should be added to the requirements section?

learn from 2D pixel array / conv2D

Hi @dgriff777, thanks for your repo. I was wondering if you might be able/interested to add a 2D convolution model (i.e. using Conv2D as opposed to Conv1D as in A3C_CONV today) to learn from a pixel array as opposed to 1D parameters? Thanks!
