neka-nat / distributed_rl

Pytorch implementation of distributed deep reinforcement learning

License: MIT License

Languages: Python 93.79%, HCL 2.46%, Shell 1.95%, Dockerfile 1.80%

Topics: reinforcement-learning, distributed-systems, pytorch, deep-q-network, amazon-web-services, prioritized-experience-replay, dueling-dqn, double-dqn, openai-gym, ape-x

distributed_rl's Introduction

distributed_rl

This is a PyTorch implementation of distributed deep reinforcement learning.

Ape-X
- Distributed Prioritized Experience Replay
R2D2 (Recurrent Replay Distributed DQN) (experimental)
- Recurrent Experience Replay in Distributed Reinforcement Learning

System

The system consists of two kinds of processes, Actor and Learner. The replay memory runs as a thread inside the Learner process, and the Actor and Learner processes communicate through Redis.
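The hand-off described above can be sketched as follows. This is only an illustration under stated assumptions, not the project's code: a `queue.Queue` stands in for the Redis list, threads stand in for the Actor processes, and the names `channel`, `actor`, `replay_memory_worker`, and `run` are hypothetical.

```python
import pickle
import queue
import random
import threading

# Stand-in for Redis (assumption): actors push serialized transitions here,
# and the learner's replay-memory thread pops them.
channel = queue.Queue()


def actor(actor_id, n_steps):
    """Generate dummy transitions and push them to the shared channel."""
    rng = random.Random(actor_id)
    for step in range(n_steps):
        transition = {"actor": actor_id, "step": step, "reward": rng.random()}
        channel.put(pickle.dumps(transition))  # serialize, as one would for Redis


def replay_memory_worker(memory, expected):
    """Runs as a thread inside the learner and drains the channel into memory."""
    while len(memory) < expected:
        memory.append(pickle.loads(channel.get()))


def run(n_actors=4, n_steps=5, batch_size=8):
    memory = []
    expected = n_actors * n_steps
    receiver = threading.Thread(target=replay_memory_worker, args=(memory, expected))
    receiver.start()
    actors = [threading.Thread(target=actor, args=(i, n_steps)) for i in range(n_actors)]
    for t in actors:
        t.start()
    for t in actors:
        t.join()
    receiver.join()
    # The learner would now sample minibatches from the replay memory.
    batch = random.sample(memory, batch_size)
    return memory, batch


memory, batch = run()
```

Keeping the replay memory in a separate thread lets the learner keep training while new transitions arrive, which is the point of the split described above.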

Install

git clone https://github.com/neka-nat/distributed_rl.git
cd distributed_rl
poetry install

Install redis-server.

sudo apt-get install redis-server

Set up the Atari ROMs by following https://github.com/openai/atari-py#roms

Run

The following commands run the learner and all actors on localhost. The number of actor processes is given as an argument.

poetry shell
./run.sh 4

To run in R2D2 mode, pass the R2D2 config file as the second argument.

./run.sh 4 config/all_r2d2.conf

Docker build

cd distributed_rl
docker-compose up -d

Use EKS

Create the EKS resources.

cd terraform
terraform init
terraform plan
terraform apply

distributed_rl's Issues

R2D2 not converging?

Hi,

I am running your model on Pong, and the R2D2 model does not seem to be converging at all. In contrast, your Ape-X implementation works and starts converging nicely after 2-3 hours.

Here are the results of your R2D2 implementation after training for 32 hours on a 1080 Ti with 4 workers:

Note that several items in your implementation differ from the papers for both Ape-X and R2D2, such as worker epsilons being below 0.4 and always constant (which has a significant impact on convergence speed), or the DeepMind R2D2 model taking the last action and last reward as additional inputs.
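For reference, the Ape-X paper gives each actor i of N its own fixed exploration rate, ε_i = ε^(1 + α·i/(N−1)) with ε = 0.4 and α = 7. A minimal sketch of that schedule follows; the function name `actor_epsilons` is hypothetical and does not come from this repository.

```python
def actor_epsilons(num_actors, base_eps=0.4, alpha=7.0):
    """Per-actor epsilon values following the Ape-X paper's formula.

    Actor 0 explores the most (epsilon = base_eps); the last actor explores
    the least (epsilon = base_eps ** (1 + alpha)).
    """
    if num_actors < 1:
        raise ValueError("num_actors must be at least 1")
    if num_actors == 1:
        # The formula divides by (N - 1), so a single actor just uses base_eps.
        return [base_eps]
    return [
        base_eps ** (1.0 + alpha * i / (num_actors - 1))
        for i in range(num_actors)
    ]


eps = actor_epsilons(4)
for i, e in enumerate(eps):
    print(f"actor {i}: epsilon = {e:.6f}")
```

With 4 workers this spreads exploration from 0.4 for the first actor down to 0.4^8 for the last.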

Did you manage to get any convergence yourself? If so, how can I replicate it?

Can the Ape-X implementation run on multiple machines?

It seems that the actors run on multiple CPU cores of a single machine.
Can they run on multiple machines?
Thanks in advance.

Why is nx_st = np.maximum(self._nx_st, self._cur_st) if self._gray else self._nx_st used? Thanks.
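One possible reading, assuming the line follows the standard Atari preprocessing from the DQN literature: taking the element-wise maximum of two consecutive frames removes flicker, because some Atari games draw certain sprites only on alternating frames. The sketch below illustrates the idea on toy data; `max_pool_frames` is a hypothetical helper, and whether this is the author's intent is not confirmed by the source.

```python
import numpy as np


def max_pool_frames(prev_frame, cur_frame):
    """Element-wise maximum of two consecutive grayscale frames."""
    return np.maximum(prev_frame, cur_frame)


# Toy 2x2 grayscale frames: each sprite is visible in only one of the two frames.
prev = np.array([[0, 200], [0, 0]], dtype=np.uint8)
cur = np.array([[0, 0], [150, 0]], dtype=np.uint8)

merged = max_pool_frames(prev, cur)
print(merged)  # both sprites appear in the merged frame
```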
