(D)RL Agent For PySC2 Environment

Introduction

The aim of this project is two-fold:

a.) Reproduce the baseline DeepMind results by implementing an RL agent (A2C) with a neural network architecture as close as possible to the one described in [1]. This includes embedding categorical (spatial) features into a continuous space with a 1x1 convolution, and a multi-head policy that supports actions with variable arguments (both spatial and non-spatial).

b.) Improve the results and/or sample efficiency of the baseline solution, whether through alternative algorithms (such as PPO [2]), a reduced set of features (unified across all mini-games), or alternative approaches such as HRL [3] or auxiliary tasks [4].
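
The embedding step mentioned in item a.) can be illustrated with a small, dependency-free sketch. It is not taken from this project's model code; the function names, the category count and the embedding size are assumptions chosen for illustration. It shows that one-hot encoding a categorical feature layer and applying a 1x1 convolution amounts to looking up one learned vector per category at every pixel.

```python
import random


def one_hot(index, num_categories):
    """Encode a category id as a one-hot vector."""
    return [1.0 if c == index else 0.0 for c in range(num_categories)]


def conv1x1(feature_map, weights):
    """Apply a 1x1 convolution (no bias) to an H x W x C_in nested list.

    weights has shape C_in x C_out. A 1x1 kernel mixes channels at each
    pixel independently and never looks at neighbouring pixels.
    """
    c_out = len(weights[0])
    return [
        [
            [sum(pixel[i] * weights[i][o] for i in range(len(pixel)))
             for o in range(c_out)]
            for pixel in row
        ]
        for row in feature_map
    ]


def embed_categorical(layer, weights):
    """Embed an H x W layer of category ids into continuous space."""
    num_categories = len(weights)
    one_hot_map = [[one_hot(v, num_categories) for v in row] for row in layer]
    return conv1x1(one_hot_map, weights)


random.seed(0)
NUM_CATEGORIES = 5  # illustrative number of categories in one feature layer
EMBED_DIM = 2       # illustrative embedding size

# Stand-in for learned kernel weights, one row per category.
weights = [[random.uniform(-1, 1) for _ in range(EMBED_DIM)]
           for _ in range(NUM_CATEGORIES)]

layer = [[0, 1],
         [4, 3]]    # a tiny 2x2 layer of category ids
embedded = embed_categorical(layer, weights)

# Each pixel's embedding is exactly the weight row of its category.
assert embedded[1][0] == weights[4]
```

In a real network the weights would be trained together with the rest of the model, and numerical feature layers would need separate handling.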

Results

Map	This Agent	DeepMind
MoveToBeacon	26.3	26
CollectMineralShards	102	103
FindAndDefeatZerglings	43	45
DefeatRoaches	126*	100
DefeatZerglingsAndBanelings	197*	62
CollectMineralsAndGas	3340	3978
BuildMarines	0.55	3

* Unstable result with a high standard deviation (40 for DefeatRoaches and 120 for DefeatZerglingsAndBanelings).

A video of the trained agent on all minigames can be seen here: https://youtu.be/QdeObwCCxFI

Running

To train an agent, execute python main.py --envs=1 --map=MoveToBeacon.
To resume training from the last checkpoint, specify the --restore flag.
To run in inference mode, specify the --test flag.
To change the number of rendered environments, specify the --render= flag.
To change the state/action space, specify the path to a JSON config with --cfg_path=. The configuration with the reduced feature space used to achieve some of the results above is:

{
  "feats": {
    "screen": ["visibility_map", "player_relative", "unit_type", "selected", "unit_hit_points_ratio", "unit_density"],
    "minimap": ["visibility_map", "camera", "player_relative", "selected"],
    "non_spatial": ["player", "available_actions"]
  }
}
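
As a rough illustration of how a config of this shape could be read and sanity-checked, here is a minimal sketch. The helper `parse_feature_config` and the rule that all three groups must be present are assumptions made for this example; they do not describe how main.py actually handles --cfg_path.

```python
import json

# Feature groups that appear in the example config above.
EXPECTED_GROUPS = ("screen", "minimap", "non_spatial")


def parse_feature_config(text):
    """Parse a JSON config string and return its feature lists per group.

    Hypothetical helper: it requires every group to be a list of strings,
    which is an assumption of this sketch, not a documented rule.
    """
    config = json.loads(text)
    feats = config.get("feats")
    if not isinstance(feats, dict):
        raise ValueError('config must contain a "feats" object')
    for group in EXPECTED_GROUPS:
        names = feats.get(group)
        if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
            raise ValueError('"feats.%s" must be a list of feature names' % group)
    return {group: list(feats[group]) for group in EXPECTED_GROUPS}


REDUCED_CONFIG = """
{
  "feats": {
    "screen": ["visibility_map", "player_relative", "unit_type", "selected", "unit_hit_points_ratio", "unit_density"],
    "minimap": ["visibility_map", "camera", "player_relative", "selected"],
    "non_spatial": ["player", "available_actions"]
  }
}
"""

feats = parse_feature_config(REDUCED_CONFIG)
# Number of selected features per group.
print({group: len(names) for group, names in feats.items()})
```

Validating the file up front makes a misspelled group name fail at startup rather than partway through training.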

Requirements

Python 3.x
TensorFlow >= 1.3
PySC2 with action spec fix

A good GPU and CPU are recommended, especially for the full state/action space.

Related Work

The authors of xhujoy/pysc2-agents and pekaalto/sc2aibot were the first to attempt replicating [1], and their implementations served as general inspiration during the development of this project. However, their aim was to replicate the results rather than the architecture, so they omit key aspects such as full feature and action space support. The authors of simonmeister/pysc2-rl-agents also aim to replicate both the results and the architecture, though their final goals seem to lie in another direction. Their policy implementation was used as a loose reference for this project.

References

[1] StarCraft II: A New Challenge for Reinforcement Learning
[2] Proximal Policy Optimization Algorithms
[3] Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
[4] Reinforcement Learning with Unsupervised Auxiliary Tasks
