ch.alpine.subare

Library for reinforcement learning in Java 17.

The repository includes algorithms, examples, and exercises from the 2nd edition of Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.

Our implementation is inspired by the Python code by Shangtong Zhang, but differs from that reference in two respects:

the algorithms are implemented separately from the problem scenarios
the math uses exact precision, so symmetries in a problem are reproduced exactly in the results
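
To illustrate why exact precision matters for symmetry, here is a minimal sketch. It assumes a hypothetical `Fraction` record built on `java.math.BigInteger`; this is not the scalar type used by subare, only a stand-in for the idea. Mirrored states in a symmetric problem often accumulate the same terms in a different order. With exact fractions the order is irrelevant, whereas with `double` the two sums can differ in the last bit.

```java
import java.math.BigInteger;

/** Hypothetical minimal exact fraction; an assumption for illustration, not subare's API. */
record Fraction(BigInteger num, BigInteger den) {
  // Compact constructor: normalize the sign and reduce to lowest terms,
  // so that equal values always have equal components.
  Fraction {
    if (den.signum() == 0)
      throw new ArithmeticException("zero denominator");
    if (den.signum() < 0) {
      num = num.negate();
      den = den.negate();
    }
    BigInteger gcd = num.gcd(den); // positive, because den is positive here
    num = num.divide(gcd);
    den = den.divide(gcd);
  }

  static Fraction of(long num, long den) {
    return new Fraction(BigInteger.valueOf(num), BigInteger.valueOf(den));
  }

  Fraction add(Fraction other) {
    return new Fraction(
        num.multiply(other.den).add(other.num.multiply(den)),
        den.multiply(other.den));
  }
}

/** Sums the same three terms in two different orders. */
class SymmetryDemo {
  static Fraction exactSum(boolean leftToRight) {
    Fraction a = Fraction.of(1, 10), b = Fraction.of(2, 10), c = Fraction.of(3, 10);
    return leftToRight ? a.add(b).add(c) : a.add(b.add(c));
  }

  static double floatSum(boolean leftToRight) {
    double a = 0.1, b = 0.2, c = 0.3;
    return leftToRight ? (a + b) + c : a + (b + c);
  }

  public static void main(String[] args) {
    // Exact arithmetic: both orders give 3/5.
    System.out.println(exactSum(true).equals(exactSum(false))); // true
    // Floating point: 0.6000000000000001 versus 0.6.
    System.out.println(floatSum(true) == floatSum(false)); // false
  }
}
```

The trade-off is speed and memory: numerators and denominators can grow with every operation, so exact arithmetic suits small tabular problems best.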

Algorithms

Iterative Policy Evaluation (parallel, in 4.1, p.59)
Value Iteration to determine V*(s) (parallel, in 4.4, p.65)
Action-Value Iteration to determine Q*(s,a) (parallel)
First Visit Policy Evaluation (in 5.1, p.74)
Monte Carlo Exploring Starts (in 5.3, p.79)
Constant-alpha Monte Carlo
Tabular Temporal Difference (in 6.1, p.96)
Sarsa: An on-policy TD control algorithm (in 6.4, p.104)
Q-learning: An off-policy TD control algorithm (in 6.5, p.105)
Expected Sarsa (in 6.6, p.107)
Double Sarsa, Double Expected Sarsa, Double Q-Learning (in 6.7, p.109)
n-step Temporal Difference for estimating V(s) (in 7.1, p.115)
n-step Sarsa, n-step Expected Sarsa, n-step Q-Learning (in 7.2, p.118)
Random-sample one-step tabular Q-planning (parallel, in 8.1, p.131)
Tabular Dyna-Q (in 8.2, p.133)
Prioritized Sweeping (in 8.4, p.137)
Semi-gradient Tabular Temporal Difference (in 9.3, p.164)
True Online Sarsa (in 12.8, p.309)
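
As a rough illustration of what one of the listed algorithms computes, the following sketch shows a single tabular Q-learning update, Q(s,a) ← Q(s,a) + α [r + γ max Q(s',·) − Q(s,a)]. The class name `QLearningSketch`, the method `update`, and the `double[][]` table are assumptions made for this example. The sketch uses `double` for brevity and does not reflect how subare structures its algorithms or its exact-precision math.

```java
import java.util.Arrays;

/** Hypothetical tabular Q-learning sketch; names are illustrative, not subare's API. */
class QLearningSketch {
  /**
   * Applies one Q-learning update in place and returns the new Q(state, action).
   * q[s][a] holds the current action-value estimates.
   */
  static double update(double[][] q, int state, int action,
      double reward, int nextState, double alpha, double gamma) {
    // Greedy value of the successor state: max over a' of Q(s', a').
    double maxNext = Arrays.stream(q[nextState]).max().orElse(0.0);
    // One-step bootstrapped target.
    double target = reward + gamma * maxNext;
    // Move the estimate a fraction alpha toward the target.
    q[state][action] += alpha * (target - q[state][action]);
    return q[state][action];
  }

  public static void main(String[] args) {
    double[][] q = new double[2][2]; // 2 states, 2 actions, initialized to 0
    q[1][1] = 2.0;
    // target = 1 + 0.5 * 2 = 2, new value = 0 + 0.5 * (2 - 0) = 1
    System.out.println(update(q, 0, 0, 1.0, 1, 0.5, 0.5)); // prints 1.0
  }
}
```

For a terminal successor state, the row `q[nextState]` is expected to stay at zero so that the target reduces to the reward alone.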

Gallery

Prisoner's Dilemma

Exact Gambler

Examples

4.1 Gridworld

AV-Iteration q(s,a)

TabularQPlan

Monte Carlo

Q-Learning

Expected-Sarsa

Sarsa

3-step Q-Learning

3-step E-Sarsa

3-step Sarsa

OTrue Online Sarsa

ETrue Online Sarsa

QTrue Online Sarsa

4.2 Jack's car rental

Value Iteration v(s)

4.4 Gambler's problem

Value Iteration v(s)

Action Value Iteration and optimal policy

Monte Carlo q(s,a)

ESarsa q(s,a)

QLearning q(s,a)

5.1 Blackjack

Monte Carlo Exploring Starts

5.2 Wireloop

AV-Iteration

TabularQPlan

Q-Learning

E-Sarsa

Sarsa

Monte Carlo

5.8 Racetrack

paths obtained using value iteration

track 1

track 2

6.5 Windygrid

Action Value Iteration

TabularQPlan

6.6 Cliffwalk

Action Value Iteration

Q-Learning

TabularQPlan

Expected Sarsa

8.1 Dynamaze

Action Value Iteration

Prioritized sweeping

Additional Examples

Repeated Prisoner's dilemma

Exact expected reward of two adversarial optimistic agents depending on their initial configuration:

Exact expected reward of two adversarial Upper-Confidence-Bound agents depending on their initial configuration:

Integration

From time to time, a version is deployed and made available for Maven integration. Specify the repository and the dependency for the subare library in the pom.xml file of your Maven project:

<dependencies>
  <!-- other dependencies -->
  <dependency>
    <groupId>ch.alpine</groupId>
    <artifactId>subare</artifactId>
    <version>0.4.3</version>
  </dependency>
</dependencies>

<repositories>
  <!-- other repositories -->
  <repository>
    <id>subare-mvn-repo</id>
    <url>https://raw.github.com/datahaki/subare/mvn-repo/</url>
    <snapshots>
      <enabled>true</enabled>
      <updatePolicy>always</updatePolicy>
    </snapshots>
  </repository>
</repositories>

The source code is attached to every release.

The master branch always contains the latest features for Java 17 and generally does not correspond to the most recently deployed version.

Contributors

Jan Hakenberg, Christian Fluri

Publications

Learning to Operate a Fleet of Cars by Christian Fluri, Claudio Ruch, Julian Zilly, Jan Hakenberg, and Emilio Frazzoli

References

Reinforcement Learning: An Introduction, 2nd edition, by Richard S. Sutton and Andrew G. Barto
