alpine_877

ch.alpine.subare

Library for reinforcement learning in Java 17.

The repository includes algorithms, examples, and exercises from the 2nd edition of Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.

Our implementation is inspired by the Python code by Shangtong Zhang, but differs from that reference in two respects:

  • the algorithms are implemented separately from the problem scenarios
  • the math uses exact precision, so the results reproduce the symmetries of a problem whenever the problem has them
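The effect of exact precision on symmetries can be illustrated without the library. The sketch below is not subare's API: `Fraction` and `ExactSymmetryDemo` are hypothetical names introduced only for this illustration. It assumes two mirrored actions whose expected returns are sums of the same terms taken in opposite order. In `double` arithmetic the two sums differ in the last bit, so the tie between the actions is lost. With exact rationals the sums are equal.

```java
import java.math.BigInteger;

// Illustrative only: Fraction and ExactSymmetryDemo are hypothetical names,
// not classes from the subare library.
final class ExactSymmetryDemo {
  /** Minimal exact rational number, kept in lowest terms with a positive denominator. */
  record Fraction(BigInteger num, BigInteger den) {
    static Fraction of(long num, long den) {
      return normalize(BigInteger.valueOf(num), BigInteger.valueOf(den));
    }

    private static Fraction normalize(BigInteger n, BigInteger d) {
      if (d.signum() == 0)
        throw new ArithmeticException("zero denominator");
      if (d.signum() < 0) {
        n = n.negate();
        d = d.negate();
      }
      BigInteger g = n.gcd(d); // non-zero because d is non-zero
      return new Fraction(n.divide(g), d.divide(g));
    }

    Fraction add(Fraction other) {
      return normalize(
          num.multiply(other.den).add(other.num.multiply(den)),
          den.multiply(other.den));
    }
  }

  /** Sums the terms from left to right in floating point. */
  static double sumDouble(double... terms) {
    double sum = 0.0;
    for (double term : terms)
      sum += term;
    return sum;
  }

  /** Sums the terms from left to right in exact rational arithmetic. */
  static Fraction sumExact(Fraction... terms) {
    Fraction sum = Fraction.of(0, 1);
    for (Fraction term : terms)
      sum = sum.add(term);
    return sum;
  }

  public static void main(String[] args) {
    // Two mirrored actions collect the same terms, but in opposite order.
    double left = sumDouble(0.1, 0.2, 0.3);
    double right = sumDouble(0.3, 0.2, 0.1);
    System.out.println("double tie detected: " + (left == right));

    Fraction exactLeft = sumExact(Fraction.of(1, 10), Fraction.of(2, 10), Fraction.of(3, 10));
    Fraction exactRight = sumExact(Fraction.of(3, 10), Fraction.of(2, 10), Fraction.of(1, 10));
    System.out.println("exact tie detected: " + exactLeft.equals(exactRight));
  }
}
```

A greedy policy derived from the `double` values would prefer one of the two mirrored actions for no reason other than rounding. The exact values keep both actions optimal.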

Algorithms

  • Iterative Policy Evaluation (parallel, in 4.1, p.59)
  • Value Iteration to determine V*(s) (parallel, in 4.4, p.65)
  • Action-Value Iteration to determine Q*(s,a) (parallel)
  • First Visit Policy Evaluation (in 5.1, p.74)
  • Monte Carlo Exploring Starts (in 5.3, p.79)
  • Constant-alpha Monte Carlo
  • Tabular Temporal Difference (in 6.1, p.96)
  • Sarsa: An on-policy TD control algorithm (in 6.4, p.104)
  • Q-learning: An off-policy TD control algorithm (in 6.5, p.105)
  • Expected Sarsa (in 6.6, p.107)
  • Double Sarsa, Double Expected Sarsa, Double Q-Learning (in 6.7, p.109)
  • n-step Temporal Difference for estimating V(s) (in 7.1, p.115)
  • n-step Sarsa, n-step Expected Sarsa, n-step Q-Learning (in 7.2, p.118)
  • Random-sample one-step tabular Q-planning (parallel, in 8.1, p.131)
  • Tabular Dyna-Q (in 8.2, p.133)
  • Prioritized Sweeping (in 8.4, p.137)
  • Semi-gradient Tabular Temporal Difference (in 9.3, p.164)
  • True Online Sarsa (in 12.8, p.309)
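Several of the listed control algorithms (Sarsa, Q-learning, Expected Sarsa) share the same one-step update and differ only in the target they bootstrap from. The following is a minimal sketch of that difference in plain Java. It does not use subare's classes, it works in `double` rather than exact precision, and `TdTargets` and its method names are assumptions made for this illustration.

```java
// Illustrative only: TdTargets is a hypothetical helper, not part of subare.
final class TdTargets {
  /** Sarsa: bootstrap from the action actually selected in the next state. */
  static double sarsaTarget(double reward, double gamma, double[] qNext, int nextAction) {
    return reward + gamma * qNext[nextAction];
  }

  /** Q-learning: bootstrap from the greedy action in the next state.
   * qNext must be non-empty; for a terminal next state the caller should pass {0.0}. */
  static double qLearningTarget(double reward, double gamma, double[] qNext) {
    double max = Double.NEGATIVE_INFINITY;
    for (double q : qNext)
      max = Math.max(max, q);
    return reward + gamma * max;
  }

  /** Expected Sarsa: bootstrap from the expectation under the policy's action probabilities. */
  static double expectedSarsaTarget(double reward, double gamma, double[] qNext, double[] policy) {
    double expectation = 0.0;
    for (int a = 0; a < qNext.length; ++a)
      expectation += policy[a] * qNext[a];
    return reward + gamma * expectation;
  }

  /** Common update rule: move the old estimate towards the target by step size alpha. */
  static double update(double q, double alpha, double target) {
    return q + alpha * (target - q);
  }

  public static void main(String[] args) {
    double[] qNext = { 1.0, 3.0 }; // action values in the next state
    double[] policy = { 0.5, 0.5 }; // probabilities of the two actions
    System.out.println(sarsaTarget(-1.0, 0.5, qNext, 0));
    System.out.println(qLearningTarget(-1.0, 0.5, qNext));
    System.out.println(expectedSarsaTarget(-1.0, 0.5, qNext, policy));
  }
}
```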

Gallery

prisonersdilemma

Prisoner's Dilemma

gambler_exact

Exact Gambler

Examples

4.1 Gridworld

AV-Iteration q(s,a)

gridworld_qsa_avi

TabularQPlan

gridworld_qsa_rstqp

Monte Carlo

gridworld_qsa_mces

Q-Learning

gridworld_qsa_qlearning

Expected-Sarsa

gridworld_qsa_expected

Sarsa

gridworld_qsa_original

3-step Q-Learning

gridworld_qsa_qlearning3

3-step E-Sarsa

gridworld_qsa_expected3

3-step Sarsa

gridworld_qsa_original3

True Online Sarsa (original)

gridworld_tos_original

True Online Sarsa (expected)

gridworld_tos_expected

True Online Sarsa (Q-learning)

gridworld_tos_qlearning

4.2 Jack's car rental

Value Iteration v(s)

carrental_vi_true

4.4 Gambler's problem

Value Iteration v(s)

gambler_sv
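For readers who want to reproduce a plot of this kind, value iteration for the Gambler's problem can be sketched in a few lines. This is not the library's implementation: it uses `double` instead of exact precision, `GamblerValueIteration` is a hypothetical name, and the goal of 100 and the win probability of 0.4 in the usage example are the textbook's settings, assumed here rather than taken from this repository.

```java
// Illustrative only: GamblerValueIteration is a hypothetical class, not part of subare.
final class GamblerValueIteration {
  /**
   * Returns v[s], the estimated probability of reaching the goal from capital s
   * under an optimal policy. v[0] = 0 and v[goal] = 1 act as fixed boundary values.
   */
  static double[] solve(int goal, double pWin, double tolerance) {
    double[] v = new double[goal + 1];
    v[goal] = 1.0;
    double delta;
    do {
      delta = 0.0;
      // in-place sweep over all non-terminal capitals
      for (int s = 1; s < goal; ++s) {
        double best = 0.0;
        int maxStake = Math.min(s, goal - s);
        for (int stake = 1; stake <= maxStake; ++stake) {
          double value = pWin * v[s + stake] + (1.0 - pWin) * v[s - stake];
          best = Math.max(best, value);
        }
        delta = Math.max(delta, Math.abs(best - v[s]));
        v[s] = best;
      }
    } while (delta > tolerance);
    return v;
  }

  public static void main(String[] args) {
    double[] v = solve(100, 0.4, 1e-9);
    for (int s : new int[] { 25, 50, 75 })
      System.out.println("v(" + s + ") = " + v[s]);
  }
}
```

With a win probability below 1/2, staking everything at capital 50 is optimal, so the value there should come out close to 0.4.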

Action Value Iteration and optimal policy

gambler_avi

Monte Carlo q(s,a)

gambler_qsa_mces

ESarsa q(s,a)

gambler_qsa_esarsa

QLearning q(s,a)

gambler_qsa_qlearn

5.1 Blackjack

Monte Carlo Exploring Starts

blackjack_mces

5.2 Wireloop

AV-Iteration

wire5_avi

TabularQPlan

wire5_qsa_rstqp

Q-Learning

wire5_qsa_qlearning

E-Sarsa

wire5_qsa_expected

Sarsa

wire5_qsa_original

Monte Carlo

wire5_mces

5.8 Racetrack

Paths obtained using value iteration

track 1

track1

track 2

track2

6.5 Windygrid

Action Value Iteration

windygrid_qsa_avi

TabularQPlan

windygrid_qsa_rstqp

6.6 Cliffwalk

Action Value Iteration

cliffwalk_qsa_avi

Q-Learning

cliffwalk_qsa_qlearning

TabularQPlan

cliffwalk_qsa_rstqp

Expected Sarsa

cliffwalk_qsa_expected

8.1 Dynamaze

Action Value Iteration

maze5_qsa_avi

Prioritized sweeping

maze2_ps_qlearning


Additional Examples

Repeated Prisoner's dilemma

Exact expected reward of two adversarial optimistic agents depending on their initial configuration:

opts

Exact expected reward of two adversarial Upper-Confidence-Bound agents depending on their initial configuration:

ucbs

Integration

From time to time, a version is deployed and made available for Maven integration. Specify the repository and the dependency of the subare library in the pom.xml file of your Maven project:

<dependencies>
  <!-- other dependencies -->
  <dependency>
    <groupId>ch.alpine</groupId>
    <artifactId>subare</artifactId>
    <version>0.4.3</version>
  </dependency>
</dependencies>

<repositories>
  <!-- other repositories -->
  <repository>
    <id>subare-mvn-repo</id>
    <url>https://raw.github.com/datahaki/subare/mvn-repo/</url>
    <snapshots>
      <enabled>true</enabled>
      <updatePolicy>always</updatePolicy>
    </snapshots>
  </repository>
</repositories>

The source code is attached to every release.

The master branch always contains the latest features for Java 17 and generally does not correspond to the most recently deployed version.

Contributors

Jan Hakenberg, Christian Fluri

Publications

References
