
CS229 Pool Game AI

Usage

# (Optional) Create & activate virtual environment
$ virtualenv .venv
$ source .venv/bin/activate

# Install dependencies
$ python -m pip install -r requirements.txt

# Run game
$ python -m src.game.main

# Run training, ALGO = q-table | dqn | a3c | a3c-discrete
$ python -m src.model.train [--balls BALLS] [--algo ALGO] [--visualize] output_model

# Run evaluation, ALGO = random | q-table | dqn | a3c | a3c-discrete
$ python -m src.model.eval [--model MODEL] [--balls BALLS] [--algo ALGO] [--visualize]

Tools

  • Visualize average rewards over episodes
    • $ python -m src.utils.training_rewards_vis INPUT_FILE OUTPUT_FILE

Problem Formulation

  • States
    • [(x, y)]: list of (x, y) coordinates of the balls; white ball first
      • Coordinate
        • Continuous range: [0, 1000]
        • Discrete range (Q-table only): [0, 19]
  • Actions
    • Angle
      • Continuous range: [0, 1]
      • Discrete range (Q-table only): [0, 17]
    • Force
      • Continuous range: [0, 1]
      • Discrete range (Q-table only): [0, 4]
  • Rewards
    • reward += 5 for each ball pocketed on a shot
    • reward += -1 if the shot hits no ball
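For the Q-table algorithm, the continuous states and actions above are discretized into the listed ranges. A minimal sketch of that mapping, with bucket counts taken from the ranges above (the project's actual helper names may differ):

```python
TABLE_SIZE = 1000.0   # continuous coordinate range [0, 1000]
N_POS_BUCKETS = 20    # discrete coordinate range [0, 19]
N_ANGLE_BUCKETS = 18  # discrete angle range [0, 17]
N_FORCE_BUCKETS = 5   # discrete force range [0, 4]

def discretize(value, value_max, n_buckets):
    """Map a continuous value in [0, value_max] to a bucket index."""
    return min(int(value / value_max * n_buckets), n_buckets - 1)

def discretize_state(obs):
    """Discretize a list of (x, y) ball positions for Q-table lookup."""
    return tuple((discretize(x, TABLE_SIZE, N_POS_BUCKETS),
                  discretize(y, TABLE_SIZE, N_POS_BUCKETS)) for x, y in obs)

def discretize_action(angle, force):
    """Discretize an (angle, force) pair, both given in [0, 1]."""
    return (discretize(angle, 1.0, N_ANGLE_BUCKETS),
            discretize(force, 1.0, N_FORCE_BUCKETS))
```

Note the `min(...)` clamp: the boundary values 1000.0 and 1.0 would otherwise fall one past the last bucket.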

Contributors

nkatz565, nlandy, pyliaorachel


Issues

Reset state

Lines:

CS229-pool/src/model/env.py

Lines 137 to 144 in 1f88211

self.game = gamestate.GameState()
self.game.start_pool()
self.events = event.events()
self.game.redraw_all()
# TODO: init to initial ball positions in pool game
self.current_obs = [(0, 0)] * self.num_balls
self.current_state = self.state_space.get_state(self.current_obs)

  • obs: observation
    • The raw state of the game table.
  • state: state in reinforcement learning
    • Derived from the observation, but represented differently to fit the model.
  • E.g. the observation is a list of ball positions from (0, 0) to (1000, 1000), where the first ball in the list is the white ball. In q_table, the state is the observation discretized into 50 buckets per axis, i.e. (0, 0) to (49, 49), where (49, 49) in state corresponds to (980, 980) ~ (1000, 1000) in observation.

You can refer to the OpenAI Gym environment code for reference; most of the PoolEnv logic conforms to it.

So now in lines 137-140, make sure you pass in the relevant parameters, such as the number of balls (self.num_balls) or anything else the game needs (the game should also be modified to accept these as parameters).

In lines 142-144, I have set the list of ball positions to all (0, 0). You need to replace these with the real positions of the balls when the game starts.
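A minimal sketch of the intended reset flow. The Ball, Game, and StateSpace classes below are toy stand-ins for the project's gamestate and state-space objects (the real constructor signatures, ball attributes, and starting positions will differ):

```python
class Ball:
    """Toy stand-in for a game ball; `pos` is an assumed attribute."""
    def __init__(self, x, y):
        self.pos = (x, y)

class Game:
    """Toy stand-in for GameState: takes the ball count as a parameter
    and racks the balls at real positions (made up here)."""
    def __init__(self, num_balls):
        self.balls = [Ball(250.0 + 30.0 * i, 500.0) for i in range(num_balls)]

class StateSpace:
    """Pass-through state space (continuous algorithms use raw coords)."""
    def get_state(self, obs):
        return obs

def reset(game, state_space):
    """Read the real ball positions instead of filling (0, 0)s."""
    obs = [(b.pos[0], b.pos[1]) for b in game.balls]  # white ball first
    state = state_space.get_state(obs)
    return obs, state

obs, state = reset(Game(num_balls=2), StateSpace())
```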

Put game logic back to game

Lines:

CS229-pool/src/model/env.py

Lines 150 to 158 in 1f88211

self.game.cue.update_cue_displacement(100)
self.game.cue.update_cue(self.game, 0, self.events, 1)
self.game.cue.ball_hit()
while not self.game.all_not_moving():
    self.events = event.events()
    collisions.resolve_all_collisions(self.game.balls, self.game.holes, self.game.table_sides)
    self.game.redraw_all()
self.game.return_game_state()

Create a function in the game class for the model to call like this:
game.step(angle, force)
instead of exposing the game logic in the environment.

The angle and force come from real_action[0] and real_action[1]. Both numbers are in [0, 1]; either transform the angle back to its real range [0, 2pi] before passing it into game.step, or handle the translation inside game.step. Just be consistent.
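A sketch of such a game.step wrapper, with the physics calls from the snippet above reduced to comments and a toy stand-in for the simulation loop; the [0, 1] to [0, 2pi] angle conversion happens inside step so callers stay in normalized units:

```python
import math

class Game:
    def __init__(self):
        self._frames_left = 0  # toy stand-in for the physics state

    def all_not_moving(self):
        return self._frames_left == 0

    def _simulate_frame(self):
        # Real implementation: event.events(), resolve_all_collisions(...),
        # redraw_all(), as in the snippet above.
        self._frames_left -= 1

    def step(self, angle, force):
        """Take one shot. Both inputs are normalized to [0, 1]."""
        angle_rad = angle * 2 * math.pi  # translate to the real range here
        # Real implementation: update_cue_displacement / update_cue /
        # ball_hit using angle_rad and force.
        self._frames_left = 3  # pretend the balls move for a few frames
        while not self.all_not_moving():
            self._simulate_frame()
        ball_pos = [(0.0, 0.0)]  # placeholder: final ball positions
        holes_in, done = 0, False
        return ball_pos, holes_in, done
```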

Action step

Lines:

CS229-pool/src/model/env.py

Lines 160 to 163 in 1f88211

# TODO: now it's random update, change to real update on pool table
self.current_state = self.state_space.sample()
reward = np.random.choice(10) - 5 # [-5, 5]
return self.current_state, reward, False

After you resolve #1 , you should do something like this:

ball_pos, holes_in, done = game.step(angle, force)

where ball_pos is the final ball positions after this action, holes_in is the number of balls pocketed by this hit (you can also return other information, like balls_hit, to penalize shots that hit no ball), and done is whether the game is finished, i.e. all balls are pocketed.

Then you 1. transform ball_pos back into the env's observation and state representations, and 2. design a reward function based on holes_in, balls_hit, etc.

Btw, the 3 returned components are next state, reward, done.
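Putting it together, a sketch of the env-side step. The reward constants follow the formulation above (+5 per pocketed ball, -1 when no ball is hit); the game.step signature here assumes the optional balls_hit extra mentioned above is returned:

```python
def compute_reward(holes_in, balls_hit):
    """+5 per pocketed ball; -1 when the shot hits nothing."""
    reward = 5 * holes_in
    if balls_hit == 0:
        reward -= 1
    return reward

def env_step(game, state_space, real_action):
    """Return (next_state, reward, done), mirroring the TODO above."""
    angle, force = real_action[0], real_action[1]   # both in [0, 1]
    ball_pos, holes_in, balls_hit, done = game.step(angle, force)
    obs = ball_pos                                  # raw table positions
    next_state = state_space.get_state(obs)
    return next_state, compute_reward(holes_in, balls_hit), done
```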
