
holdem's Introduction

holdem

⚠️ This is an experimental API, it will most definitely contain bugs, but that's why you are here!

pip install holdem

As far as I know, this is the first OpenAI Gym No-Limit Texas Hold'em* (NLTH) environment written in Python. It's an experiment to build a Gym environment that is synchronous, supports any number of players, and also appeals to the general public interested in learning how to "solve" NLTH.

*Python 3 supports arbitrary length integers 💸

Right now, this is a work in progress, but I believe the API is mature enough for some preliminary experiments. Join me in making some interesting progress on multi-agent Gym environments.

Usage

There is limited documentation at the moment. I'll try to make this less painful to understand.

env = holdem.TexasHoldemEnv(n_seats, max_limit=1e9, debug=False)

Creates a gym environment representing an NLTH table from the parameters:

  • n_seats - number of available players for the current table. No players are initially allocated to the table. You must call env.add_player(seat_id, ...) to populate the table.
  • max_limit - used to define the gym.spaces API for the class (in support of gym.spaces.Discrete). It does not impose any actual NLTH betting limit.
  • debug - prints debug statements during play; will probably be removed in the future.

env.add_player(seat_id, stack=2000)

Adds a player to the table at the specified seat (seat_id) with the given initial amount of chips in the player's stack. If seat_id is out of range for the n_seats given to the constructor, a gym.error.Error is raised.
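The seat check described above can be sketched as follows. This is a hypothetical mirror of the validation, not the library's actual code; `Table`, `SeatError`, and `stacks` are invented for illustration (the real env raises gym.error.Error):

```python
class SeatError(Exception):
    """Stands in for gym.error.Error in this sketch."""

class Table:
    def __init__(self, n_seats):
        self.n_seats = n_seats
        self.stacks = {}  # seat_id -> chip stack

    def add_player(self, seat_id, stack=2000):
        # refuse seats outside the range fixed by the constructor
        if not 0 <= seat_id < self.n_seats:
            raise SeatError("seat %d does not exist" % seat_id)
        self.stacks[seat_id] = stack

table = Table(n_seats=2)
table.add_player(0, stack=2000)
table.add_player(1, stack=2000)
```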

(player_states, community_states) = env.reset()

Calling env.reset resets the NLTH table to a new hand state. It does not reset player stacks or blinds; that behavior is reserved for a future portion of the API (yet another feature that is not standard in Gym environments, and a work in progress).

The observation returned is a tuple of the following by index:

  1. player_states - a tuple where each entry is tuple(player_info, player_hand); this can be used to gather all states and hands via (player_infos, player_hands) = zip(*player_states).
    • player_infos - a list of int features describing the individual player, by index:
      0. [0, 1] - 0 - seat is empty, 1 - seat is not empty.
      1. [0, n_seats - 1] - player's id, where they are sitting.
      2. [0, inf] - player's current stack.
      3. [0, 1] - player is playing the current hand.
      4. [0, inf] - the player's current handrank according to treys.Evaluator.evaluate(hand, community).
      5. [0, 1] - 0 - player has not played this round, 1 - player has played this round.
      6. [0, 1] - 0 - player is currently not betting, 1 - player is betting.
      7. [0, 1] - 0 - player is currently not all-in, 1 - player is all-in.
      8. [0, inf] - player's last side pot.
    • player_hands - a list of int features describing the cards in the player's pocket. The values are encoded using the treys.Card integer representation.
  2. community_states - a tuple(community_infos, community_cards) where:
    • community_infos - a list by index:
      0. [0, n_seats - 1] - location of the dealer button, where the big blind is posted.
      1. [0, inf] - the current small blind amount.
      2. [0, inf] - the current big blind amount.
      3. [0, inf] - the current total amount in the community pot.
      4. [0, inf] - the last posted raise amount.
      5. [0, inf] - the minimum required raise amount, if above 0.
      6. [0, inf] - the amount required to call.
      7. [0, n_seats - 1] - the current player required to take an action.
    • community_cards - a list of int features describing the community cards, encoded using the treys.Card integer representation. The list has 5 ints, where -1 means no card is present.
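A small sketch of unpacking that observation tuple, using fabricated values in place of a real env.reset() return (the card integers below are arbitrary placeholders, not meaningful treys.Card encodings):

```python
# fabricated observation for a 2-seat table; shapes follow the spec above
player_states = (
    ([1, 0, 2000, 1, 0, 0, 0, 0, 0], [268454953, 134236965]),  # seat 0
    ([1, 1, 2000, 1, 0, 0, 0, 0, 0], [529159, 33589533]),      # seat 1
)
community_states = ([0, 10, 25, 0, 0, 25, 25, 1], [-1, -1, -1, -1, -1])

(player_infos, player_hands) = zip(*player_states)
(community_infos, community_cards) = community_states

stacks = [info[2] for info in player_infos]  # feature 2: current stack
to_act = community_infos[-1]                 # feature 7: player to act
print(stacks, to_act)
```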

Example

import gym
import holdem

def play_out_hand(env, n_seats):
  # reset environment, gather relevant observations
  (player_states, (community_infos, community_cards)) = env.reset()
  (player_infos, player_hands) = zip(*player_states)

  # display the table, cards and all
  env.render(mode='human')

  terminal = False
  while not terminal:
    # play safe actions: check when no one else has raised, call when raised.
    actions = holdem.safe_actions(community_infos, n_seats=n_seats)
    (player_states, (community_infos, community_cards)), rews, terminal, info = env.step(actions)
    env.render(mode='human')

env = gym.make('TexasHoldem-v1') # holdem.TexasHoldemEnv(2)

# start with 2 players
env.add_player(0, stack=2000) # add a player to seat 0 with 2000 "chips"
env.add_player(1, stack=2000) # add another player to seat 1 with 2000 "chips"
# play out a hand
play_out_hand(env, env.n_seats)

# add one more player
env.add_player(2, stack=2000) # add another player to seat 2 with 2000 "chips"
# play out another hand
play_out_hand(env, env.n_seats)

holdem's People

Contributors

bdach, wenkesj


holdem's Issues

Key error in treys.evaluator

Getting this after thousands of iterations with an RL algo. Apparently our hand + community cards add up to 8 cards for some reason, and the evaluator can't rank such a hand.

/usr/local/lib/python3.6/dist-packages/holdem-1.0.0-py3.6.egg/holdem/env.py in _resolve_round(self, players)
    383       # compute hand ranks
    384       for player in players:
--> 385         player.handrank = self._evaluator.evaluate(player.hand, self.community)
    386 
    387       # trim side_pots to only include the non-empty side pots

/usr/local/lib/python3.6/dist-packages/treys-0.1.3-py3.6.egg/treys/evaluator.py in evaluate(self, cards, board)
     33         """
     34         all_cards = cards + board
---> 35         return self.hand_size_map[len(all_cards)](all_cards)
     36 
     37     def _five(self, cards):

KeyError: 8
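Since treys only maps 5-, 6-, and 7-card totals, a defensive check before evaluating would turn the KeyError into a clearer failure. This is a workaround sketch, not a fix for the underlying dealing bug; `safe_evaluate` is a hypothetical wrapper:

```python
def safe_evaluate(evaluator, hand, community):
    # treys.Evaluator.evaluate only handles 5, 6, or 7 total cards;
    # anything else (like the 8 in the traceback) raises KeyError
    total = len(hand) + len(community)
    if total not in (5, 6, 7):
        raise ValueError("cannot rank %d cards; hand was mis-dealt" % total)
    return evaluator.evaluate(hand, community)
```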

gym.error.UnregisteredEnv: No registered env with id: TexasHoldem-v1

Hi there,

It's a great tool to work with. I was trying to give it a shot, but got an error.

File "/Users/me/Research/Project/dnn-framework/project/ai-contest/holdem/holdem.py", line 21, in
env = gym.make('TexasHoldem-v1') # holdem.TexasHoldemEnv(2)
File "/Users/me/anaconda/envs/tf1.8/lib/python3.6/site-packages/gym/envs/registration.py", line 167, in make
return registry.make(id)
File "/Users/me/anaconda/envs/tf1.8/lib/python3.6/site-packages/gym/envs/registration.py", line 118, in make
spec = self.spec(id)
File "/Users/me/anaconda/envs/tf1.8/lib/python3.6/site-packages/gym/envs/registration.py", line 153, in spec
raise error.UnregisteredEnv('No registered env with id: {}'.format(id))

gym.error.UnregisteredEnv: No registered env with id: TexasHoldem-v1

I just tried the example code in the github. I'm using python 3.6, tensorflow 1.8 and gym 0.10.5. Any clue?

Much appreciated.

Pip installation does not work

ingvar@ingvar-ubuntu:~$ pip3 search holdem
holdem (1.0.0)        - OpenAI Gym No-Limit Texas Holdem Environment.
SONNYGAMES (1.3)      - ROYALE HOLDEM  POKER LIVE
dante-pot-odds (0.1)  - Library to calculate pot odds in holdem
ingvar@ingvar-ubuntu:~$ sudo pip3 install holdem
Collecting holdem
  Could not find a version that satisfies the requirement holdem (from versions: )
No matching distribution found for holdem
ingvar@ingvar-ubuntu:~$

That's what I'm getting

player.py line 126

I'm getting an error on line 126 of player.py
elif choice == Player.FOLD:

Should it be "action_idx" instead of "choice"?
elif action_idx == Player.FOLD:

Minraise calculation is wrong

When two players heads-up min-raise each other with 10/25 blinds, a sequence of total bet sizes of 10-25-60-110-etc. comes up. It should be 10-25-50-75-100-etc.

In env.py line 207

      elif move[0] == 'raise':
        self._player_bet(self._current_player, move[1] + self._current_player.currentbet)

Should be

      elif move[0] == 'raise':
        self._player_bet(self._current_player, move[1])

You are calculating the minraise as max(self._bigblind, self._lastraise + self._tocall), which already yields the total size of the bet with all previous bets accounted for. There is no need to add our current bet.
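The corrected arithmetic can be checked numerically. This standalone sketch assumes `lastraise` tracks the size of the last raise increment and `tocall` the current total bet — an interpretation of the issue, not the env's verbatim state handling:

```python
def min_raise_total(bigblind, lastraise, tocall):
    # total bet a minimum raise must reach, per the formula quoted above
    return max(bigblind, lastraise + tocall)

# heads-up, 10/25 blinds: the big blind acts as the first "raise" of 25,
# so each min-raise adds 25 on top of the amount to call
totals = [10, 25]
lastraise, tocall = 25, 25
for _ in range(3):
    total = min_raise_total(25, lastraise, tocall)
    totals.append(total)
    lastraise, tocall = total - tocall, total
print(totals)  # expected: [10, 25, 50, 75, 100]
```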

Environment is requesting player moves when there's only one player who is not all-in

I found a problem when all but one player is all-in, i.e. only one player still has stack available. The environment continues to ask the player that isn't all-in for a move, which means the agent can (needlessly) continue to raise against players who obviously can't call (creating a side pot containing only that player), as you can see here:

    players:
    0 [  ],[  ] stack: 0
    1 [2♥],[8♦] stack: 0
    2 [J♦],[7♥] stack: 28665
    3 [  ],[  ] stack: 0
    Getting move from agent for player 1 (Agent: 2)
    Player 1 Move: [0,0]
    total pot: 11335
    last action by player 1:
    _ check
    community:
    - [9♣],[4♦],[A♣],[J♥],[  ]
    players:
    0 [  ],[  ] stack: 0
    1 [2♥],[8♦] stack: 0
    2 [J♦],[7♥] stack: 28665
    3 [  ],[  ] stack: 0
    Getting move from agent for player 2 (Agent: 3)
    Minimum raise: 25  As percentage is : 0.0008721437292865864
    Player 2 Move: [2,2867]
    Player 2 ('raise', 2867)
    total pot: 14202

Here you can see player 1 checked because it's all-in, but the environment continued to ask player 2 for a move, who then raised against a player who was already all-in, which obviously doesn't make sense. In fact, in this case it then repeated this when the last community card was dealt.

In env.py there's this line:

if not self._current_player.playedthisround and len([p for p in players if not p.isallin]) >= 1:

Initially I thought the >= 1 should be changed to > 1, but then there'd be a problem if player 1 had raised and gone all-in: with that change the environment wouldn't ask player 2 for a move, as there's now only one player that's not all-in.

I could of course program my agent to check whether all other players are all-in and, if so, disable the ability to raise, but I think the environment should handle this better. I'm just not sure how...
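One possible guard is sketched below. The attribute names (`playedthisround`, `isallin`, `stack`, `currentbet`) mirror those quoted from the codebase, but `needs_action` itself is invented for illustration and may not cover every side-pot case, as the issue itself notes:

```python
from collections import namedtuple

# minimal stand-in for the env's Player object
P = namedtuple("P", "playedthisround isallin stack currentbet")

def needs_action(player, players, tocall):
    # a player only needs to act if they haven't played this round AND either
    # (a) they still owe chips to call, or
    # (b) at least one OTHER player could still respond to a raise
    if player.playedthisround or player.isallin:
        return False
    others_can_respond = any(
        p is not player and not p.isallin and p.stack > 0 for p in players
    )
    return player.currentbet < tocall or others_can_respond
```

In the log above, player 2 has matched the pot and every opponent is all-in, so this guard would skip asking it for a move; it would still be asked to call or fold if an all-in shove left it owing chips.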

Latest code change causes rounds to not resolve

In the latest code commit in env.step() you changed:

    else:
      self._resolve(players)

to this:

    if all([player.playedthisround for player in players]):
      self._resolve(players)

But rounds are getting stuck, constantly asking the same player for a move. Maybe playedthisround isn't being set somewhere, or is being unset, so _resolve() is never entered. I've reverted to the previous line.

One action for one player

Hi,

How can I apply one action for only one player, and then apply another action for the next player?

Thank you
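The env is synchronous: env.step takes a list with one action entry per seat, and (as the render logs in the all-in issue above suggest) only the seat whose turn it is — community_infos feature 7 — acts on a given step. A hypothetical helper for building such a list; the [action_id, amount] encoding and the (0, 0) placeholder are assumptions, not taken from the library:

```python
def actions_for(to_act, my_action, n_seats, default=(0, 0)):
    # build a per-seat action list where only the acting seat's
    # entry differs from the placeholder
    actions = [list(default) for _ in range(n_seats)]
    actions[to_act] = list(my_action)
    return actions

# seat 2 of 4 raises to 150; every other seat gets the placeholder
acts = actions_for(2, (2, 150), 4)
```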

Add player.currentbet to _get_current_state

I think it's useful information to have the player.currentbet available on player_infos:

      player_features = [
        int(player.emptyplayer),
        int(player.get_seat()),
        int(player.stack),
        int(player.playing_hand),
        int(player.handrank),
        int(player.playedthisround),
        int(player.betting),
        int(player.isallin),
        int(player.lastsidepot),
        int(player.currentbet),
      ]

reset_stack() method and self.betting appear to be redundant in player.py?

The reset_stack() method doesn't appear to be referenced anywhere.

self.betting, defined in player.py, appears to be redundant (it's only ever set to False), but care is needed if it's removed: it's used in player_features[], so removing it will change the indexes of that list, which may be relied on elsewhere.
