deepstack-leduc's Introduction

DeepStack for Leduc Hold'em

DeepStack is an artificial intelligence agent designed by a joint team from the University of Alberta, Charles University, and Czech Technical University. In a study completed in December 2016, DeepStack became the first program to beat human professionals in the game of heads-up (two player) no-limit Texas hold'em, a commonly played poker game.

This project reimplements DeepStack as a player for heads-up no-limit Leduc hold'em, a much simpler game than Texas hold'em.

DeepStack is built around two components:

  • An offline component that solves random poker situations (public game states along with probability vectors over private hands for both players) and uses them to train a neural network. After training, this neural network can accurately predict the value to each player of holding each possible hand at a given poker situation.
  • An online component which uses the continuous re-solving algorithm to dynamically choose an action for DeepStack to play at each public state encountered during gameplay. This algorithm solves a depth-limited lookahead using the neural net to estimate values at terminal states.

The strategy played by DeepStack approximates a Nash Equilibrium, with an approximation error that depends on the error of the neural net and the solution error of the solver used in continuous re-solving.
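For orientation, here is a minimal sketch of how the online component is invoked in this codebase; it mirrors the lookahead test discussed in the issues below, and the node fields, helper names, and require paths are taken from that test and should be treated as illustrative rather than definitive.

local arguments = require 'Settings.arguments'
local constants = require 'Settings.constants'
local card_tools = require 'Game.card_tools'
local card_to_string = require 'Game.card_to_string_conversion'
local Resolving = require 'Lookahead.resolving'

-- public state: second street, board card Ks, 100 chips committed by each player
local current_node = {}
current_node.board = card_to_string:string_to_board('Ks')
current_node.street = 2
current_node.current_player = constants.players.P1
current_node.bets = arguments.Tensor{100, 100}

-- beliefs over private cards for both players (uniform here)
local player_range = card_tools:get_uniform_range(current_node.board)
local opponent_range = card_tools:get_uniform_range(current_node.board)

-- solve a depth-limited lookahead; the neural net estimates values at its
-- leaf states, and the result contains DeepStack's strategy at the root
local resolving = Resolving()
local results = resolving:resolve_first_node(current_node, player_range, opponent_range)
print(results.strategy)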

Prerequisites

Running any of the DeepStack code requires Lua and torch. Torch is only officially supported for *NIX based systems (i.e. Linux and Mac OS X), and as such, we don't officially support installation of DeepStack on Windows systems. This page contains suggestions for using torch on a Windows machine; if you decide to try this, we cannot offer any help.

Connecting DeepStack to a server requires the luasocket package. This can be installed with luarocks (which is installed as part of the standard torch distribution) using the command luarocks install luasocket. Visualising the trees produced by DeepStack requires the graphviz package, which can be installed with luarocks install graphviz. Running the code on the GPU requires cutorch; currently only version 1.0 is supported, which can be installed with luarocks install cutorch 1.0-0.

The DeepStack player uses the protocol of the Annual Computer Poker Competition (a description of the protocol can be found here) to receive poker states and send poker actions as messages over a network socket connection. If you wish to play against DeepStack, you will need a server for DeepStack to connect to that acts as the dealer for the game; code for a server that fills this role is available through the ACPC here. For convenience, we include a copy of the ACPC dealer in the ACPCDealer/ directory, along with a game definition file (ACPCDealer/leduc.game) so that the dealer can play Leduc hold'em.

If you would like to personally play against DeepStack, you will need a way of interacting with the server yourself; one such option is the ACPC GUI available from Dustin Morrill here.

Leduc Hold'em

Leduc Hold'em is a toy poker game sometimes used in academic research (first introduced in Bayes' Bluff: Opponent Modeling in Poker). It is played with a deck of six cards, comprising two suits of three ranks each (often the king, queen, and jack - in our implementation, the ace, king, and queen). The game begins with each player being dealt one card privately, followed by a betting round. Then, another card is dealt faceup as a community (or board) card, and there is another betting round. Finally, the players reveal their private cards. If one player's private card is the same rank as the board card, he or she wins the game; otherwise, the player whose private card has the higher rank wins.

The game that we implement is No-Limit Leduc Hold'em, meaning that whenever a player makes a bet, he or she may wager any amount of chips up to a maximum of that player's remaining stack. There is also no limit on the number of bets and raises that can be made in each betting round.
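For reference, the deck parameters live in game_settings.lua (the field names below follow the issues later on this page that modify them, and are otherwise assumptions); a minimal sketch:

local game_settings = {}
game_settings.suit_count = 2   -- two suits
game_settings.rank_count = 3   -- ace, king, and queen in this implementation
game_settings.card_count = game_settings.suit_count * game_settings.rank_count  -- 6-card deck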

Documentation

Documentation for the DeepStack Leduc codebase can be found here. In particular, there is a tutorial which introduces the codebase and walks you through several examples, including running DeepStack.

The documentation for code files was automatically generated with LDoc, which can be installed with luarocks install ldoc. To re-generate the docs, run ldoc . in the doc/ directory. If you wish to also generate documentation for local functions, run ldoc . -all instead.

deepstack-leduc's Issues

Incorrect Masked Huber Loss calculation

Line 57 of masked_huber_loss.lua says that 1 is for impossible features.

It is actually 0 for impossible features.

So line 65 should actually be (batch_size * feature_size) / self.mask_sum:sum()

Lines 58-60 should also be changed
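For readers following the thread, here is a minimal sketch of a masked Huber loss under the convention the issue argues for (mask = 1 for valid buckets, 0 for impossible ones); this is an illustration, not the repository's implementation:

-- minimal masked Huber loss sketch: masked-out entries contribute no error
local function masked_huber_loss(outputs, targets, mask, delta)
  delta = delta or 1.0
  local diff = torch.cmul(outputs - targets, mask)
  local abs_diff = torch.abs(diff)
  -- quadratic inside [-delta, delta], linear outside
  local quadratic = torch.cmin(abs_diff, delta)
  local linear = abs_diff - quadratic
  local loss = torch.cmul(quadratic, quadratic):mul(0.5):add(linear:mul(delta))
  -- average over valid entries only, as the issue suggests
  return loss:sum() / mask:sum()
end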

Comparison to CMU's new Modicum agent

I saw this new paper from CMU with a different way to do depth-limited solving that apparently needs only a small fraction of the resources needed by DeepStack:

https://arxiv.org/pdf/1805.08195.pdf

Has anyone else read this paper and have any thoughts? Unfortunately, as is typical for CMU, they don't release any source code, but it would be very interesting to see a comparison, as they discuss DeepStack a lot but don't test against it.

Bad result for kuhn poker-ish game

  1. Modify game_settings.lua so there's 1 suit and 4 ranks.
  2. Modify test_lookahead.lua so the board is Kh instead of Ks.
  3. Modify test_lookahead.lua to use get_uniform_range for both player_range and opponent_range.
  4. Replace line 22 of test_lookahead.lua with the code below:
local results = resolving:resolve_first_node(current_node, player_range, opponent_range)
print(results.strategy)
print(results.root_cfvs)
  5. Run test_lookahead.lua. You will see that the strategy is to check with every hand, which is fine (the strategy for Kh is initialized to 0.333 for every action and stays that way, since it's an invalid hand). However, the counterfactual value of holding Ah is 66.66. Shouldn't the CFV be at least 100, since Ah is the nuts and the pot = 200?

I'm guessing the solver doesn't account for card removal, so it wins 100 two-thirds of the time and ties one-third of the time, giving an EV of 66.66?

Texas Hold'em

Hi,

Thank you for such an interesting research topic. I am currently developing an alternative DeepStack architecture using LSTM networks. I'd like to compare my results on both Leduc and Texas hold'em. I wonder if we could have access to the dataset and trained model of DeepStack for Texas hold'em.

Thanks

Question about reach probability initialization (line 185 of Source/Tree/tree_values.lua)

Before I begin, let me note that this might very well be a misunderstanding on my side, since I haven't looked through the full codebase! I just want to make sure that no bug has sneaked into your implementation or mine.

I implemented something similar to your Tree/ module in Python; however, my implementation doesn't use the matrix ops yours does. To compute the CFV for player p when player 1-p folds, for instance, I run (equation simplified and ignoring blockers to keep focus): cfv[p] = node.pot * (np.sum(node.reach_probs[1-p]) - node.reach_probs[1-p]). I believe this might produce wrong results if the reach probabilities at the root are initialized as you do in line 185 of Source/Tree/tree_values.lua, where the reach probability of each holding is initialized to 1/num_holdings. Using my equation for the CFV of folding, this causes all values on the tree to be scaled down by (num_holdings-1)/num_holdings. As a thought experiment, consider the player who acts first on the first round in Leduc always folding: his opponent still has to hold some card, which reduces the sum of the first player's reach probabilities to 5/6. Do your equations correct for this somewhere? I'd very much appreciate it if you could clarify this for me.

Cheers

Hands isomorphism question

Hello Martin!
Did you use hand isomorphism to convert hole cards to a canonical form and make things faster? From Figure 3 in the paper, it looks like you did not. If so, what were your reasons for not using it?

Understanding Lookahead:_compute_cfvs()

I understand how a parent node computes the CFV in normal CFR, but I'm struggling to understand how this is done in the lookahead's vectorized implementation.

Specifically, in Lookahead:_compute_cfvs():

function Lookahead:_compute_cfvs()
  for d=self.depth,2,-1 do
    local gp_layer_terminal_actions_count = self.terminal_actions_count[d-2]
    local ggp_layer_nonallin_bets_count = self.nonallinbets_count[d-3]

    self.cfvs_data[d][{{}, {}, {}, {1}, {}}]:cmul(self.empty_action_mask[d])
    self.cfvs_data[d][{{}, {}, {}, {2}, {}}]:cmul(self.empty_action_mask[d])

    self.placeholder_data[d]:copy(self.cfvs_data[d])

    --player indexing is swapped for cfvs
    self.placeholder_data[d][{{}, {}, {}, self.acting_player[d], {}}]:cmul(self.current_strategy_data[d])

    torch.sum(self.regrets_sum[d], self.placeholder_data[d], 1)

    --use a swap placeholder to change {{1,2,3}, {4,5,6}} into {{1,2}, {3,4}, {5,6}}
    local swap = self.swap_data[d-1]
    swap:copy(self.regrets_sum[d])
    self.cfvs_data[d-1][{{gp_layer_terminal_actions_count+1, -1}, {1, ggp_layer_nonallin_bets_count}, {}, {}, {}}]:copy(swap:transpose(2,3))
  end

end

So I understand that we multiply empty (illegal) actions by the mask so their CFV is zero, but I'm lost on what the swap and transpose are doing, and on how slicing the parent's cfvs_data copies things to the right place.

Can anyone explain this more clearly?
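For contrast, here is the same backward pass written as a plain per-node recursion (the node fields here are hypothetical); the vectorized code above performs this computation for an entire layer of nodes at once, and the swap/transpose is bookkeeping that regroups the child layer's (action, parent) entries to match the parent layer's indexing, as the in-code comment's {{1,2,3},{4,5,6}} to {{1,2},{3,4},{5,6}} example suggests:

-- per-node sketch of the CFV backward pass: the acting player's CFVs are
-- strategy-weighted sums over actions, while the opponent's CFVs sum
-- directly, since his strategy is already folded into the reach
-- probabilities used at terminal nodes
local function compute_cfvs(node)
  if node.terminal then
    return node.cfvs  -- players x range tensor from terminal equity
  end
  local cfvs = nil
  for action_id, child in ipairs(node.children) do
    local child_cfvs = compute_cfvs(child)
    cfvs = cfvs or child_cfvs:clone():zero()
    local p = node.acting_player
    -- acting player: weight child values by his strategy probabilities
    cfvs[p]:add(torch.cmul(child_cfvs[p], node.strategy[action_id]))
    -- opponent: strategy is already in the reaches, so just sum
    cfvs[3 - p]:add(child_cfvs[3 - p])
  end
  return cfvs
end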

Malfunctioning `test_tree_visualiser.lua`

Hi @lifrordi!

When my Settings/arguments.lua is:

--- list of pot-scaled bet sizes to use in tree
-- @field params.bet_sizing
params.bet_sizing = {1}

and I visualize it by running the command:

$ th Tree/Tests/test_tree_visualiser.lua

I get the following visualization:
[image: simple_tree_1bet]

However, when I modify Settings/arguments.lua as follows:

params.bet_sizing = {1, 2}

and when I run the very same command, I get this visualization:
[image: simple_tree_12bet]
This is obviously incorrect, since we should see more bet actions, namely a 2-pot bet. Am I missing something, or are the two visualizations really supposed to be identical?

Thanks for clarification!

Idea to improve training

Right now DeepStack uses a masked Huber loss, where a bucket is given weight 0 if impossible and 1 if possible.
What if we changed the mask so it can be any value between 0 and 1, weighted by how likely that bucket is?

So if there are two buckets A and B that both have an error of 0.5, but bucket A has range probability 0.01 and bucket B has probability 0.0001, this would give 100x more importance to updating bucket A's CFV toward its target.
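A minimal sketch of what the proposed mask could look like (this illustrates the idea above, not existing repository code; tensor shapes are assumptions):

-- proposed range-weighted mask: weight each bucket's error by the bucket's
-- probability in the input range instead of a binary 0/1 indicator
-- ranges: batch_size x bucket_count tensor of input bucket probabilities
local function range_weighted_mask(ranges)
  local mask = ranges:clone()
  -- normalize rows so the overall loss scale stays comparable per sample
  local row_sums = mask:sum(2):add(1e-9)
  mask:cdiv(row_sums:expandAs(mask))
  return mask
end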

Converting a hand range to buckets

@lifrordi Hello, I am still confused about converting a hand range to a bucket range.
Question 1: According to your reply, DeepStack arranges buckets by the player's expected hand strength, both during training and during the continuous re-solving process. Normal EHS assumes a uniform distribution over the opponent's private cards, but in both training and re-solving we know the opponent's range (private card distribution). Does DeepStack use normal (uniform) EHS, or does it compute EHS in real time against the opponent's current range? If the latter, in my tests it takes more than a minute to compute EHS in real time; how do you speed this up so that DeepStack runs within 10 seconds?
Question 2: Normal card abstraction is done offline, with the result stored in a file that is loaded when playing and running CFR. Does DeepStack compute its buckets offline or online? If offline, does it compute histograms, use EMD, and run k-means to produce the result?

(Texas Hold'em) TerminalEquity computation: real-time or precomputed table lookup?

@lifrordi Hello!
I have tried to construct the call_matrix of the flop with real-time computation; it took about 90 seconds, so the preflop would cost much more than 90s.
I have tried to precompute it instead, but storing 1,000 matrices of size 1326x1326 requires about 40GB, which is too large to save.
I wonder how you implemented TerminalEquity for Texas hold'em. Did you precompute it and do a table lookup?
Thanks!
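This does not answer the Texas hold'em question, but for reference, here is what the direct computation looks like at Leduc scale, where it is cheap (hand_strength is a hypothetical helper returning a comparable showdown score, not a function from this repository):

-- call matrix for a fixed board in a Leduc-sized game: entry [i][j] is +1
-- if private card i beats j at showdown, -1 if it loses, and 0 on a tie
-- or an impossible matchup (a card blocked by the board or the opponent)
local function build_call_matrix(board, card_count)
  local m = torch.zeros(card_count, card_count)
  for i = 1, card_count do
    for j = 1, card_count do
      if i ~= j and i ~= board and j ~= board then
        local si = hand_strength(i, board)  -- hypothetical strength helper
        local sj = hand_strength(j, board)
        if si > sj then m[i][j] = 1 elseif si < sj then m[i][j] = -1 end
      end
    end
  end
  return m
end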

Alternative NN architecture gave curious (questionable?) results

Hello, Martin
I want to share an observation and ask a question. I have built an alternative C++ implementation based on your ideas, with some optimizations. Among other things, I use a different NN architecture, and I got an interesting result in one case. Like you, I used one million training data points, each generated with CFR run for 1,000 iterations, but I ended up with an exploitability of about 0.4. That is less than the ~1.0 exploitability of a single CFR-1000 training sample. Do you think this is possible, or is it a bug in my exploitability calculation (which is possible)? In theory, can a large number of samples reduce the variance coming from the exploitability of individual samples? I have tested this version against this repository's version, and it seems to win on average.

Also, I have a question: why do you estimate CF values with the NN at the start of the new round, for every next board, rather than at the end of the current round? Is that for accuracy reasons, or because you would need to generate many more training cases? The latter setup should give a big speed improvement afterwards. Did you test it? Thanks.

lookahead.ranges_data tensor size

I was also confused about this: the code comment says it is a six-dimensional tensor, but lookahead.ranges_data[1] is just a five-dimensional tensor; it looks like the batch dimension is missing.

Line 130:

--data structures [actions x parent_action x grandparent_id x batch x players x range]
self.lookahead.ranges_data[1] = arguments.Tensor(1, 1, 1, constants.players_count, game_settings.card_count):fill(1.0 / game_settings.card_count)

Bug in utility(equity) calculation logic.

For example, suppose the game reaches the last street with As (index = 0) on the board, and P2's card range is { 0, 0.05, 0.07, 0.2, 0.15, 0.43 }. Say we are calculating equity for the case where P1 holds Ah (index = 1). In the implementation, we zero out the Ah entry (0.05) in P2's range and calculate the dot product (0.07 * 1 + 0.2 * 1 + 0.15 * 1 + 0.43 * 1) * pot_size = 0.85 * pot_size.
But it looks like when we zero out the Ah entry, we should boost the probabilities of all the other cards, since we are considering the case where P1 holds Ah, so P2 can't have it and we zeroed it out. Shouldn't we?
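For concreteness, the computation being described, as a standalone sketch (names are illustrative, and indices are shifted to Lua's 1-based convention, so As is index 1 and Ah is index 2):

-- equity for P1 holding Ah on an As board, as described above: zero out
-- the blocked card in P2's range, then sum (Ah pairs the board, so it
-- beats every remaining P2 card)
local pot_size = 1  -- placeholder
local p2_range = torch.Tensor{0, 0.05, 0.07, 0.2, 0.15, 0.43}
local p1_card = 2                    -- Ah
local opp = p2_range:clone()
opp[p1_card] = 0                     -- card removal: P2 cannot hold Ah
local equity = opp:sum() * pot_size  -- 0.85 * pot_size
print(equity)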

The other question: why do we treat the utility as +-pot_size and not +-pot_size/2, given that we only win or lose our/the opponent's half of the pot?

Can you clarify, please?

Terminal equity weight incorrect

In line 91 of terminal_equity.lua in the board_card_count == 1 case, it should be

1/(game_settings.card_count - 1)

rather than

1/(game_settings.card_count - 2)

since there are 5 valid boards for each hand, so the denominator should be 5.

I tested with Tree/test_tree_cfr.lua and the modification produces a lower exploitability.

sorted range generation, and dynamic bucketing

@lifrordi I am implementing DeepStack for two-player no-limit Texas hold'em. Since deepstack-leduc is a simple demonstration, it simplifies the bucketing process and the sorted-range generation process. These are the two major aspects I am confused about, and the paper gives few details on either.
My questions are:
Q1: To make the (randomly generated) ranges look more like real game-situation distributions, before we assign the probabilities (generated by the code shown in the paper) to the possible hands, we need to sort the possible two-card holdings (at most 1326 of them) according to the board cards, using hand strength; I suppose EHS is used in this sorting process. Is that reasonable? What measure did you use for the sorting?
Q2: After random poker situations are generated, we use public CFR+ to solve them. We then need to convert the card range into a bucket range (which can be regarded as card abstraction). What is the measure used for the abstraction: hand strength weighted by the opponent's range, or EHS (assuming the opponent's holding is uniformly random)?
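For reference, a minimal sketch of the recursive sorted-range generation the paper describes; the uniform split below is a simplifying assumption, not necessarily the exact distribution used:

-- recursively split probability mass among hands sorted by strength
local function generate_range(range, left, right, mass)
  if left == right then
    range[left] = mass
    return
  end
  local mid = math.floor((left + right) / 2)
  local fraction = math.random()  -- share of mass for the lower half (assumption)
  generate_range(range, left, mid, mass * fraction)
  generate_range(range, mid + 1, right, mass * (1 - fraction))
end

-- usage: sort hands by (for example) expected hand strength on the board,
-- generate probabilities, then map them back to the sorted hand order
local range = {}
generate_range(range, 1, 6, 1.0)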

Unused variable or bug in tree_builder._get_children_nodes

--is this a transition call node (leading to a chance node)?
local call_is_transit = parent_node.current_player == constants.players.P2 and parent_node.bets[1] == parent_node.bets[2] and parent_node.street < constants.streets_count

but the call_is_transit variable is not used in the code below. Is that a bug? Also, I have found some more code issues; should I report them here?

How to reduce the GPU memory used when computing the preflop strategy in the last 20 iterations?

@lifrordi Hi, the DeepStack paper says that computing the preflop strategy in the last 20 iterations queries the flop network, which means the input tensor to the network has size [22100 * 10, 2001] (10 nodes). If the network has 7 layers as the paper says, the required memory is larger than 8GB, and the speed is slower than the paper reports. Do you use suit isomorphism or some other technique to reduce the number of public flops from 22,100 to a smaller value?

How to speed up solving from the turn card is dealt

Hi @lifrordi
Thank you for releasing the code of DeepStack-Leduc.

In your paper, you mentioned that you generated ten million situations for the turn network in 175 core years, which means 551 core seconds for each situation.

I tried this, but my code is much slower: it needs thousands of core seconds to generate one situation. There is too much matrix calculation.

So, would you please share with me how to optimize this process? Did you use some abstraction?

Thank you very much!

Implementing Modicum?

Have you guys read about the latest Modicum bot from CMU?

Also, Libratus beat DeepStack by a large margin; is there any interest in incorporating Modicum into DeepStack?

Range Generation method can produce impossible range pairs

Since range generation is done for each player independently, impossible range pairs can be produced, such as:

P1:

  • A: 70%
  • K: 10%
  • Q: 20%

P2:

  • A: 50%
  • K: 25%
  • Q: 25%

It's not possible for P1 to have an A 70% of the time while P2 also has an A 50% of the time.

These impossible ranges could skew the models produced.

random poker situation solving

Hi, I am a student from HIT focusing on imperfect-information games. I noticed that the training data are generated by solving random poker situations, and that in this project the solving algorithm is re-solving; but re-solving itself needs support from a neural network, which is trained with that training data. I am confused. Could I understand it this way: you use one solving algorithm to solve random poker situations and generate data, then train the value network; when it comes to playing Texas hold'em, you use another solving method (re-solving) to do real-time computation based on the values given by the value network? Or do you use re-solving to solve RIVER-round random poker situations, then construct the RIVER-round value network; then use re-solving to solve TURN-round random situations based on the previously built network, then construct the TURN value network; then flop, then preflop?
I am looking forward to your reply.

compute the number of terminal node in level 2

In lookahead_builder.lua (https://github.com/lifrordi/DeepStack-Leduc/blob/master/Source/Lookahead/lookahead_builder.lua), line 87:

self.lookahead.terminal_nodes_count[2] = 2

Why is self.lookahead.terminal_nodes_count[2] = 2? It should be 1, because only the fold action leads to a terminal node.

And line 80:

self.lookahead.nonterminal_nodes_count[2] = self.lookahead.bets_count[1]

If the action abstraction is FCPA, then self.lookahead.bets_count[1] is equal to 3 (C, P, A), according to the _compute_tree_structures function.

So self.lookahead.nonterminal_nodes_count[2] = 3, and self.lookahead.nonterminal_nodes_count[2] + self.lookahead.terminal_nodes_count[2] = 5 > 4 (FCPA), which confuses me.

Special treatment of all-in actions in lookahead?

I'm trying to understand the lookahead code in order to apply it to a non-poker game, and I was hoping you could clarify why all-in actions are treated specially in the code, compared to regular bets.

For example, there are variables like nonallinbets_count and allin_nodes_count; how would those be used in the general case? Are all-in nodes the nodes that have children but no grandchildren?

A question about masked_huber_loss

Hi @lifrordi
Line 57 in masked_huber_loss.lua says that the mask is 1 for impossible features, but I think the mask should be 0 for impossible features, according to the code in bucketer.lua.

This line:

local loss_multiplier = (batch_size * feature_size) / (batch_size * feature_size - self.mask_sum:sum())

only makes sense when mask_sum is the sum of the count of impossible features (which are most of the features for any one sample), right?
But when I run the code, I find that self.mask_sum is the sum of the count of valid buckets.

card_count = 6
board_card_count = 1
bucket_count = 36
For one specific case, valid_bucket_count = 5.
My question is: should self.mask_sum be 5 or 31?
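For concreteness, worked numbers for this case: with batch_size = 1 and feature_size = bucket_count = 36, a mask that marks the 5 valid buckets gives mask_sum = 5 and a multiplier of (1 * 36) / 5 = 7.2, while a mask that marked the 31 impossible buckets would make the existing line give (1 * 36) / (36 - 31) = 7.2 as well, so the two formulas agree once the mask convention is consistent.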
Thank you very much!
