
Effective short-term opponent exploitation in simplified poker. (English) Zbl 1200.91015

Summary: Uncertainty in poker stems from two key sources, the shuffled deck and an adversary whose strategy is unknown. One approach to playing poker is to find a pessimistic game-theoretic solution (i.e., a Nash equilibrium), but human players have idiosyncratic weaknesses that can be exploited if some model or counter-strategy can be learned by observing their play. However, games against humans last for at most a few hundred hands, so learning must be very fast to be useful. We explore two approaches to opponent modelling in the context of Kuhn poker, a small game for which game-theoretic solutions are known. Parameter estimation and expert algorithms are both studied. Experiments demonstrate that, even in this small game, convergence to maximally exploitive solutions in a small number of hands is impractical, but that good (e.g., better than Nash) performance can be achieved in as few as 50 hands. Finally, we show that amongst a set of strategies with equal game-theoretic value, in particular the set of Nash equilibrium strategies, some are preferable because they speed learning of the opponent’s strategy by exploring it more effectively.
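The expert-algorithm side of the study can be illustrated with a short Exp3-style sketch in the spirit of the adversarial bandit algorithm of Auer et al. [1]: each arm is a candidate counter-strategy against the unknown opponent, and per-hand payoffs are rescaled into [0, 1] before a multiplicative-weights update. The candidate strategies, the simulated payoffs, the payoff range of [-2, +2] (the standard 1-unit ante, 1-unit bet version of Kuhn poker), and the 50-hand horizon below are illustrative assumptions, not the authors' implementation.

```python
# Minimal Exp3 sketch: arms are candidate counter-strategies, rewards are
# per-hand payoffs rescaled to [0, 1]. The simulated opponent (a fixed mean
# payoff per strategy plus noise) is a stand-in for actually playing hands.

import math
import random

def exp3(mean_payoffs, hands=50, gamma=0.3, payoff_range=(-2.0, 2.0), seed=0):
    """Run Exp3 over len(mean_payoffs) candidate strategies for `hands` hands.

    mean_payoffs[i] is the hypothetical expected per-hand payoff of candidate
    strategy i against the unknown opponent; a real system would obtain the
    payoff by playing the hand rather than sampling it here.
    """
    rng = random.Random(seed)
    k = len(mean_payoffs)
    lo, hi = payoff_range
    weights = [1.0] * k
    total_payoff = 0.0

    for _ in range(hands):
        w_sum = sum(weights)
        probs = [(1 - gamma) * w / w_sum + gamma / k for w in weights]

        # Choose a strategy to play this hand.
        i = rng.choices(range(k), weights=probs)[0]

        # Simulated per-hand payoff, clipped to the assumed payoff range.
        payoff = max(lo, min(hi, rng.gauss(mean_payoffs[i], 1.0)))
        total_payoff += payoff

        # Importance-weighted reward estimate, rescaled to [0, 1],
        # then the multiplicative-weights update for the chosen arm.
        x_hat = ((payoff - lo) / (hi - lo)) / probs[i]
        weights[i] *= math.exp(gamma * x_hat / k)

    return total_payoff / hands, weights

if __name__ == "__main__":
    # Three hypothetical counter-strategies: a Nash-like safe strategy and two
    # exploitive strategies, one of which happens to fit this opponent well.
    avg, w = exp3(mean_payoffs=[0.0, 0.15, -0.1], hands=50)
    print("average payoff per hand:", round(avg, 3))
    print("final weights:", [round(x, 2) for x in w])
```

Over a horizon as short as 50 hands, the exploration term gamma/k keeps every candidate in play, which is consistent with the paper's observation that fast learning, rather than asymptotic convergence to the maximally exploitive response, is what matters in short matches.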

MSC:

91A06 \(n\)-person games, \(n>2\)
91A60 Probabilistic games; gambling

Software:

GameShrink

References:

[1] Auer, P., Cesa-Bianchi, N., Freund, Y., & Schapire, R. E. (1995). Gambling in a rigged casino: the adversarial multi-armed bandit problem. In Proceedings of the 36th annual symposium on foundations of computer science (pp. 322–331). · Zbl 0938.68920
[2] Billings, D., Burch, N., Davidson, A., Holte, R. C., Schaeffer, J., Schauenberg, T., & Szafron, D. (2003). Approximating game-theoretic optimal strategies for full-scale poker. In Proceedings of 18th international joint conference on artificial intelligence (IJCAI-2003) (pp. 661–675).
[3] Billings, D., Davidson, A., Schauenberg, T., Burch, N., Bowling, M., Holte, R., Schaeffer, J., & Szafron, D. (2004). Game tree search with adaptation in stochastic imperfect information games. In Computers and games’04 (pp. 21–34). Berlin: Springer.
[4] Brown, G. W. (1951). Iterative solution of games by fictitious play. In T. C. Koopmans (Ed.), Activity analysis of production and allocation (pp. 374–376). New York: Wiley.
[5] Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1–38. · Zbl 0364.62022
[6] Freund, Y., & Schapire, R. (1996). Game theory, on-line prediction and boosting. In Proceedings of the 9th annual conference on computational learning theory (COLT-96) (pp. 325–332).
[7] Freund, Y., & Schapire, R. E. (1999). Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29, 79–103. · Zbl 0964.91007 · doi:10.1006/game.1999.0738
[8] Fudenberg, D., & Levine, D. K. (1998). The theory of learning in games. Cambridge: MIT Press. · Zbl 0939.91004
[9] Gilpin, A., & Sandholm, T. (2005). Optimal Rhode Island Hold’em poker. Intelligent systems demonstration. In Proceedings of the 20th national conference on artificial intelligence (AAAI-05) (pp. 1684–1685).
[10] Gilpin, A., & Sandholm, T. (2006). A competitive Texas Hold’em poker player via automated abstraction and real-time equilibrium computation. In Proceedings of the 21st national conference on artificial intelligence (AAAI-06) (pp. 1007–1013).
[11] Gilpin, A., & Sandholm, T. (2007a). Better automated abstraction techniques for imperfect information games with application to Texas Hold’em poker. In Proceedings of the 6th international joint conference on autonomous agents and multiagent systems (AAMAS-07) (p. 192). · Zbl 1314.91025
[12] Gilpin, A., & Sandholm, T. (2007b). Lossless abstraction of imperfect information games. Journal of the ACM, 54(5). · Zbl 1314.91025
[13] Gilpin, A., Sandholm, T., & Sørensen, T. B. (2007). Potential-aware automated abstraction of sequential games, and holistic equilibrium analysis of Texas Hold’em poker. In Proceedings of the 22nd national conference on artificial intelligence (AAAI-07).
[14] Gordon, G. (2005). No-regret algorithms for structured prediction problems (Draft version).
[15] Hoehn, B. (2006). The effectiveness of opponent modelling in a small imperfect information game. Master’s thesis, University of Alberta, Dept. of Computing Science.
[16] Hoehn, B., Southey, F., Holte, R. C., & Bulitko, V. (2005). Effective short-term opponent exploitation in simplified poker. In Proceedings of the 20th national conference on artificial intelligence (AAAI-2005) (pp. 783–788). · Zbl 1200.91015
[17] Johanson, M., Zinkevich, M., & Bowling, M. (2007). Computing robust counter-strategies. In Advances in neural information processing systems 20 (NIPS 2007) (pp. 721–728).
[18] Koller, D., & Pfeffer, A. (1997). Representations and solutions for game-theoretic problems. Artificial Intelligence, 94(1), 167–215. · Zbl 0907.90297 · doi:10.1016/S0004-3702(97)00023-4
[19] Korb, K., Nicholson, A., & Jitnah, N. (1999). Bayesian poker. In Proceedings of the 15th conference on uncertainty in artificial intelligence (UAI-1999) (pp. 343–350).
[20] Kuhn, H. W. (1950). A simplified two-person poker. Contributions to the Theory of Games, 1, 97–103. · Zbl 0041.25601
[21] Margineantu, D. D., & Dietterich, T. G. (2002). Improved class probability estimates from decision tree models. In D. D. Denison, M. H. Hansen, C. C. Holmes, B. Mallick, & B. Yu (Eds.), Lecture notes in statistics : Vol. 171. Nonlinear estimation and classification (pp. 169–184). New York: Springer. · Zbl 1142.62370
[22] Michie, D. (1966). Game-playing and game-learning automata. In Advances in programming and non-numerical computation (pp. 183–196). Elmsford: Pergamon. Chap. 8.
[23] Poland, J., & Hutter, M. (2005). Universal learning of repeated matrix games (Tech. rep. IDSIA-18-05). IDSIA.
[24] Powers, R., & Shoham, Y. (2003). Computing best response strategies via sampling (Tech. rep.). Stanford.
[25] Powers, R., & Shoham, Y. (2005). New criteria and a new algorithm for learning in multi-agent systems. In L.K. Saul, Y. Weiss, & L. Bottou (Eds.), Advances in neural information processing systems 17 (NIPS-2004) (pp. 1089–1096).
[26] Robinson, J. (1951). An iterative method of solving a game. The Annals of Mathematics, 2nd Series, 54, 296–301. · Zbl 0045.08203 · doi:10.2307/1969530
[27] Russell, S., & Norvig, P. (1995). Artificial intelligence: a modern approach (1st ed.). Upper Saddle River: Prentice Hall. · Zbl 0835.68093
[28] Shi, J., & Littman, M. (2001). Abstraction models for game theoretic poker. In Computers and games (CG-2000) (pp. 333–345). · Zbl 0989.91509
[29] Southey, F., Bowling, M., Larson, B., Piccione, C., Burch, N., Billings, D., & Rayner, C. (2005). Bayes’ bluff: opponent modelling in poker. In Proceedings of the 21st conference on uncertainty in artificial intelligence (UAI-2005) (pp. 550–558).
[30] von Neumann, J., & Morgenstern, O. (1947). The theory of games and economic behavior (2nd ed.). Princeton: Princeton University Press. · Zbl 1241.91002
[31] Zinkevich, M. (2004). Theoretical guarantees for algorithms in multi-agent settings. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA.
[32] Zinkevich, M., Bowling, M., & Burch, N. (2007a). A new algorithm for generating equilibria in massive zero-sum games. In Proceedings of the 22nd national conference on artificial intelligence (AAAI 2007) (pp. 788–793).
[33] Zinkevich, M., Bowling, M., Johanson, M., & Piccione, C. (2007b). Regret minimization in games with incomplete information. In Advances in neural information processing systems 20 (NIPS 2007) (pp. 721–728).
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented or enhanced with data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.