×

zbMATH — the first resource for mathematics

A Bayesian optimization approach to find Nash equilibria. (English) Zbl 1410.91030
Summary: Game theory finds nowadays a broad range of applications in engineering and machine learning. However, in a derivative-free, expensive black-box context, very few algorithmic solutions are available to find game equilibria. Here, we propose a novel Gaussian-process based approach for solving games in this context. We follow a classical Bayesian optimization framework, with sequential sampling decisions based on acquisition functions. Two strategies are proposed, based either on the probability of achieving equilibrium or on the stepwise uncertainty reduction paradigm. Practical and numerical aspects are discussed in order to enhance the scalability and reduce computation time. Our approach is evaluated on several synthetic game problems with varying number of players and decision space dimensions. We show that equilibria can be found reliably for a fraction of the cost (in terms of black-box evaluations) compared to classical, derivative-based algorithms. The method is available in the \(\mathsf{R}\) package GPGame available on CRAN at https://cran.r-project.org/package=GPGame.

MSC:
91A10 Noncooperative games
91A23 Differential games (aspects of game theory)
91-04 Software, source code, etc. for problems pertaining to game theory, economics, and finance
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] Adams, R.A., Fournier, J.J.: Sobolev Spaces, vol. 140. Academic Press, Cambridge (2003) · Zbl 1098.46001
[2] Álvarez, MA; Rosasco, L.; Lawrence, ND, Kernels for vector-valued functions: a review, Found. Trend Mach. Learn., 4, 195-266, (2011) · Zbl 1301.68212
[3] Azzalini, A., Genz, A.: The R package mnormt: the multivariate normal and \(t\) distributions (version 1.5-4). http://azzalini.stat.unipd.it/SW/Pkg-mnormt (2016). Accessed 8 Mar 2016
[4] Başar, T., Relaxation techniques and asynchronous algorithms for on-line computation of noncooperative equilibria, J. Econ. Dyn. Control., 11, 531-549, (1987) · Zbl 0646.90100
[5] Bect, J.; Ginsbourger, D.; Li, L.; Picheny, V.; Vazquez, E., Sequential design of computer experiments for the estimation of a probability of failure, Stat. Comput., 22, 773-793, (2012) · Zbl 1252.62081
[6] Bect, J., Bachoc, F., Ginsbourger, D.: A supermartingale approach to Gaussian process based sequential design of experiments. arXiv preprint arXiv:1608.01118 (2016)
[7] Brown, N., Ganzfried, S., Sandholm, T.: Hierarchical abstraction, distributed equilibrium computation, and post-processing, with application to a champion no-limit Texas hold’em agent. In: Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, pp. 7-15 (2015)
[8] Chevalier, C., Ginsbourger, D.: Fast computation of the multi-points expected improvement with applications in batch selection. In: Learning and Intelligent Optimization, Springer, pp. 59-69 (2013)
[9] Chevalier, C.; Emery, X.; Ginsbourger, D., Fast update of conditional simulation ensembles, Math. Geosci., 47, 771-789, (2015) · Zbl 1323.86020
[10] Cressie, N., Statistics for spatial data, Terra Nova, 4, 613-617, (1992)
[11] Dorsch, D.; Jongen, HT; Shikhman, V., On structure and computation of generalized nash equilibria, SIAM J. Optim., 23, 452-474, (2013) · Zbl 1266.49050
[12] Facchinei, F.; Kanzow, C., Generalized nash equilibrium problems, Annal. Oper. Res., 175, 177-211, (2010) · Zbl 1185.91016
[13] Fleuret, F., Geman, D.: Graded learning for object detection. In: Proceedings of the Workshop on Statistical and Computational Theories of Vision of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR/SCTV), vol. 2 (1999) · Zbl 1225.68259
[14] Friedman, A., Stochastic differential games, J. Differ. Equ., 11, 79-108, (1972) · Zbl 0208.39402
[15] Games, ILSC, Lenient learning in independent-learner stochastic cooperative games, J. Mach. Learn. Res., 17, 1-42, (2016)
[16] Garivier, A., Kaufmann, E., Koolen, W. M.: Maximin action identification: a new bandit framework for games. In: 29th Annual Conference on Learning Theory, pp. 1028-1050 (2016)
[17] Genz, A., Bretz, F.: Computation of Multivariate Normal and t Probabilities. Lecture Notes in Statistics. Springer, Heidelberg (2009) · Zbl 1204.62088
[18] Genz, A., Bretz, F., Miwa, T., Mi, X., Leisch, F., Scheipl, F., Hothorn, T.: mvtnorm: Multivariate normal and t Distributions. http://CRAN.R-project.org/package=mvtnorm, r package version 1.0-5 (2016). Accessed 2 Feb 2016
[19] Gibbons, R.: Game Theory for Applied Economists. Princeton University Press, Princeton (1992)
[20] Ginsbourger, D., Le Riche, R.: Towards Gaussian process-based optimization with finite time horizon. In: mODa9-Advances in Model-Oriented Design and Analysis, Springer, pp. 89-96 (2010)
[21] Gonzalez, J., Osborne, M., Lawrence, N.: Glasses: relieving the myopia of Bayesian optimisation. In: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pp. 790-799 (2016)
[22] Gramacy, RB; Apley, DW, Local gaussian process approximation for large computer experiments, J. Comput. Graph. Stat., 24, 561-578, (2015)
[23] Gramacy, RB; Ludkovski, M., Sequential design for optimal stopping problems, SIAM J. Financ. Math., 6, 748-775, (2015) · Zbl 1320.91154
[24] Habbal, A.; Kallel, M., Neumann-Dirichlet Nash strategies for the solution of elliptic Cauchy problems, SIAM J. Control Optim., 51, 4066-4083, (2013) · Zbl 1280.49003
[25] Habbal, A.; Petersson, J.; Thellner, M., Multidisciplinary topology optimization solved as a Nash game, Int. J. Numer. Methods Eng., 61, 949-963, (2004) · Zbl 1075.74606
[26] Harsanyi, JC, Games with randomly disturbed payoffs: a new rationale for mixed-strategy equilibrium points, Int. J. Game Theory, 2, 1-23, (1973) · Zbl 0255.90084
[27] Heaton, M. J., Datta, A., Finley, A., Furrer, R., Guhaniyogi, R., Gerber, F., Gramacy, R. B., Hammerling, D., Katzfuss, M., Lindgren, F., et al.: A case study competition among methods for analyzing large spatial data. arXiv preprint arXiv:1710.05013 (2017)
[28] Hecht, F., Pironneau, O., Le Hyaric, A., Ohtsuka, K.: Freefem++ v. 2.11. Users? Manual University of Paris 6 (2010)
[29] Hennig, P.; Schuler, CJ, Entropy search for information-efficient global optimization, J. Mach. Learn. Res., 13, 1809-1837, (2012) · Zbl 1432.65073
[30] Hernández-Lobato, J.M., Hoffman, M.W., Ghahramani, Z.: Predictive entropy search for efficient global optimization of black-box functions. In: Advances in neural information processing systems, pp. 918-926 (2014)
[31] Hernández-Lobato, JM; Gelbart, MA; Adams, RP; Hoffman, MW; Ghahramani, Z., A general framework for constrained bayesian optimization using information-based search, J. Mach. Learn. Res., 17, 1-53, (2016) · Zbl 1391.90641
[32] Hu, J.; Wellman, MP, Nash q-learning for general-sum stochastic games, J. Mach. Learn. Res., 4, 1039-1069, (2003) · Zbl 1094.68076
[33] Isaacs, R.: Differential Games. A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization. Wiley, New York (1965) · Zbl 0125.38001
[34] Jala, M.; Lévy-Leduc, C.; Moulines, É; Conil, E.; Wiart, J., Sequential design of computer experiments for the assessment of fetus exposure to electromagnetic fields, Technometrics, 58, 30-42, (2016)
[35] Johanson, M., Bowling, M.H.: Data biased robust counter strategies. In: Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 264-271 (2009)
[36] Jones, DR; Schonlau, M.; Welch, WJ, Efficient global optimization of expensive black-box functions, J. Glob. Optim., 13, 455-492, (1998) · Zbl 0917.90270
[37] Kanzow, C.; Steck, D., Augmented lagrangian methods for the solution of generalized nash equilibrium problems, SIAM J. Optim., 26, 2034-2058, (2016) · Zbl 1351.65037
[38] Lanctot, M., Burch, N., Zinkevich, M., Bowling, M., Gibson, R.G.: No-regret learning in extensive-form games with imperfect recall. In: Proceedings of the 29th International Conference on Machine Learning (ICML-12), pp. 65-72 (2012)
[39] León, ER; Pape, AL; Désidéri, JA; Alfano, D.; Costes, M., Concurrent aerodynamic optimization of rotor blades using a nash game method, J. Am. Helicopter Soc., 61, 1-13, (2014)
[40] Li, S.; Başar, T., Distributed algorithms for the computation of noncooperative equilibria, Autom. J. IFAC, 23, 523-533, (1987) · Zbl 0619.90092
[41] Littman, ML; Stone, P., A polynomial-time nash equilibrium algorithm for repeated games, Decis. Support Syst., 39, 55-66, (2005)
[42] McKay, MD; Beckman, RJ; Conover, WJ, Comparison of three methods for selecting values of input variables in the analysis of output from a computer code, Technometrics, 21, 239-245, (1979) · Zbl 0415.62011
[43] Mockus, J.: Bayesian Approach to Global Optimization: Theory and Applications. Springer, Berlin (1989) · Zbl 0693.49001
[44] Neyman, A., Sorin, S.: Stochastic Games and Applications, vol. 570. Springer, Berlin (2003) · Zbl 1027.00040
[45] Nishimura, R.; Hayashi, S.; Fukushima, M., Robust nash equilibria in n-person non-cooperative games: uniqueness and reformulation, Pac. J. Optim., 5, 237-259, (2009) · Zbl 1162.91304
[46] Parr, J. M.: Improvement Criteria for Constraint Handling and Multiobjective Optimization. Ph.D thesis, University of Southampton (2012)
[47] Picheny, V.: A stepwise uncertainty reduction approach to constrained global optimization. In: Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, JMLR W&CP, vol 33, pp. 787-795 (2014)
[48] Picheny, V., Binois, M.: GPGame: solving complex game problems using Gaussian processes. URL http://CRAN.R-project.org/package=GPGame, r package version 0.1.3 (2017)
[49] Plumlee, M., Fast prediction of deterministic functions using sparse grid experimental designs, J. Am. Stat. Assoc., 109, 1581-1591, (2014) · Zbl 1368.65017
[50] R Core Team (2016) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna. https://www.R-project.org/. Accessed 15 Mar 2018
[51] Rasmussen, C.E., Williams, C.: Gaussian Processes for Machine Learning. MIT Press. http://www.gaussianprocess.org/gpml/ (2006) · Zbl 1177.68165
[52] Rosenmüller, J., On a generalization of the lemke-howson algorithm to noncooperative n-person games, SIAM J. Appl. Math., 21, 73-79, (1971) · Zbl 0222.90053
[53] Roustant, O.; Ginsbourger, D.; Deville, Y., DiceKriging, DiceOptim: two R packages for the analysis of computer experiments by kriging-based metamodeling and optimization, J. Stat. Softw., 51, 1-55, (2012)
[54] Rullière, D.; Durrande, N.; Bachoc, F.; Chevalier, C., Nested kriging predictions for datasets with a large number of observations, Stat. Comput., 28, 1-19, (2016) · Zbl 1384.62246
[55] Scilab Enterprises (2012) Scilab: Free and Open Source Software for Numerical Computation. Scilab Enterprises, Orsay. http://www.scilab.org. Accessed 1 Apr 2015
[56] Shahriari, B.; Swersky, K.; Wang, Z.; Adams, RP; Freitas, N., Taking the human out of the loop: a review of bayesian optimization, Proc. IEEE, 104, 148-175, (2016)
[57] Shapley, LS, Stochastic games, Proc. Natl. Acad. Sci., 39, 1095-1100, (1953) · Zbl 0051.35805
[58] Srinivas, N.; Krause, A.; Kakade, SM; Seeger, M., Information-theoretic regret bounds for gaussian process optimization in the bandit setting, Inf. Theory IEEE Trans., 58, 3250-3265, (2012) · Zbl 1365.94131
[59] Uryas’ev, S.; Rubinstein, RY, On relaxation algorithms in computation of noncooperative equilibria, IEEE Trans. Autom. Control, 39, 1263-1267, (1994) · Zbl 0811.90117
[60] Villemonteix, J.; Vazquez, E.; Walter, E., An informational approach to the global optimization of expensive-to-evaluate functions, J. Glob. Optim., 44, 509-534, (2009) · Zbl 1180.90253
[61] Wagner, T., Emmerich, M., Deutz, A., Ponweiser, W.: On expected-improvement criteria for model-based multi-objective optimization. In: International Conference on Parallel Problem Solving from Nature, Springer, Berlin. pp. 718-727 (2010)
[62] Wang, G.; Shan, S., Review of metamodeling techniques in support of engineering design optimization, J. Mech. Des., 129, 370, (2007)
[63] Wilson, A., Nickisch, H.: Kernel interpolation for scalable structured Gaussian processes (kiss-gp). In: International Conference on Machine Learning, pp. 1775-1784 (2015)
[64] Žilinskas, A.; Zhigljavsky, A., Stochastic global optimization: a review on the occasion of 25 years of informatica, Informatica, 27, 229-256, (2016) · Zbl 1387.90203
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.