R-MAX swMATH ID: 2539 Software Authors: Brafman, Ronen I.; Tennenholtz, Moshe Description: R-MAX is a very simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-MAX, the agent always maintains a complete, but possibly inaccurate model of its environment and acts based on the optimal policy derived from this model. The model is initialized in an optimistic fashion: all actions in all states return the maximal possible reward (hence the name). During execution, it is updated based on the agent’s observations. R-MAX improves upon several previous algorithms: (1) It is simpler and more general than Kearns and Singh’s E^3 algorithm, covering zero-sum stochastic games. (2) It has a built-in mechanism for resolving the exploration vs. exploitation dilemma. (3) It formally justifies the “optimism under uncertainty” bias used in many RL algorithms. (4) It is simpler, more general, and more efficient than Brafman and Tennenholtz’s LSG algorithm for learning in single controller stochastic games. (5) It generalizes the algorithm by Monderer and Tennenholtz for learning in repeated games. (6) It is the only algorithm for learning in repeated games, to date, which is provably efficient, considerably improving and simplifying previous algorithms by Banos and by Megiddo. Homepage: http://jmlr.csail.mit.edu/papers/v3/brafman02a.html Keywords: Reinforcement Learning; Learning in Games; Stochastic Games; Markov Decision Processes; Provably Efficient Learning Related Software: AWESOME; AdaBoost.MH; PRMLT; ElemStatLearn; POMDPS; GAMUT Cited in: 32 Publications Standard Articles 1 Publication describing the Software, including 1 Publication in zbMATH Year R-MAX – a general polynomial time algorithm for near-optimal reinforcement learning. Zbl 1088.68694Brafman, Ronen I.; Tennenholtz, Moshe 2003 all top 5 Cited by 46 Authors 5 Brafman, Ronen I. 5 Littman, Michael L. 4 Li, Lihong 4 Mannor, Shie 4 Tennenholtz, Moshe 3 Hutter, Marcus 3 Strehl, Alexander L. 2 Bab, Avraham 2 Crandall, Jacob W. 2 Ryabko, Daniil 2 Shimkin, Nahum 2 Szepesvári, Csaba 1 Albrecht, Stefano V. 1 Auer, Peter 1 Bartlett, Peter L. 1 Braun, Daniel A. 1 Brunskill, Emma 1 Cao, Weihua 1 Chen, Gang 1 Chen, Xin 1 Even-Dar, Eyal 1 Farahmand, Amir-Massoud 1 Främling, Kary 1 Goodrich, Michael A. 1 Grenager, Trond 1 Hans, Alexander 1 Jaksch, Thomas 1 Leffler, Bethany R. 1 Mansour, Yishay 1 Ng, Kee Siong 1 Ortega, Pedro A. 1 Ortner, Ronald 1 Powers, Rob 1 Ramamoorthy, Subramanian 1 Roy, Nicholas 1 Sandholm, Tuomas W. 1 Shoham, Yoav 1 Silver, David M. 1 Tewari, Ambuj 1 Udluft, Steffen 1 Uther, William T. B. 1 Veness, Joel 1 Walsh, Thomas J. 1 Whiteson, Shimon 1 Wu, Min 1 Yu, Jiayuan all top 5 Cited in 13 Serials 6 Journal of Machine Learning Research (JMLR) 4 Artificial Intelligence 3 Machine Learning 3 The Journal of Artificial Intelligence Research (JAIR) 1 Journal of Computer and System Sciences 1 Mathematics of Operations Research 1 Theoretical Computer Science 1 Neural Networks 1 Artificial Intelligence Review 1 Annals of Mathematics and Artificial Intelligence 1 Journal of Control Theory and Applications 1 Studies in Computational Intelligence 1 Synthesis Lectures on Artificial Intelligence and Machine Learning all top 5 Cited in 6 Fields 30 Computer science (68-XX) 11 Game theory, economics, finance, and other social and behavioral sciences (91-XX) 10 Operations research, mathematical programming (90-XX) 4 Statistics (62-XX) 2 Systems theory; control (93-XX) 1 Information and communication theory, circuits (94-XX) Citations by Year