RMAX
swMATH ID:  2539 
Software Authors:  Brafman, Ronen I.; Tennenholtz, Moshe 
Description:  RMAX is a very simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In RMAX, the agent always maintains a complete, but possibly inaccurate, model of its environment and acts based on the optimal policy derived from this model. The model is initialized in an optimistic fashion: all actions in all states return the maximal possible reward (hence the name). During execution, it is updated based on the agent’s observations. RMAX improves upon several previous algorithms: (1) It is simpler and more general than Kearns and Singh’s E^3 algorithm, covering zero-sum stochastic games. (2) It has a built-in mechanism for resolving the exploration vs. exploitation dilemma. (3) It formally justifies the “optimism under uncertainty” bias used in many RL algorithms. (4) It is simpler, more general, and more efficient than Brafman and Tennenholtz’s LSG algorithm for learning in single-controller stochastic games. (5) It generalizes the algorithm by Monderer and Tennenholtz for learning in repeated games. (6) It is, to date, the only provably efficient algorithm for learning in repeated games, considerably improving and simplifying previous algorithms by Banos and by Megiddo. 
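The core idea described above — model all insufficiently explored state-action pairs as yielding the maximal reward, then act greedily on the resulting optimistic model — can be sketched as follows. This is an illustrative sketch only, not the authors' pseudocode; the function name, parameters (e.g. the known-ness threshold `m`), and the tabular MDP representation are assumptions for the example.

```python
import numpy as np

def rmax_policy(n_states, n_actions, counts, rewards, transitions,
                r_max=1.0, m=5, gamma=0.95, n_iters=200):
    """One planning step of an R-MAX-style agent (illustrative sketch).

    State-action pairs tried fewer than m times are treated as
    'unknown': they are modeled as returning the maximal reward r_max
    (and, here, as self-looping), which makes the derived policy
    optimistic and drives exploration of unknown pairs.

    counts[s, a]         -- visit count of pair (s, a)
    rewards[s, a]        -- accumulated reward observed at (s, a)
    transitions[s, a, t] -- observed transition count (s, a) -> t
    """
    # Build the optimistic model.
    R = np.full((n_states, n_actions), r_max)
    P = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            if counts[s, a] >= m:   # "known" pair: empirical estimates
                R[s, a] = rewards[s, a] / counts[s, a]
                P[s, a] = transitions[s, a] / counts[s, a]
            else:                   # "unknown" pair: optimistic self-loop
                P[s, a, s] = 1.0
    # Value iteration on the optimistic model.
    V = np.zeros(n_states)
    for _ in range(n_iters):
        Q = R + gamma * (P @ V)     # (S, A, S) @ (S,) -> (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)         # greedy policy w.r.t. optimistic model

# Usage: a 2-state, 2-action MDP where only (s=0, a=0) is known and
# yields reward 0; the agent is drawn to the unexplored action 1.
counts = np.zeros((2, 2))
rewards = np.zeros((2, 2))
transitions = np.zeros((2, 2, 2))
counts[0, 0] = 5
transitions[0, 0, 0] = 5
pi = rmax_policy(2, 2, counts, rewards, transitions, m=5)
```

Because the unknown pair (0, 1) is modeled with the maximal reward while the known pair (0, 0) has an empirical mean of 0, the greedy policy at state 0 picks the unexplored action — the "optimism under uncertainty" bias the description refers to.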
Homepage:  http://jmlr.csail.mit.edu/papers/v3/brafman02a.html 
Keywords:  Reinforcement Learning; Learning in Games; Stochastic Games; Markov Decision Processes; Provably Efficient Learning 
Related Software:  AWESOME; AdaBoost.MH; PRMLT; ElemStatLearn; POMDPS; GAMUT 
Cited in:  32 Publications 
Standard Articles
1 Publication describing the Software, including 1 Publication in zbMATH

RMAX – a general polynomial time algorithm for near-optimal reinforcement learning. Zbl 1088.68694
Brafman, Ronen I.; Tennenholtz, Moshe (2003)

Cited by 46 Authors
Cited in 13 Serials