Murphy, Susan A. A generalization error for Q-learning. (English) Zbl 1222.68271
J. Mach. Learn. Res. 6, 1073-1097 (2005).

Summary: Planning problems that involve learning a policy from a single training set of finite-horizon trajectories arise in both social science and medical fields. We consider Q-learning with function approximation for this setting and derive an upper bound on the generalization error. This upper bound is in terms of quantities minimized by a Q-learning algorithm, the complexity of the approximation space, and an approximation term due to the mismatch between Q-learning and the goal of learning a policy that maximizes the value function.

Cited in 37 documents.

MSC: 68T05 Learning and adaptive systems in artificial intelligence

Keywords: multistage decisions; dynamic programming; reinforcement learning; batch data
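The setting the summary describes, learning Q-functions from a fixed batch of finite-horizon trajectories by minimizing a regression criterion at each decision stage, can be sketched as follows. This is a minimal illustrative implementation with linear function approximation, assuming a simple trajectory format of `(state, action, reward)` triples; the names `batch_q_learning` and `feature_fn` are hypothetical and this is not Murphy's exact formulation.

```python
import numpy as np

def batch_q_learning(trajectories, horizon, feature_fn, n_features):
    """Backward-recursive Q-learning with linear function approximation
    on a fixed batch of finite-horizon trajectories.

    Illustrative sketch only: each trajectory is a list of
    (state, action, reward) triples of length `horizon`. At each stage t
    (from horizon-1 down to 0) the regression target is
    reward + max_a' Q_{t+1}(next_state, a'), fitted by least squares
    onto features of (state, action).
    """
    weights = [np.zeros(n_features) for _ in range(horizon)]
    # Action set observed anywhere in the batch.
    actions = sorted({a for traj in trajectories for (_, a, _) in traj})
    for t in reversed(range(horizon)):
        X, y = [], []
        for traj in trajectories:
            s, a, r = traj[t]
            target = r
            if t + 1 < horizon:
                s_next = traj[t + 1][0]
                # Bootstrap from the stage-(t+1) fit already computed.
                target += max(
                    feature_fn(s_next, a2) @ weights[t + 1] for a2 in actions
                )
            X.append(feature_fn(s, a))
            y.append(target)
        # Least-squares fit of the stage-t Q-function.
        weights[t], *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return weights
```

The greedy policy at stage t then picks `argmax_a feature_fn(s, a) @ weights[t]`; the bound in the paper relates the value of such a policy to the regression errors these fits minimize.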