A Bayesian approach for learning and planning in partially observable Markov decision processes. (English) Zbl 1280.68193

Summary: Bayesian learning methods have recently been shown to provide an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However, most investigations of Bayesian reinforcement learning to date have focused on standard Markov decision processes (MDPs). The primary focus of this paper is to extend these ideas to partially observable Markov decision processes (POMDPs) by introducing the Bayes-adaptive POMDP (BAPOMDP). This new framework can be used to simultaneously (1) learn a model of the POMDP domain through interaction with the environment, (2) track the state of the system under partial observability, and (3) plan (near-)optimal sequences of actions. An important contribution of this paper is to provide theoretical results showing how the model can be finitely approximated while preserving good learning performance. We present approximate algorithms for belief tracking and planning in this model, as well as empirical results illustrating how the model estimate and the agent's return improve as a function of experience.
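To make the framework concrete, the following is a minimal sketch (not the authors' implementation) of particle-based belief tracking in a BAPOMDP. It assumes discrete states, actions, and observations; each particle pairs a hypothesised hidden state with Dirichlet count arrays phi (transition model) and psi (observation model), so that tracking the belief and learning the model happen jointly. All names (Particle, belief_update, phi, psi) are illustrative.

import numpy as np

class Particle:
    """One belief hypothesis: a hidden state plus Dirichlet model counts."""
    def __init__(self, state, phi, psi, weight):
        self.state = state    # hypothesised hidden state s
        self.phi = phi        # counts over T(s'|s,a), shape (S, A, S)
        self.psi = psi        # counts over O(z|s',a), shape (S, A, Z)
        self.weight = weight

def belief_update(particles, action, obs, rng):
    """Propagate and reweight particles after taking `action` and observing `obs`."""
    new_particles = []
    for p in particles:
        # Expected transition distribution under this particle's Dirichlet posterior.
        trans = p.phi[p.state, action]
        trans = trans / trans.sum()
        s_next = rng.choice(len(trans), p=trans)

        # Reweight by the expected observation likelihood.
        obs_probs = p.psi[s_next, action]
        obs_probs = obs_probs / obs_probs.sum()
        w = p.weight * obs_probs[obs]

        # Learning step: record one more (s, a, s') transition and (s', a, z) observation.
        phi = p.phi.copy()
        psi = p.psi.copy()
        phi[p.state, action, s_next] += 1
        psi[s_next, action, obs] += 1
        new_particles.append(Particle(s_next, phi, psi, w))

    # Normalise weights; resampling could be added to counter weight degeneracy.
    total = sum(p.weight for p in new_particles)
    for p in new_particles:
        p.weight /= total
    return new_particles

Sampling a single successor state per particle makes this a Monte Carlo approximation of the exact BAPOMDP belief update, which would otherwise sum over all successor states and count vectors; the paper's finite-approximation results concern keeping such updates tractable while preserving learning performance.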

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62F15 Bayesian inference
62B10 Statistical aspects of information-theoretic topics
90C40 Markov and semi-Markov decision processes