×

Long-term values in Markov decision processes, (co)algebraically. (English) Zbl 1519.68147

Cîrstea, Corina (ed.), Coalgebraic methods in computer science. 14th IFIP WG 1.3 international workshop, CMCS 2018, colocated with ETAPS 2018, Thessaloniki, Greece, April 14–15, 2018. Revised selected papers. Cham: Springer. Lect. Notes Comput. Sci. 11202, 78-99 (2018).
Summary: This paper studies Markov decision processes (MDPs) from the categorical perspective of coalgebra and algebra. Probabilistic systems, similar to MDPs but without rewards, have been extensively studied, also coalgebraically, from the perspective of program semantics. In this paper, we focus on the role of MDPs as models in optimal planning, where the reward structure is central. The main contributions of this paper are (i) to give a coinductive explanation of policy improvement using a new proof principle, based on Banach’s fixpoint theorem, that we call contraction coinduction, and (ii) to show that the long-term value function of a policy with respect to discounted sums can be obtained via a generalized notion of corecursive algebra, which is designed to take boundedness into account. We also explore boundedness features of the Kantorovich lifting of the distribution monad to metric spaces.
For the entire collection see [Zbl 1396.68009].

MSC:

68Q65 Abstract data types; algebraic specification
18C50 Categorical semantics of formal languages
90C40 Markov and semi-Markov decision processes
PDFBibTeX XMLCite
Full Text: DOI HAL

References:

[1] Abramsky, S.; Winschel, V., Coalgebraic analysis of subgame-perfect equilibria in infinite games without discounting, Math. Struct. Comput. Sci., 27, 5, 751-761 (2017) · Zbl 1364.91026 · doi:10.1017/S0960129515000365
[2] Baldan, P., Bonchi, F., Kerstan, H., König, B.: Behavioral metrics via functor lifting. In: 34th International Conference on Foundation of Software Technology and Theoretical Computer Science, FSTTCS 2014, pp. 403-415 (2014) · Zbl 1360.68607
[3] Bartels, F.: On Generalised Coinduction and Probabilistic Specification Formats. Ph.D. thesis, Vrije Universiteit Amsterdam (2004)
[4] Bellman, R., Dynamic Programming (1957), Princeton: Princeton University Press, Princeton · Zbl 0077.13605
[5] Capretta, V.; Uustalu, T.; Vene, V., Recursive coalgebras from comonads, Inf. Comp., 204, 437-468 (2006) · Zbl 1110.68068 · doi:10.1016/j.ic.2005.08.005
[6] Capretta, V.; Uustalu, T.; Vene, V.; Oliveira, MVM; Woodcock, J., Corecursive algebras: a study of general structured corecursion, Formal Methods: Foundations and Applications, 84-100 (2009), Heidelberg: Springer, Heidelberg · Zbl 1266.68083 · doi:10.1007/978-3-642-10452-7_7
[7] Denardo, EV, Contraction mappings in the theory underlying dynamic programming, SIAM Rev., 9, 2, 165-177 (1967) · Zbl 0154.45101 · doi:10.1137/1009030
[8] Desharnais, J.; Edalat, A.; Panangaden, P., Bisimulation for labelled markov processes, Inf. Comput., 179, 2, 163-193 (2002) · Zbl 1096.68103 · doi:10.1006/inco.2001.2962
[9] Ferns, N., Panangaden, P., Precup, D.: Metrics for finite markov decision processes. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, UAI 2004, pp. 162-169. AUAI Press, Arlington (2004). http://dl.acm.org/citation.cfm?id=1036843.1036863 · Zbl 1253.39018
[10] Gibbs, AL; Su, FE, On choosing and bounding probability metrics, Int. Stat. Rev./Revue Internationale de Statistique, 70, 3, 419-435 (2002) · Zbl 1217.62014
[11] Giry, M.; Banaschewski, B., A categorical approach to probability theory, Categorical Aspects of Topology and Analysis, 68-85 (1982), Heidelberg: Springer, Heidelberg · Zbl 0486.60034 · doi:10.1007/BFb0092872
[12] Howard, RA, Dynamic Programming and Markov Processes (1960), Cambridge, MA: The M.I.T. Press, Cambridge, MA · Zbl 0091.16001
[13] Jacobs, B., Distributive laws for the coinductive solution of recursive equations, Inf. Comput., 204, 4, 561-587 (2006) · Zbl 1110.68069 · doi:10.1016/j.ic.2005.03.006
[14] Jacobs, B.; Silva, A.; Sokolova, A., Trace semantics via determinization, J. Comput. Syst. Sci., 81, 5, 859-879 (2015) · Zbl 1327.68158 · doi:10.1016/j.jcss.2014.12.005
[15] Johnstone, PT, Adjoint lifting theorems for categories of algebras, Bull. Lond. Math. Soc., 7, 294-297 (1975) · Zbl 0315.18004 · doi:10.1112/blms/7.3.294
[16] Klin, B., Bialgebras for structural operational semantics: an introduction, Theor. Comput. Sci., 412, 38, 5043-5069 (2011) · Zbl 1246.68150 · doi:10.1016/j.tcs.2011.03.023
[17] Kozen, D., Coinductive proof principles for stochastic processes, Log. Methods Comput. Sci., 5, 1-19 (2009) · Zbl 1187.03028 · doi:10.2168/LMCS-5(3:10)2009
[18] Mac Lane, S., Categories for the Working Mathematician (1978), New York: Springer, New York · Zbl 0232.18001 · doi:10.1007/978-1-4757-4721-8
[19] Milius, S., Completely iterative algebras and completely iterative monads, Inf. Comput., 196, 1, 1-41 (2005) · Zbl 1062.68075 · doi:10.1016/j.ic.2004.05.003
[20] Moore, A.W.: Markov Systems, Markov Decision Processes, and Dynamic Programming (2002). lecture slides available at https://www.autonlab.org/tutorials
[21] Ó’Searcóid, M., Metric Spaces (2006), London: Springer, London · Zbl 1109.54001
[22] Pavlovic, D.; Kurz, A.; Lenisa, M.; Tarlecki, A., A semantical approach to equilibria and rationality, Algebra and Coalgebra in Computer Science, 317-334 (2009), Heidelberg: Springer, Heidelberg · Zbl 1239.68027 · doi:10.1007/978-3-642-03741-2_22
[23] Puterman, ML, Markov Decision Processes: Discrete Stochastic Dynamic Programming (2014), Hoboken: Wiley, Hoboken · Zbl 0829.90134
[24] Ruozzi, N., Kozen, D.: Applications of metric coinduction. Logical Methods in Computer Science, 5 (2009) · Zbl 1187.03028
[25] Rutten, J., Universal coalgebra: a theory of systems, Theor. Comput. Sci., 249, 1, 3-80 (2000) · Zbl 0951.68038 · doi:10.1016/S0304-3975(00)00056-6
[26] Silva, A.; Bonchi, F.; Bonsangue, M.; Rutten, J., Generalizing determinization from automata to coalgebras, Log. Methods Comput. Sci., 9, 1-27 (2013) · Zbl 1262.18002 · doi:10.2168/LMCS-9(1:9)2013
[27] Silva, A.; Sokolova, A., Sound and complete axiomatization of trace semantics for probabilistic systems, Electron. Notes Theor. Comput. Sci., 276, 291-311 (2011) · Zbl 1342.68243 · doi:10.1016/j.entcs.2011.09.027
[28] Sokolova, A., Probabilistic systems coalgebraically, Theor. Comput. Sci., 412, 38, 5095-5110 (2011) · Zbl 1234.68308 · doi:10.1016/j.tcs.2011.05.008
[29] Villani, C., Optimal Transport, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] (2009), Heidelberg: Springer, Heidelberg · Zbl 1156.53003
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.