zbMATH — the first resource for mathematics

Of cores: a partial-exploration framework for Markov decision processes. (English) Zbl 07269250
Summary: We introduce a framework for approximate analysis of Markov decision processes (MDP) with bounded-, unbounded-, and infinite-horizon properties. The main idea is to identify a core of an MDP, i.e., a subsystem where we provably remain with high probability, and to avoid computation on the less relevant rest of the state space. Although we identify the core using simulations and statistical techniques, it allows for rigorous error bounds in the analysis. Consequently, we obtain efficient analysis algorithms based on partial exploration for various settings, including the challenging case of strongly connected systems.
03B70 Logic in computer science
68 Computer science
Full Text: Link arXiv
[1] Pranav Ashok, Yuliya Butkova, Holger Hermanns, and Jan Křetínsk‘y. Continuous-time Markov decisions based on partial exploration. InATVA, pages 317-334. Springer, 2018.
[2] Richard Bellman. A Markovian decision process.Journal of Mathematics and Mechanics, pages 679-684, 1957. · Zbl 0078.34101
[3] Dimitri P. Bertsekas.Dynamic Programming and Optimal Control, Vol. II: Approximate Dynamic Programming. Athena Scientific, 2012.
[4] Christel Baier and Joost-Pieter Katoen.Principles of Model Checking. MIT Press, 2008. · Zbl 1179.68076
[5] Krishnendu Chatterjee and Monika Henzinger. Faster and dynamic algorithms for maximal end-component decomposition and related graph problems in probabilistic verification. InSODA, pages 1318-1336, 2011. · Zbl 1374.68272
[6] Krishnendu Chatterjee and Monika Henzinger. AnO(n2) time algorithm for alternating büchi games. InSODA, pages 1386-1399. SIAM, 2012. · Zbl 1421.68110
[7] Krishnendu Chatterjee and Monika Henzinger. Efficient and dynamic algorithms for alternating büchi games and maximal end-component decomposition.J. ACM, 61(3):15:1-15:40, 2014. · Zbl 1295.91019
[8] Krishnendu Chatterjee, Petr Novotný, and Dorde Zikelic. Stochastic invariants for probabilistic termination. In Giuseppe Castagna and Andrew D. Gordon, editors,Proceedings of the 44th ACM · Zbl 1380.68114
[9] Costas Courcoubetis and Mihalis Yannakakis. The complexity of probabilistic verification.J. ACM, 42(4):857-907, 1995. · Zbl 0885.68109
[10] Luca De Alfaro.Formal verification of probabilistic systems. Number 1601. Citeseer, 1997.
[11] Pedro R. D’Argenio, Bertrand Jeannet, Henrik Ejersbo Jensen, and Kim Guldstrand Larsen. Reduction and refinement strategies for probabilistic analysis. InPAPM-PROBMIV, pages 57-76. Springer, 2002. · Zbl 1065.68582
[12] Christian Dehnert, Sebastian Junges, Joost-Pieter Katoen, and Matthias Volk. A storm is coming: A modern probabilistic model checker. InCAV, pages 592-600. Springer, 2017.
[13] Ernst Moritz Hahn, Holger Hermanns, Björn Wachter, and Lijun Zhang. PASS: abstraction refinement for infinite probabilistic models. InTACAS, pages 353-357. Springer, 2010.
[14] Arnd Hartmanns and Benjamin Lucien Kaminski. Optimistic value iteration.CoRR, abs/1910.01100, 2019.
[15] Serge Haddad and Benjamin Monmege. Reachability in MDPs: Refining convergence of value iteration. InInternational Workshop on Reachability Problems, pages 125-137. Springer, 2014. · Zbl 1393.68102
[16] Edon Kelmendi, Julia Krämer, Jan Kretínský, and Maximilian Weininger. Value iteration for simple stochastic games: Stopping criterion and learning algorithm. In Hana Chockler and Georg Weissenbacher, editors,Computer Aided Verification - 30th International Conference,
[17] Jan Kretínský and Tobias Meggendorfer. Of cores: A partial-exploration framework for markov decision processes. In Wan Fokkink and Rob van Glabbeek, editors,30th International Conference · Zbl 1437.68118
[18] M. Kwiatkowska, G. Norman, and D. Parker. PRISM: Probabilistic symbolic model checker. In TOOLS, pages 200-204, 2002. · Zbl 1047.68533
[19] Marta Z. Kwiatkowska, Gethin Norman, and David Parker. The PRISM benchmark suite. In QEST, pages 203-204. IEEE Computer Society, 2012. · Zbl 1047.68533
[20] Marta Kwiatkowska, Gethin Norman, David Parker, and Jeremy Sproston. Performance analysis of probabilistic timed automata using digital clocks.FMSD, 29(1):33-78, 2006. · Zbl 1105.68063
[21] Marta Kwiatkowska, Gethin Norman, and Jeremy Sproston. Probabilistic model checking of the IEEE 802.11 wireless local area network protocol. InProcess Algebra and Probabilistic Methods: · Zbl 1065.68583
[22] H. Brendan McMahan, Maxim Likhachev, and Geoffrey J. Gordon. Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees. InICML, pages 569-576, 2005.
[23] M.L. Puterman.Markov decision processes: Discrete stochastic dynamic programming. John Wiley and Sons, 1994. · Zbl 0829.90134
[24] Tim Quatmann and Joost-Pieter Katoen. Sound value iteration. InCAV (1), volume 10981 of LNCS, pages 643-661. Springer, 2018.
[25] Aaron Sidford, Mengdi Wang, Xian Wu, and Yinyu Ye. Variance reduced value iteration and faster algorithms for solving Markov decision processes. InSODA, pages 770-787. SIAM, 2018. · Zbl 1403.68386
[26] O. Tange. Gnu parallel - the command-line power tool.;login: The USENIX Magazine, 36(1):42- 47, Feb 2011.
[27] Robert Tarjan. Depth-first search and linear graph algorithms.SICOMP, 1(2):146-160, 1972. · Zbl 0251.05107
[28] Douglas J White. Further real applications of markov decision processes.Interfaces, 18(5):55-61, 1988.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.