×

zbMATH — the first resource for mathematics

Multi-cost bounded tradeoff analysis in MDP. (English) Zbl 07268911
Summary: We provide a memory-efficient algorithm for multi-objective model checking problems on Markov decision processes (MDPs) with multiple cost structures. The key problem at hand is to check whether there exists a scheduler for a given MDP such that all objectives over cost vectors are fulfilled. We cover multi-objective reachability and expected cost objectives, and combinations thereof. We further transfer approaches for computing quantiles over single cost bounds to the multi-cost case and highlight the ensuing challenges. An empirical evaluation shows the scalability of our new approach both in terms of memory consumption and runtime. We discuss the need for more detailed visual presentations of results beyond Pareto curves and present a first visualisation approach that exploits all the available information from the algorithm to support decision makers.
MSC:
68Q60 Specification and verification (program logics, model checking, etc.)
68Q87 Probability in computer science (algorithm analysis, random structures, phase transitions, etc.)
90C29 Multi-objective and goal programming
90C40 Markov and semi-Markov decision processes
Software:
PRISM; Storm
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] Andova, S., Hermanns, H., Katoen, J.P.: Discrete-time rewards model-checked. In: FORMATS, LNCS, vol. 2791, pp. 88-104. Springer (2003) · Zbl 1099.68652
[2] Baier, C., Daum, M., Dubslaff, C., Klein, J., Klüppelholz, S.: Energy-utility quantiles. In: NFM, LNCS, vol. 8430, pp. 285-299. Springer (2014)
[3] Baier, C.; Dubslaff, C., From verification to synthesis under cost-utility constraints, SIGLOG News, 5, 4, 26-46 (2018)
[4] Baier, C., Klein, J., Klüppelholz, S., Wunderlich, S.: Maximizing the conditional expected reward for reaching the goal. In: TACAS (2), LNCS, vol. 10206, pp. 269-285 (2017)
[5] Baier, C., Klein, J., Leuschner, L., Parker, D., Wunderlich, S.: Ensuring the reliability of your model checker: Interval iteration for Markov decision processes. In: CAV (1), LNCS, vol. 10426, pp. 160-180. Springer (2017)
[6] Barrett, L., Narayanan, S.: Learning all optimal policies with multiple criteria. In: ICML, AICPS, vol. 307, pp. 41-47. ACM (2008)
[7] Berthon, R., Randour, M., Raskin, J.F.: Threshold constraints with guarantees for parity objectives in Markov decision processes. In: ICALP, LIPIcs, vol. 80, pp. 121:1-121:15. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2017) · Zbl 1442.90199
[8] Brázdil, T., Brozek, V., Chatterjee, K., Forejt, V., Kucera, A.: Two views on multiple mean-payoff objectives in Markov decision processes. LMCS 10(1) (2014) · Zbl 1326.90101
[9] Brázdil, T., Chatterjee, K., Chmelik, M., Forejt, V., Kretínský, J., Kwiatkowska, M.Z., Parker, D., Ujma, M.: Verification of Markov decision processes using learning algorithms. In: ATVA, LNCS, vol. 8837, pp. 98-114. Springer (2014) · Zbl 1448.68290
[10] Brázdil, T.; Chatterjee, K.; Forejt, V.; Kucera, A., Trading performance for stability in Markov decision processes, J. Comput. Syst. Sci., 84, 144-170 (2017) · Zbl 1359.90147
[11] Bresina, J.L., Jónsson, A.K., Morris, P.H., Rajan, K.: Activity planning for the Mars exploration rovers. In: ICAPS, pp. 40-49. AAAI (2005)
[12] Bryce, D., Cushing, W., Kambhampati, S.: Probabilistic planning is multi-objective. Technical Report, Arizona State Univ, CSE (2007) · Zbl 1216.68239
[13] Cao, Z., Guo, H., Zhang, J., Oliehoek, F.A., Fastenrath, U.: Maximizing the probability of arriving on time: a practical q-learning method. In: AAAI, pp. 4481-4487. AAAI Press (2017)
[14] Chatterjee, K.; Chmelik, M.; Gupta, R.; Kanodia, A., Optimal cost almost-sure reachability in POMDPs, Artif. Intell., 234, 26-48 (2016) · Zbl 1351.68307
[15] Chatterjee, K., Majumdar, R., Henzinger, T.A.: Markov decision processes with multiple objectives. In: STACS, LNCS, vol. 3884, pp. 325-336. Springer (2006) · Zbl 1136.90498
[16] Chen, T., Forejt, V., Kwiatkowska, M., Simaitis, A., Wiltsche, C.: On stochastic games with multiple objectives. In: MFCS, LNCS, vol. 8087, pp. 266-277. Springer (2013) · Zbl 1400.91040
[17] Cheng, L.; Subrahmanian, E.; Westerberg, AW, Multiobjective decision processes under uncertainty: applications, problem formulations, and solution strategies, Ind. Eng. Chem. Res., 44, 8, 2405-2415 (2005)
[18] Christman, A., Cassamano, J.: Maximizing the probability of arriving on time. In: ASMTA, LNCS, vol. 7984, pp. 142-157. Springer (2013) · Zbl 1395.90056
[19] Dai, P., Mausam, Weld, D.S., Goldsmith, J.: Topological value iteration algorithms. J. JAIR 42, 181-209 (2011) · Zbl 1279.90183
[20] Dehnert, C., Junges, S., Katoen, J.P., Quatmann, T., Volk, M.: Storm website (2018). http://stormchecker.org
[21] Dehnert, C., Junges, S., Katoen, J.P., Volk, M.: A Storm is coming: a modern probabilistic model checker. In: CAV (2), LNCS, vol. 10427, pp. 592-600. Springer (2017)
[22] Eastwood, R., Alexander, R., Kelly, T.: Safe multi-objective planning with a posteriori preferences. In: HASE, pp. 78-85. IEEE Computer Society (2016)
[23] Etessami, K., Kwiatkowska, M., Vardi, M.Y., Yannakakis, M.: Multi-objective model checking of Markov decision processes. LMCS 4(4) (2008) · Zbl 1161.68565
[24] Feng, L., Wiltsche, C., Humphrey, L., Topcu, U.: Controller synthesis for autonomous systems interacting with human operators. In: ICCPS, pp. 70-79. ACM (2015)
[25] Forejt, V., Kwiatkowska, M., Parker, D.: Pareto curves for probabilistic model checking. In: ATVA, LNCS, vol. 7561, pp. 317-332. Springer (2012) · Zbl 1374.68285
[26] Forejt, V., Kwiatkowska, M.Z., Norman, G., Parker, D.: Automated verification techniques for probabilistic systems. In: SFM, LNCS, vol. 6659, pp. 53-113. Springer (2011) · Zbl 1315.68177
[27] Haddad, S., Monmege, B.: Reachability in MDPs: refining convergence of value iteration. In: RP, LNCS, vol. 8762, pp. 125-137. Springer (2014) · Zbl 1393.68102
[28] Hahn, EM; Hartmanns, A., A comparison of time- and reward-bounded probabilistic model checking techniques, SETTA, LNCS, 9984, 85-100 (2016) · Zbl 1393.68104
[29] Hahn, EM; Hartmanns, A.; Hermanns, H.; Katoen, JP, A compositional modelling and analysis framework for stochastic hybrid systems, Formal Methods in Syst. Des., 43, 2, 191-232 (2013) · Zbl 1291.68293
[30] Hahn, E.M., Hashemi, V., Hermanns, H., Lahijanian, M., Turrini, A.: Multi-objective robust strategy synthesis for interval Markov decision processes. In: QEST, LNCS, vol. 10503, pp. 207-223. Springer (2017) · Zbl 1421.90160
[31] Hartmanns, A., Hermanns, H.: The Modest Toolset: An integrated environment for quantitative modelling and verification. In: TACAS, LNCS, vol. 8413, pp. 593-598. Springer (2014)
[32] Hartmanns, A., Junges, S., Katoen, J.P., Quatmann, T.: Multi-cost bounded reachability in MDP. In: TACAS, LNCS, vol. 10806, pp. 320-339. Springer (2018). 10.1007/978-3-319-89963-3_19 · Zbl 1423.68282
[33] Hartmanns, A., Junges, S., Katoen, J.P., Quatmann, T.: Multi-cost bounded tradeoff analysis in MDP—Artifact. Zenodo (2020). 10.5281/zenodo.3894716 · Zbl 1423.68282
[34] Hartmanns, A., Kaminski, B.L.: Optimistic value iteration. In: CAV, Lecture Notes in Computer Science, vol 12225, pp. 488-511. Springer (2020). doi:10.1007/978-3-030-53291-8_26
[35] Hou, P., Yeoh, W., Varakantham, P.: Revisiting risk-sensitive MDPs: New algorithms and results. In: ICAPS. AAAI (2014)
[36] Junges, S., Jansen, N., Dehnert, C., Topcu, U., Katoen, J.: Safety-constrained reinforcement learning for MDPs. In: TACAS, LNCS, vol. 9636, pp. 130-146. Springer (2016)
[37] Klein, J., Baier, C., Chrszon, P., Daum, M., Dubslaff, C., Klüppelholz, S., Märcker, S., Müller, D.: Advances in probabilistic model checking with PRISM: variable reordering, quantiles and weak deterministic Büchi automata. STTT pp. 1-16 (2017)
[38] Kolobov, A., Mausam, Weld, D.S.: A theory of goal-oriented MDPs with dead ends. In: UAI, pp. 438-447. AUAI Press (2012)
[39] Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: Verification of probabilistic real-time systems. In: CAV, LNCS, vol. 6806, pp. 585-591. Springer (2011)
[40] Kwiatkowska, M., Norman, G., Parker, D.: The PRISM benchmark suite. In: QEST, pp. 203-204. IEEE CS Press (2012)
[41] Kwiatkowska, M.Z., Norman, G., Sproston, J.: Probabilistic model checking of the IEEE 802.11 wireless local area network protocol. In: PAPM-PROBMIV, LNCS, vol. 2399, pp. 169-187. Springer (2002) · Zbl 1065.68583
[42] Lacerda, B., Parker, D., Hawes, N.: Multi-objective policy generation for mobile robots under probabilistic time-bounded guarantees. In: ICAPS, pp. 504-512. AAAI Press (2017)
[43] Lankaites Pinheiro, R.; Landa-Silva, D.; Atkin, J., A technique based on trade-off maps to visualise and analyse relationships between objectives in optimisation problems, J. Multi-Criteria Decis. Anal., 24, 1-2, 37-56 (2017)
[44] Laroussinie, F., Sproston, J.: Model checking durational probabilistic systems. In: FoSSaCS, LNCS, vol. 3441, pp. 140-154. Springer (2005) · Zbl 1118.68548
[45] Norman, G.; Parker, D.; Kwiatkowska, MZ; Shukla, SK, Evaluating the reliability of NAND multiplexing with PRISM, IEEE Trans. CAD of Integ. Circuits Syst., 24, 10, 1629-1637 (2005)
[46] Puterman, ML, Markov Decision Processes (1994), HobokenD: Wiley, HobokenD
[47] Quatmann, T., Junges, S., Katoen, J.P.: Markov automata with multiple objectives. In: CAV (1), LNCS, vol. 10426, pp. 140-159. Springer (2017)
[48] Quatmann, T., Katoen, J.P.: Sound value iteration. In: CAV, LNCS, vol. 10981, pp. 643-661. Springer (2018)
[49] Randour, M.; Raskin, JF; Sankur, O., Percentile queries in multi-dimensional Markov decision processes, FMSD, 50, 2-3, 207-248 (2017) · Zbl 1360.68518
[50] Reiter, MK; Rubin, AD, Crowds: anonymity for web transactions, ACM Trans. Inf. Syst. Secur., 1, 1, 66-92 (1998)
[51] Roijers, DM; Vamplew, P.; Whiteson, S.; Dazeley, R., A survey of multi-objective sequential decision-making, J. Artif. Intell. Res., 48, 67-113 (2013) · Zbl 1364.68323
[52] Sardar, M.U., Dubslaff, C., Klüppelholz, S., Baier, C., Kumar, A.: Performance evaluation of thermal-constrained scheduling strategies in multi-core systems. In: EPEW, LNCS, vol. 12039, pp. 133-147. Springer (2019). 10.1007/978-3-030-44411-2_9
[53] Steinmetz, M.; Hoffmann, J.; Buffet, O., Goal probability analysis in probabilistic planning: exploring and enhancing the state of the art, J. Artif. Intell. Res., 57, 229-271 (2016) · Zbl 1401.68294
[54] Stoelinga, M., Vaandrager, F.W.: Root contention in IEEE 1394. In: ARTS Formal Methods for Real-Time and Probabilistic Systems, LNCS, vol. 1601, pp. 53-74. Springer (1999)
[55] Teichteil-Königsbuch, F.: Stochastic safest and shortest path problems. In: AAAI. AAAI Press (2012) · Zbl 1327.68289
[56] The International Probabilistic Planning Competition. http://www.icaps-conference.org/index.php/Main/Competitions
[57] Ummels, M., Baier, C.: Computing quantiles in Markov reward models. In: FOSSACS, LNCS, vol. 7794, pp. 353-368. Springer (2013) · Zbl 1260.68285
[58] Vamplew, P.; Dazeley, R.; Berry, A.; Issabekov, R.; Dekker, E., Empirical evaluation methods for multiobjective reinforcement learning algorithms, Mach. Learn., 84, 1-2, 51-80 (2011)
[59] Yu, SX; Lin, Y.; Yan, P., Optimization models for the first arrival target distribution function in discrete time, J. Math. Anal. Appl., 225, 1, 193-223 (1998) · Zbl 0924.90133
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.