## Elaboration tolerant representation of Markov decision process via decision-theoretic extension of probabilistic action language $$p\mathcal{BC}+$$. (English) Zbl 07390874

Summary: We extend the probabilistic action language $$p\mathcal{BC}+$$ with the notion of utility from decision theory. The semantics of the extended $$p\mathcal{BC}+$$ can be defined as a shorthand notation for a decision-theoretic extension of the probabilistic answer set programming language $$\text{LP}^{\text{MLN}}$$. Alternatively, the semantics of $$p\mathcal{BC}+$$ can be defined in terms of a Markov decision process (MDP), which in turn allows for representing an MDP in a succinct and elaboration tolerant way, as well as leveraging an MDP solver to compute with a $$p\mathcal{BC}+$$ action description. This idea led to the design of the system pbcplus2mdp, which can find an optimal policy of a $$p\mathcal{BC}+$$ action description using an MDP solver.
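To make "finding an optimal policy using an MDP solver" concrete, the following is a minimal sketch of value iteration on a toy MDP. It is not the pbcplus2mdp implementation and does not use $$p\mathcal{BC}+$$ syntax; the two states, two actions, transition probabilities, rewards, and the discount factor are all hypothetical values chosen for illustration.

```python
# Hypothetical toy MDP (not taken from the paper): two states, two actions.
# TRANSITIONS[state][action] is a list of (probability, next_state, reward).
TRANSITIONS = {
    "s0": {
        "stay": [(1.0, "s0", 0.0)],
        "go": [(0.8, "s1", 1.0), (0.2, "s0", 0.0)],
    },
    "s1": {
        "stay": [(1.0, "s1", 2.0)],
        "go": [(1.0, "s0", 0.0)],
    },
}


def q_value(values, state, action, gamma):
    """Expected discounted return of taking `action` in `state`."""
    return sum(
        prob * (reward + gamma * values[next_state])
        for prob, next_state, reward in TRANSITIONS[state][action]
    )


def value_iteration(gamma=0.9, tol=1e-9, max_iters=10_000):
    """Iterate the Bellman optimality update until the values stop changing."""
    values = {state: 0.0 for state in TRANSITIONS}
    for _ in range(max_iters):
        new_values = {
            state: max(q_value(values, state, a, gamma) for a in TRANSITIONS[state])
            for state in TRANSITIONS
        }
        delta = max(abs(new_values[s] - values[s]) for s in TRANSITIONS)
        values = new_values
        if delta < tol:
            break
    # The greedy policy picks, in each state, the action with the highest Q-value.
    policy = {
        state: max(TRANSITIONS[state], key=lambda a: q_value(values, state, a, gamma))
        for state in TRANSITIONS
    }
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    print(policy)
    print(values)
```

In this sketch the transition table is written out by hand; the summary's point is that such a table can instead be derived from a succinct action description, after which a standard solver step like the one above applies.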

### MSC:

 68N17 Logic programming

### Software:

PEORL; REBA; Smodels; FODD-Planner; CCalc
