Relational reinforcement learning for planning with exogenous effects. (English) Zbl 1434.68432

Summary: Probabilistic planners have recently improved to the point that they can solve difficult tasks with complex and expressive models. In contrast, learners cannot yet tackle the expressive models that planners can handle, which forces complex models to be mostly handcrafted. We propose a new learning approach that can learn relational probabilistic models with both action effects and exogenous effects. The proposed approach combines a multi-valued variant of inductive logic programming, used to generate candidate models, with an optimization method that selects the best set of planning operators to model a problem. We also show how to combine this learner with reinforcement learning algorithms to solve complete problems. Finally, we provide experimental validation showing improvements over previous work in both simulation and a robotic task. The robotic task involves a dynamic scenario with several agents in which a manipulator robot has to clear the tableware from a table. We show that the exogenous effects learned by our approach allowed the robot to clear the table more efficiently.
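The summary above only outlines the approach. As a purely illustrative aid, not taken from the paper, the following minimal Python sketch shows one way relational operators with action effects and exogenous effects could be represented, and how a set of candidate operators produced by a rule learner could be scored and greedily selected. All identifiers (Operator, model_score, greedy_select) and the penalized-likelihood score are hypothetical stand-ins for the paper's ILP-based candidate generation and its optimization step.

    # Illustrative sketch only (assumed names, not the paper's implementation).
    from dataclasses import dataclass
    from math import log
    from typing import Dict, FrozenSet, List, Optional, Tuple

    @dataclass
    class Operator:
        """A relational rule: preconditions -> distribution over effect outcomes.

        If `action` is None, the operator models an exogenous effect that may
        fire whenever its preconditions hold, independently of the robot.
        """
        name: str
        action: Optional[str]
        preconditions: Tuple[str, ...]
        effects: Dict[FrozenSet[str], float]

    def model_score(ops: List[Operator],
                    transitions: List[Tuple[FrozenSet[str], FrozenSet[str]]],
                    penalty: float = 0.5) -> float:
        """Penalized log-likelihood of a set of operators on observed transitions.

        Each transition (state literals, observed change) is explained by the
        covering operator that gives the observed change the highest probability;
        unexplained changes fall back to a small default probability.
        """
        total = 0.0
        for state, change in transitions:
            p = 1e-6
            for op in ops:
                if all(lit in state for lit in op.preconditions):
                    p = max(p, op.effects.get(change, 1e-6))
            total += log(p)
        # Penalize model complexity (number and size of operators).
        return total - penalty * sum(1 + len(op.preconditions) for op in ops)

    def greedy_select(candidates: List[Operator],
                      transitions: List[Tuple[FrozenSet[str], FrozenSet[str]]],
                      penalty: float = 0.5) -> List[Operator]:
        """Greedily add the candidate operator that most improves the score."""
        selected: List[Operator] = []
        current = model_score(selected, transitions, penalty)
        while True:
            best_gain, best_op = 0.0, None
            for op in candidates:
                if op in selected:
                    continue
                gain = model_score(selected + [op], transitions, penalty) - current
                if gain > best_gain:
                    best_gain, best_op = gain, op
            if best_op is None:
                return selected
            selected.append(best_op)
            current += best_gain

    # Hypothetical example inspired by the tableware scenario: an exogenous rule
    # stating that a person at the table may place a new cup on it.
    new_cup = Operator(
        name="cup_appears",
        action=None,  # exogenous: no robot action triggers it
        preconditions=("person_at_table",),
        effects={frozenset({"on(cup, table)"}): 0.3, frozenset(): 0.7},
    )
    observations = [
        (frozenset({"person_at_table"}), frozenset({"on(cup, table)"})),
        (frozenset({"person_at_table"}), frozenset()),
    ]
    print([op.name for op in greedy_select([new_cup], observations)])

In this toy run the exogenous rule is selected because it explains the observed changes far better than the empty model, which is the kind of trade-off the paper's operator-selection step resolves at scale.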

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68N17 Logic programming
68T20 Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.)
68T40 Artificial intelligence for robotics

Software:

TEXPLORE; TRAMP
