Deliberative acting, planning and learning with hierarchical operational models. (English) Zbl 07418681

Summary: In AI research, synthesizing a plan of action has typically used descriptive models of the actions that abstractly specify what might happen as a result of an action, and are tailored for efficiently computing state transitions. However, executing the planned actions has needed operational models, in which rich computational control structures and closed-loop online decision-making are used to specify how to perform an action in a nondeterministic execution context, react to events and adapt to an unfolding situation. Deliberative actors, which integrate acting and planning, have typically needed to use both of these models together, which causes problems when attempting to develop the different models, verify their consistency, and smoothly interleave acting and planning. As an alternative, we define and implement an integrated acting and planning system in which both planning and acting use the same operational models. These rely on hierarchical task-oriented refinement methods offering rich control structures. The acting component, called Reactive Acting Engine (RAE), is inspired by the well-known PRS system. At each decision step, RAE can get advice from a planner for a near-optimal choice with respect to a utility function. The anytime planner uses a UCT-like Monte Carlo Tree Search procedure, called UPOM, whose rollouts are simulations of the actor’s operational models. We also present learning strategies for use with RAE and UPOM that acquire, from online acting experiences and/or simulated planning results, a mapping from decision contexts to method instances as well as a heuristic function to guide UPOM. We demonstrate the asymptotic convergence of UPOM towards optimal methods in static domains, and show experimentally that UPOM and the learning strategies significantly improve the acting efficiency and robustness.
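The UPOM planner described above searches over refinement-method choices with a UCT-like Monte Carlo Tree Search whose rollouts simulate the operational models. As a rough illustration only, not the authors' actual UPOM algorithm, a flat UCB1-style choice among candidate method instances at a single decision step might look like the following sketch (the names `uct_choose` and `simulate` are hypothetical):

```python
import math

def uct_choose(methods, simulate, n_rollouts=200, c=1.4):
    """Illustrative UCB1 selection of a method instance.

    methods   -- candidate method instances (hashable labels here)
    simulate  -- simulate(m) -> one sampled utility from a stochastic
                 rollout of m's operational model (hypothetical callback)
    Returns the method with the best empirical mean utility after the
    rollout budget is spent.
    """
    counts = {m: 0 for m in methods}
    totals = {m: 0.0 for m in methods}
    for t in range(1, n_rollouts + 1):
        # UCB1 rule: sample every method once, then trade off high mean
        # utility against under-sampled alternatives.
        def ucb(m):
            if counts[m] == 0:
                return float("inf")
            return totals[m] / counts[m] + c * math.sqrt(math.log(t) / counts[m])
        m = max(methods, key=ucb)
        totals[m] += simulate(m)
        counts[m] += 1
    # Final recommendation: highest empirical mean utility.
    return max(methods, key=lambda m: totals[m] / max(counts[m], 1))
```

The real UPOM additionally recurses through the hierarchical refinement tree and is anytime; this flat sketch only shows the bandit-style selection at one node.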


68Txx Artificial intelligence
Full Text: DOI arXiv


[1] Andre, D.; Russell, S. J., State abstraction for programmable reinforcement learning agents, (AAAI (2002))
[2] Argall, B. D.; Chernova, S.; Veloso, M. M.; Browning, B., A survey of robot learning from demonstration, Robot. Auton. Syst., 57, 469-483 (2009)
[3] Bäckström, C.; Nebel, B., Complexity results for SAS+ planning, Comput. Intell., 11, 625-655 (1995)
[4] Barry, J. L.; Kaelbling, L. P.; Lozano-Pérez, T., A hierarchical approach to manipulation with diverse actions, (ICRA (2013)), 1799-1806
[5] Beetz, M.; McDermott, D., Improving robot plans during their execution, (AIPS (1994))
[6] Boeing, A.; Bräunl, T., Evaluation of real-time physics simulation systems, (Proceedings of the 5th International Conference on Computer Graphics and Interactive Techniques in Australia and Southeast Asia (2007)), 281-288
[7] Bohren, J.; Rusu, R. B.; Jones, E. G.; Marder-Eppstein, E.; Pantofaru, C.; Wise, M.; Mösenlechner, L.; Meeussen, W.; Holzer, S., Towards autonomous robotic butlers: lessons learned with the PR2, (ICRA (2011)), 5568-5575
[8] Bridson, R., Fluid Simulation for Computer Graphics (2015), CRC Press
[9] Cimatti, A.; Pistore, M.; Roveri, M.; Traverso, P., Weak, strong, and strong cyclic planning via symbolic model checking, Artif. Intell., 147, 35-84 (2003) · Zbl 1082.68800
[10] Claßen, J.; Röger, G.; Lakemeyer, G.; Nebel, B., Platas—integrating planning and the action language Golog, Künstl. Intell., 26, 61-67 (2012)
[11] Colledanchise, M., Behavior Trees in Robotics (2017), KTH: KTH Stockholm, Sweden, Ph.D. thesis
[12] Colledanchise, M.; Ögren, P., How behavior trees modularize hybrid control systems and generalize sequential behavior compositions, the subsumption architecture, and decision trees, IEEE Trans. Robot., 33, 372-389 (2017)
[13] Colledanchise, M.; Parasuraman, R.; Ögren, P., Learning of behavior trees for autonomous agents, IEEE Trans. Games, 11, 183-189 (2018)
[14] Conrad, P.; Shah, J.; Williams, B. C., Flexible execution of plans with choice, (ICAPS (2009))
[15] De Silva, L.; Meneguzzi, F.; Logan, B., BDI agent architectures: a survey, (IJCAI (2020))
[16] De Silva, L.; Meneguzzi, F. R.; Logan, B., An operational semantics for a fragment of PRS, (IJCAI (2018))
[17] Deisenroth, M. P.; Neumann, G.; Peters, J., A survey on policy search for robotics, Found. Trends Robot., 2, 1-142 (2013)
[18] Despouys, O.; Ingrand, F., Propice-plan: toward a unified framework for planning and execution, (ECP (1999))
[19] Doherty, P.; Kvarnström, J.; Heintz, F., A temporal logic-based planning and execution monitoring framework for unmanned aircraft systems, Auton. Agents Multi-Agent Syst., 19, 332-377 (2009)
[20] Duong, T. V.; Phung, D. Q.; Bui, H. H.; Venkatesh, S., Efficient duration and hierarchical modeling for human activity recognition, Artif. Intell., 173, 830-856 (2009)
[21] Effinger, R.; Williams, B.; Hofmann, A., Dynamic execution of temporally and spatially flexible reactive programs, (AAAI Wksp. on Bridging the Gap Between Task and Motion Planning (2010)), 1-8
[22] Erol, K.; Nau, D. S.; Subrahmanian, V. S., Complexity, decidability and undecidability results for domain-independent planning, Artif. Intell., 76, 75-88 (1995) · Zbl 1013.68548
[23] Estlin, T.; Gaines, D.; Chouinard, C.; Castano, R.; Bornstein, B.; Judd, M.; Nesnas, I.; Anderson, R., Increased Mars rover autonomy using AI planning, scheduling and execution, (ICRA (2007), IEEE), 4911-4918
[24] Faure, F.; Duriez, C.; Delingette, H.; Allard, J.; Gilles, B.; Marchesseau, S.; Talbot, H.; Courtecuisse, H.; Bousquet, G.; Peterlik, I., SOFA: a multi-model framework for interactive physical simulation, (Soft Tissue Biomechanical Modeling for Computer Assisted Surgery (2012), Springer), 283-321
[25] Feldman, Z.; Domshlak, C., Monte-Carlo planning: theoretically fast convergence meets practical efficiency, (UAI (2013))
[26] Feldman, Z.; Domshlak, C., Monte Carlo tree search: to MC or to DP?, (ECAI (2014)), 321-326
[27] Ferrein, A.; Lakemeyer, G., Logic-based robot control in highly dynamic domains, Robot. Auton. Syst., 56, 980-991 (2008)
[28] Fikes, R. E.; Nilsson, N. J., STRIPS: a new approach to the application of theorem proving to problem solving, Artif. Intell., 2, 189-208 (1971) · Zbl 0234.68036
[29] Firby, R. J., An investigation into reactive planning in complex domains, (AAAI (1987)), 202-206
[30] Fox, M.; Long, D., PDDL2.1: an extension to PDDL for expressing temporal planning domains, J. Artif. Intell. Res., 20, 61-124 (2003) · Zbl 1036.68093
[31] Fox, M.; Long, D., Modelling mixed discrete-continuous domains for planning, J. Artif. Intell. Res., 27, 235-297 (2006) · Zbl 1182.68238
[32] Garnelo, M.; Arulkumaran, K.; Shanahan, M., Towards deep symbolic reinforcement learning (2016)
[33] Garrett, C. R.; Lozano-Perez, T.; Kaelbling, L. P., FFRob: leveraging symbolic planning for efficient task and motion planning, Int. J. Robot. Res., 37, 104-136 (2018)
[34] Garrett, C. R.; Lozano-Pérez, T.; Kaelbling, L. P., PDDLStream: integrating symbolic planners and blackbox samplers via optimistic adaptive planning, (ICAPS (2020)), 440-448
[35] Geffner, H.; Bonet, B., A Concise Introduction to Models and Methods for Automated Planning (2013), Morgan & Claypool · Zbl 1270.68012
[36] Ghallab, M.; Nau, D.; Traverso, P., The actor’s view of automated planning and acting: a position paper, Artif. Intell., 208, 1-17 (2014)
[37] Ghallab, M.; Nau, D. S.; Traverso, P., Automated Planning and Acting (2016), Cambridge University Press
[38] Goldman, R. P., A semantics for HTN methods, (ICAPS (2009))
[39] Goldman, R. P.; Bryce, D.; Pelican, M. J.; Musliner, D. J.; Bae, K., A hybrid architecture for correct-by-construction hybrid planning and control, (NASA Formal Methods Symposium (2016), Springer), 388-394
[40] Gulwani, S.; Polozov, O.; Singh, R., Program synthesis, Found. Trends Progr. Lang., 4, 1-119 (2017)
[41] Hähnel, D.; Burgard, W.; Lakemeyer, G., GOLEX - bridging the gap between logic (GOLOG) and a real robot, (KI (1998), Springer), 165-176
[42] Haslum, P.; Lipovetzky, N.; Magazzeni, D.; Muise, C., An Introduction to the Planning Domain Definition Language (2019), Morgan & Claypool · Zbl 1434.68004
[43] Hauskrecht, M.; Meuleau, N.; Kaelbling, L. P.; Dean, T. L.; Boutilier, C., Hierarchical solution of Markov decision processes using macro-actions (2013)
[44] Henaff, M.; Canziani, A.; LeCun, Y., Model-predictive policy learning with uncertainty regularization for driving in dense traffic (2019)
[45] Hester, T.; Stone, P., TEXPLORE: real-time sample-efficient reinforcement learning for robots, (AAAI Spring Symposium (2012)), 1-6
[46] Hitzler, P.; Wendt, M., A uniform approach to logic programming semantics, Theory Pract. Log. Program., 5, 93-121 (2005) · Zbl 1093.68019
[47] Hogg, C.; Kuter, U.; Muñoz-Avila, H., Learning hierarchical task networks for nondeterministic planning domains, (IJCAI (2009)), 1708-1714
[48] Hogg, C.; Kuter, U.; Muñoz-Avila, H., Learning methods to generate good plans: integrating HTN learning and reinforcement learning, (AAAI (2010))
[49] Hogg, C.; Muñoz-Avila, H.; Kuter, U., HTN-MAKER: learning HTNs with minimal additional knowledge engineering required, (AAAI (2008)), 950-956
[50] Hwang, I.; Kim, S.; Kim, Y.; Seah, C. E., A survey of fault detection, isolation, and reconfiguration methods, IEEE Trans. Control Syst. Technol., 18, 636-653 (2010)
[51] Ingham, M. D.; Ragno, R. J.; Williams, B. C., A reactive model-based programming language for robotic space explorers, (i-SAIRAS (2001))
[52] Ingrand, F.; Chatila, R.; Alami, R.; Robert, F., PRS: a high level supervision and control language for autonomous mobile robots, (ICRA (1996)), 43-49
[53] Ingrand, F.; Ghallab, M., Deliberation for autonomous robots: a survey, Artif. Intell., 247, 10-44 (2017)
[54] Jahangirian, M.; Eldabi, T.; Naseer, A.; Stergioulas, L. K.; Young, T., Simulation in manufacturing and business: a review, Eur. J. Oper. Res., 203, 1-13 (2010)
[55] James, S.; Konidaris, G.; Rosman, B., An analysis of Monte Carlo tree search, (AAAI (2017)), 3576-3582
[56] Jevtic, A.; Colomé, A.; Alenyà, G.; Torras, C., Robot motion adaptation through user intervention and reinforcement learning, Pattern Recognit. Lett., 105, 67-75 (2018)
[57] Jonsson, P.; Bäckström, C., State-variable planning under structural restrictions: algorithms and complexity, Artif. Intell., 100, 125-176 (1998) · Zbl 0906.68140
[58] Kaelbling, L. P.; Littman, M. L.; Moore, A. W., Reinforcement learning: a survey, J. Artif. Intell. Res., 4, 237-285 (1996)
[59] Kaelbling, L. P.; Lozano-Perez, T., Hierarchical task and motion planning in the now, (ICRA (2011)), 1470-1477
[60] Kaelbling, L. P.; Lozano-Perez, T., Integrated task and motion planning in belief space, Int. J. Robot. Res., 32, 1194-1227 (2013)
[61] Kambhampati, S., Are we comparing Dana and Fahiem or SHOP and TLPlan? A critique of the knowledge-based planning track at the IPC (2003)
[62] Katt, S.; Oliehoek, F. A.; Amato, C., Learning in POMDPs with Monte Carlo tree search, (ICML (2017))
[63] Keller, T.; Eyerich, P., PROST: probabilistic planning based on UCT, (ICAPS (2012)), 119-127
[64] Kober, J., Learning Motor Skills: from Algorithms to Robot Experiments (2012), Darmstadt University, Ph.D. thesis
[65] Kober, J.; Bagnell, J. A.; Peters, J., Reinforcement learning in robotics: a survey, Int. J. Robot. Res. (2013)
[66] Kocsis, L.; Szepesvári, C., Bandit based Monte Carlo planning, (ECML (2006)), 282-293
[67] Kortenkamp, D.; Simmons, R., Robotic systems architectures and programming, (Siciliano, B.; Khatib, O., Springer Handbook of Robotics (2008), Springer), 187-206
[68] Lallement, R.; De Silva, L.; Alami, R., HATP: an HTN planner for robotics (2014)
[69] Lang, T.; Toussaint, M.; Kersting, K., Exploration in relational domains for model-based reinforcement learning, J. Mach. Learn. Res., 13, 3725-3768 (2012) · Zbl 1433.68360
[70] León, B.; Ulbrich, S.; Diankov, R.; Puche, G.; Przybylski, M.; Morales, A.; Asfour, T.; Moisio, S.; Bohg, J.; Kuffner, J., OpenGRASP: a toolkit for robot grasping simulation, (International Conference on Simulation, Modeling, and Programming for Autonomous Robots (2010), Springer), 109-120
[71] Leonetti, M.; Iocchi, L.; Stone, P., A synthesis of automated planning and reinforcement learning for efficient, robust decision-making, Artif. Intell., 241, 103-130 (2016) · Zbl 1392.68387
[72] Lesire, C.; Pommereau, F., ASPiC: an acting system based on skill Petri net composition, (2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2018), IEEE), 6952-6958
[73] Levine, S. J.; Williams, B. C., Concurrent plan recognition and execution for human-robot teams, (ICAPS (2014))
[74] Marthi, B. M.; Russell, S. J.; Latham, D.; Guestrin, C., Concurrent hierarchical reinforcement learning, (AAAI (2005)), 1652
[75] Martínez, D. M.; Alenyà, G.; Ribeiro, T.; Inoue, K.; Torras, C., Relational reinforcement learning for planning with exogenous effects, J. Mach. Learn. Res., 18, 78:1-78:44 (2017) · Zbl 1434.68432
[76] Martínez, D. M.; Alenyà, G.; Torras, C., Relational reinforcement learning with guided demonstrations, Artif. Intell., 247, 295-312 (2017) · Zbl 1420.68170
[77] Mausam; Kolobov, A., Planning with Markov Decision Processes: An AI Perspective (2012), Morgan & Claypool Publishers · Zbl 1270.68014
[78] McDermott, D. M., The 1998 AI planning systems competition, AI Mag., 21, 35 (2000)
[79] Meneguzzi, F.; De Silva, L., Planning in BDI agents: a survey of the integration of planning algorithms and agent reasoning, Knowl. Eng. Rev., 30, 1-44 (2015)
[80] Michel, O., Cyberbotics Ltd. Webots™: professional mobile robot simulation, Int. J. Adv. Robot. Syst., 1, 5 (2004)
[81] Morisset, B.; Ghallab, M., Learning how to combine sensory-motor functions into a robust behavior, Artif. Intell., 172, 392-412 (2008)
[82] Mourtzis, D.; Doukas, M.; Bernidaki, D., Simulation in manufacturing: review and challenges, Proc. CIRP, 25, 213-229 (2014)
[83] Muscettola, N.; Nayak, P. P.; Pell, B.; Williams, B. C., Remote agent: to boldly go where no AI system has gone before, Artif. Intell., 103, 5-47 (1998) · Zbl 0909.68167
[84] Musliner, D. J.; Pelican, M. J.; Goldman, R. P.; Krebsbach, K. D.; Durfee, E. H., The evolution of CIRCA, a theory-based AI architecture with real-time performance guarantees, (AAAI Spring Symposium: Emotion, Personality, and Social Behavior (2008))
[85] Myers, K. L., CPEF: a continuous planning and execution framework, AI Mag., 20, 63-69 (1999)
[86] Nau, D. S.; Au, T. C.; Ilghami, O.; Kuter, U.; Murdock, J. W.; Wu, D.; Yaman, F., SHOP2: an HTN planning system, J. Artif. Intell. Res., 20, 379-404 (2003) · Zbl 1058.68106
[87] Nau, D. S.; Cao, Y.; Lotem, A.; Muñoz-Avila, H., SHOP: simple hierarchical ordered planner, (IJCAI (1999)), 968-973
[88] Newborn, M., Deep Blue: An Artificial Intelligence Milestone (2013), Springer Science & Business Media
[89] Parr, R.; Russell, S. J., Reinforcement learning with hierarchies of machines, (NIPS (1997))
[90] Parr, R.; Russell, S. J., Reinforcement learning with hierarchies of machines, (Advances in Neural Information Processing Systems (1998)), 1043-1049
[91] Pasula, H. M.; Zettlemoyer, L. S.; Kaelbling, L. P., Learning symbolic models of stochastic domains, J. Artif. Intell. Res., 29, 309-352 (2007) · Zbl 1182.68181
[92] Patra, S., RAE and UPOM (2020)
[93] Patra, S.; Ghallab, M.; Nau, D.; Traverso, P., APE: an acting and planning engine, Adv. Cogn. Syst. (2018)
[94] Patra, S.; Ghallab, M.; Nau, D.; Traverso, P., Acting and planning using operational models, (AAAI (2019)), 7691-7698
[95] Patra, S.; Mason, J.; Kumar, A.; Ghallab, M.; Traverso, P.; Nau, D., Integrating acting, planning, and learning in hierarchical operational models, (ICAPS (2020))
[96] Patra, S.; Velasquez, A.; Kang, M.; Nau, D., Using online planning and acting to recover from cyberattacks on software-defined networks, (IAAI (2021))
[97] Peters, J.; Kober, J.; Nguyen-Tuong, D., Policy learning-a unified perspective with applications in robotics, Recent Adv. Reinforc. Learn., 220-228 (2008)
[98] Pettersson, O., Execution monitoring in robotics: a survey, Robot. Auton. Syst., 53, 73-88 (2005)
[99] Ramchandani, N., Virtual coaching to enhance diabetes care, Diabetes Technol. Ther., 21, S2-48-S2-51 (2019)
[100] Ross, S.; Pineau, J.; Chaib-draa, B.; Kreitmann, P., A bayesian approach for learning and planning in partially observable Markov decision processes, J. Mach. Learn. Res., 12, 1729-1770 (2011) · Zbl 1280.68193
[101] Ryan, M. R.K., Using abstract models of behaviours to automatically generate reinforcement learning hierarchies, (ICML (2002))
[102] Sanner, S., Relational Dynamic Influence Diagram Language (RDDL): Language Description (2010), Technical Report. NICTA
[103] Santana, P. H. R. Q. A.; Williams, B. C., Chance-constrained consistency for probabilistic temporal plan networks, (ICAPS (2014))
[104] Sardina, S.; de Silva, L.; Padgham, L., Hierarchical planning in BDI agent programming languages: a formal approach, (Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (2006)), 1001-1008
[105] Shah, S.; Dey, D.; Lovett, C.; Kapoor, A., Airsim: high-fidelity visual and physical simulation for autonomous vehicles, (Field and Service Robotics (2018), Springer), 621-635
[106] de Silva, L.; Meneguzzi, F.; Logan, B., An operational semantics for a fragment of PRS, (IJCAI (2018))
[107] Silver, D.; Hubert, T.; Schrittwieser, J.; Antonoglou, I.; Lai, M.; Guez, A.; Lanctot, M.; Sifre, L.; Kumaran, D.; Graepel, T.; Lillicrap, T.; Simonyan, K.; Hassabis, D., A general reinforcement learning algorithm that masters chess, shogi, and go through self-play, Science, 362, 1140-1144 (2018) · Zbl 1433.68320
[108] Simmons, R., Concurrent planning and execution for autonomous robots, IEEE Control Syst., 12, 46-50 (1992)
[109] Simmons, R., Structured control for autonomous robots, IEEE Trans. Robot. Autom., 10, 34-43 (1994)
[110] Simmons, R.; Apfelbaum, D., A task description language for robot control, (IROS (1998)), 1931-1937
[111] Simpkins, C.; Bhat, S.; Isbell, C.; Mateas, M., Towards adaptive programming: integrating reinforcement learning into a programming language, (ACM SIGPLAN Conf. on Object-Oriented Progr. Syst., Lang., and Applications (OOPSLA) (2008), ACM), 603-614
[112] Sutton, R. S.; Barto, A. G., Reinforcement Learning: An Introduction, Adaptive Computation and Machine Learning (1998), MIT Press
[113] Teichteil-Königsbuch, F.; Infantes, G.; Kuter, U., RFF: a robust, FF-based MDP planning algorithm for generating policies with low probability of failure, (ICAPS (2008))
[114] Thiébaux, S.; Hoffmann, J.; Nebel, B., In defense of PDDL axioms, Artif. Intell., 168, 38-69 (2005) · Zbl 1132.68714
[115] Veloso, M. M.; Biswas, J.; Coltin, B.; Rosenthal, S., Cobots: robust symbiotic autonomous mobile service robots, (IJCAI (2015)), 4423
[116] Verma, V.; Estlin, T.; Jónsson, A. K.; Pasareanu, C.; Simmons, R.; Tso, K., Plan execution interchange language (PLEXIL) for executable plans and command sequences, (i-SAIRAS (2005))
[117] Walkinshaw, N.; Taylor, R.; Derrick, J., Inferring extended finite state machine models from software executions, Empir. Softw. Eng., 21, 811-853 (2016)
[118] Wang, F. Y.; Kyriakopoulos, K. J.; Tsolkas, A.; Saridis, G. N., A Petri-net coordination model for an intelligent mobile robot, IEEE Trans. Syst. Man Cybern., 21, 777-789 (1991)
[119] Williams, B. C.; Abramson, M., Executing reactive, model-based programs through graph-based temporal planning, (IJCAI (2001))
[120] Wolfe, J.; Marthi, B., Combined task and motion planning for mobile manipulation, (International Conference on Automated Planning and Scheduling (2010)), 254-257
[121] Yang, F.; Lyu, D.; Liu, B.; Gustafson, S., PEORL: integrating symbolic planning and hierarchical reinforcement learning for robust decision-making, (IJCAI (2018))
[122] Yoon, S. W.; Fern, A.; Givan, R., FF-Replan: a baseline for probabilistic planning, (ICAPS (2007)), 352-359
[123] Yoon, S. W.; Fern, A.; Givan, R.; Kambhampati, S., Probabilistic planning via determinization in hindsight, (AAAI (2008)), 1010-1016
[124] Younes, H.; Littman, M., PPDDL: the probabilistic planning domain definition language (2004), Technical Report. CMU
[125] Zhang, Q.; Yao, J.; Yin, Q.; Zha, Y., Learning behavior trees for autonomous agents with hybrid constraints evolution, Appl. Sci., 8, 1077 (2018)
[126] Zhuo, H. H.; Hu, D. H.; Hogg, C.; Yang, Q.; Muñoz-Avila, H., Learning HTN method preconditions and action models from partial observations, (IJCAI (2009)), 1804-1810
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.