
Making friends on the fly: cooperating with new teammates. (English) Zbl 1392.68411

Summary: Robots are being deployed in an increasing variety of environments for longer periods of time. As the number of robots grows, they will increasingly need to interact with other robots. Additionally, the number of companies and research laboratories producing these robots is increasing, leading to the situation where these robots may not share a common communication or coordination protocol. While standards for coordination and communication may be created, we expect that robots will need to additionally reason intelligently about their teammates with limited information. This problem motivates the area of ad hoc teamwork, in which an agent may potentially cooperate with a variety of teammates in order to achieve a shared goal. This article focuses on a limited version of the ad hoc teamwork problem in which an agent knows the environmental dynamics and has had past experiences with other teammates, though these experiences may not be representative of the current teammates. To tackle this problem, this article introduces a new general-purpose algorithm, PLASTIC, that reuses knowledge learned from previous teammates or provided by experts to quickly adapt to new teammates. This algorithm is instantiated in two forms: 1) PLASTIC-Model, which builds models of previous teammates’ behaviors and plans behaviors online using these models, and 2) PLASTIC-Policy, which learns policies for cooperating with previous teammates and selects among these policies online. We evaluate PLASTIC on two benchmark tasks: the pursuit domain and robot soccer in the RoboCup 2D simulation domain. Because a key requirement of ad hoc teamwork is adaptability to previously unseen agents, the tests use more than 40 previously unknown teams on the first task and 7 previously unknown teams on the second. While PLASTIC assumes that there is some degree of similarity between the current and past teammates’ behaviors, no steps are taken in the experimental setup to make sure this assumption holds. The teammates were created by a variety of independent developers and were not designed to share any similarities. Nonetheless, the results show that PLASTIC was able to identify and exploit similarities between its current and past teammates’ behaviors, allowing it to quickly adapt to new teammates.
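The core idea described in the summary, maintaining a distribution over knowledge gained from prior teammates and narrowing it online from observations of the current teammate, can be illustrated with a minimal sketch. The sketch below assumes a discrete set of prior teammate models, a loss-bounded belief update, and illustrative names (PlasticAgent, prob_action, best_response) that are not taken from the paper; PLASTIC-Model would plan with the selected model while PLASTIC-Policy would reuse a learned policy, and the sketch collapses both into a single best_response call.

```python
# Illustrative sketch (assumed interface, not the authors' code): select among
# prior teammate models online by updating a belief distribution from observed
# teammate actions, then act using the most likely model's response.

class PlasticAgent:
    def __init__(self, prior_models, eta=0.1):
        # prior_models: dict name -> model exposing
        #   prob_action(state, action) -> float   (likelihood of a teammate action)
        #   best_response(state) -> action        (our planned or learned response)
        self.models = prior_models
        self.eta = eta  # bounds how quickly any model can lose probability
        self.beliefs = {name: 1.0 / len(prior_models) for name in prior_models}

    def update_beliefs(self, state, observed_teammate_action):
        # Each model loses belief in proportion to how poorly it predicted
        # the teammate action that was actually observed.
        for name, model in self.models.items():
            loss = 1.0 - model.prob_action(state, observed_teammate_action)
            self.beliefs[name] *= (1.0 - self.eta * loss)
        total = sum(self.beliefs.values())
        for name in self.beliefs:
            self.beliefs[name] /= total  # renormalize to a probability distribution

    def act(self, state):
        # Respond according to the currently most likely prior teammate model.
        best = max(self.beliefs, key=self.beliefs.get)
        return self.models[best].best_response(state)
```

Under this (assumed) loss-bounded update, no prior model's probability collapses after a single surprising observation, which is what allows recovery when the first guess about the new teammate turns out to be wrong.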

MSC:

68T40 Artificial intelligence for robotics
68T05 Learning and adaptive systems in artificial intelligence
68T42 Agent technology and artificial intelligence

Software:

AWESOME; WEKA; TEXPLORE
