

Grounded language interpretation of robotic commands through structured learning. (English) Zbl 07153697
Summary: The presence of robots in everyday life is increasing at a growing pace. Industrial and working environments, as well as health-care assistance in public or domestic settings, can benefit from robots’ services to accomplish manifold tasks that are difficult or tedious for humans. In such scenarios, Natural Language interactions enabling collaboration and robot control are meant to be situated, in the sense that both the user and the robot access and make reference to the environment. Contextual knowledge may thus play a key role in resolving inherent ambiguities of grounded language, such as prepositional phrase attachment.
In this work, we present a linguistic pipeline for the semantic processing of robotic commands that combines discriminative structured learning, distributional semantics and contextual evidence extracted from the working environment. The final goal is to make the interpretation of linguistic exchanges dependent on physical, cognitive and language-dependent aspects. We present, formalize and discuss an adaptive Spoken Language Understanding chain for robotic commands that explicitly depends on the operational context during both the learning and processing stages. The resulting framework makes it possible to model heterogeneous information about the environment (e.g., positional information about objects and their properties) and to inject it into the learning process. Empirical results demonstrate a significant contribution of these additional dimensions, achieving up to a 25% relative error reduction with respect to a pipeline that exploits linguistic evidence only.
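The summary describes injecting environmental evidence into the learning stage without giving implementation details; the minimal Python sketch below illustrates one plausible way of doing so, namely concatenating distributional word vectors with a contextual feature derived from a toy semantic map before training a classifier that resolves a prepositional-attachment ambiguity. All names here (EMBEDDINGS, SEMANTIC_MAP, token_features) and the use of a simple logistic-regression classifier are assumptions made for illustration only; the paper itself relies on discriminative structured learning over the full command, not on this per-decision toy model.

    # Hypothetical sketch: combining distributional and contextual ("grounded")
    # evidence in one feature vector, as suggested by the summary above.
    # Not the authors' pipeline; illustration only.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy distributional vectors (word embeddings play this role in practice).
    EMBEDDINGS = {
        "take":  np.array([0.9, 0.1]),
        "book":  np.array([0.2, 0.8]),
        "table": np.array([0.3, 0.7]),
    }

    # Toy semantic map: objects known to be in the environment and their support surface.
    SEMANTIC_MAP = {"book": {"on": "table"}, "table": {}}

    def token_features(token, head=None):
        """Concatenate a distributional vector with one contextual feature."""
        emb = EMBEDDINGS.get(token, np.zeros(2))
        # Contextual feature: does the semantic map confirm the spatial relation
        # implied by attaching the prepositional phrase to this token?
        grounded = 1.0 if head and SEMANTIC_MAP.get(token, {}).get("on") == head else 0.0
        return np.concatenate([emb, [grounded]])

    # Tiny training set for "take the book on the table":
    # decide whether "on the table" attaches to the noun or to the verb.
    X = np.array([
        token_features("book", head="table"),  # map confirms book-on-table -> noun attachment
        token_features("book", head=None),     # no contextual support      -> verb attachment
    ])
    y = np.array([1, 0])  # 1 = attach PP to the noun, 0 = attach PP to the verb

    clf = LogisticRegression().fit(X, y)
    print(clf.predict([token_features("book", head="table")]))  # expected: [1]

The design point mirrored from the summary is that the grounded feature (whether the semantic map confirms that the book is on the table) lets contextual evidence, rather than linguistic evidence alone, drive the attachment decision.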
MSC:
68T Artificial intelligence