zbMATH — the first resource for mathematics

MultiWiBi: the multilingual Wikipedia bitaxonomy project. (English) Zbl 1392.68421
Summary: We present MultiWiBi, an approach to the automatic creation of two integrated taxonomies for Wikipedia pages and categories written in different languages. In order to create both taxonomies in an arbitrary language, we first build them in English and then project the two taxonomies to other languages automatically, without the help of language-specific resources or tools. The process crucially leverages a novel algorithm which exploits the information available in either one of the taxonomies to reinforce the creation of the other taxonomy. Our experiments show that the taxonomical information in MultiWiBi is characterized by a higher quality and coverage than state-of-the-art resources like DBpedia, YAGO, MENTA, WikiNet, LHD and WikiTaxonomy, also across languages. MultiWiBi is available online at http://wibitaxonomy.org/multiwibi.

MSC:
68T50 Natural language processing
References:
[1] Mitchell, T., Reading the web: a breakthrough goal for AI, AI Mag., 1517-1519, (2005)
[2] Mirkin, S.; Dagan, I.; Shnarch, E., Evaluating the inferential utility of lexical-semantic resources, (Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, EACL ’09, Athens, Greece, (2009)), 558-566
[3] Poon, H.; Christensen, J.; Domingos, P.; Etzioni, O.; Hoffmann, R.; Kiddon, C.; Lin, T.; Ling, X.; Mausam; Ritter, A.; Schoenmackers, S.; Soderland, S.; Weld, D.; Wu, F.; Zhang, C., Machine reading at the University of Washington, (Proceedings of the 1st International Workshop on Formalisms and Methodology for Learning by Reading in Conjunction with NAACL-HLT 2010, Los Angeles, California, USA, (2010)), 87-95
[4] Singhal, A., Introducing the knowledge graph: things, not strings, (2012), Retrieved May 18, 2012
[5] Ferrucci, D. A., Introduction to “This is Watson”, IBM J. Res. Dev., 56, 3, 1, (2012)
[6] (Fellbaum, C., WordNet: An Electronic Lexical Database, (1998), MIT Press Cambridge, MA) · Zbl 0913.68054
[7] McNamee, P.; Snow, R.; Schone, P., Learning named entity hyponyms for question answering, (Proceedings of the Third International Joint Conference on Natural Language Processing, (2008)), 799-804
[8] Moldovan, D.; Novischi, A., Lexical chains for question answering, (Proceedings of the 19th International Conference on Computational Linguistics, COLING ’02, Taipei, Taiwan, 24 August-1 September 2002, (2002)), 1-7
[9] Cui, H.; Kan, M.-Y.; Chua, T.-S., Soft pattern matching models for definitional question answering, ACM Trans. Inf. Syst., 25, 2, 1-30, (2007)
[10] Ferrucci, D.; Brown, E.; Chu-Carroll, J.; Fan, J.; Gondek, D.; Kalyanpur, A. A.; Lally, A.; Murdock, J. W.; Nyberg, E.; Prager, J., Building Watson: an overview of the DeepQA project, AI Mag., 31, 3, 59-79, (2010)
[11] Etzioni, O.; Banko, M.; Cafarella, M. J., Machine Reading, (Proc. of AAAI, (2006)), 1517-1519
[12] Lin, T.; Etzioni, O., Entity linking at web scale, (Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-Scale Knowledge Extraction, AKBC-WEKEX ’12, (2012), Association for Computational Linguistics), 84-88
[13] Moro, A.; Raganato, A.; Navigli, R., Entity linking meets word sense disambiguation: a unified approach, Trans. Assoc. Comput. Linguist., 2, 231-244, (2014)
[14] Delli Bovi, C.; Telesca, L.; Navigli, R., Large-scale information extraction from textual definitions through deep syntactic and semantic analysis, Trans. Assoc. Comput. Linguist., 3, 529-543, (2015)
[15] Moro, A.; Li, H.; Krause, S.; Xu, F.; Navigli, R.; Uszkoreit, H., Semantic rule filtering for web-scale relation extraction, (The Semantic Web - ISWC 2013 - Proceedings of the 12th International Semantic Web Conference, Sydney, NSW, Australia, October 21-25, 2013, Part I, (2013)), 347-362
[16] Pennacchiotti, M.; Pantel, P., Ontologizing semantic relations, (Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, COLING ’06, Sydney, Australia, 17-21 July 2006, (2006)), 793-800
[17] Soderland, S.; Mandhani, B., Moving from textual relations to ontologized relations, (AAAI Spring Symposium: Machine Reading, (2007), AAAI), 85-90
[18] Sutcliffe, G.; Suda, M.; Teyssandier, A.; Dellis, N.; de Melo, G., Progress towards effective automated reasoning with world knowledge, (FLAIRS Conference, (2010)), 110-115
[19] Pasca, M.; Harabagiu, S., The informative role of WordNet in open-domain question answering, (Proceedings of NAACL-01 Workshop on WordNet and Other Lexical Resources, (2001)), 138-143
[20] Navigli, R., Word sense disambiguation: a survey, ACM Comput. Surv., 41, 2, 1-69, (2009)
[21] Navigli, R., A quick tour of word sense disambiguation, induction and related approaches, (Bieliková, M.; Friedrich, G.; Gottlob, G.; Katzenbeisser, S.; Turán, G., SOFSEM 2012: Theory and Practice of Computer Science, Lecture Notes in Computer Science, vol. 7147, (2012), Springer Heidelberg), 115-129
[22] Medelyan, O.; Milne, D.; Legg, C.; Witten, I. H., Mining meaning from Wikipedia, Int. J. Hum.-Comput. Stud., 67, 9, 716-754, (2009)
[23] Hovy, E. H.; Navigli, R.; Ponzetto, S. P., Collaboratively built semi-structured content and artificial intelligence: the story so far, Artif. Intell., 194, 2-27, (2013) · Zbl 1270.68362
[24] Banko, M.; Cafarella, M. J.; Soderland, S.; Broadhead, M.; Etzioni, O., Open information extraction from the web, (Proceedings of the 20th International Joint Conference on Artificial Intelligence, IJCAI ’07, Hyderabad, India, 6-12 January 2007, (2007)), 2670-2676
[25] Fader, A.; Soderland, S.; Etzioni, O., Identifying relations for open information extraction, (Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’11, Edinburgh, UK, (2011)), 1535-1545
[26] Moro, A.; Navigli, R., Integrating syntactic and semantic analysis into the open information extraction paradigm, (Proceedings of the 22nd International Joint Conference on Artificial Intelligence, IJCAI ’13, Beijing, China, (2013)), 2148-2154
[27] Gómez-Pérez, A.; Manzano-Macho, D., A survey of ontology learning methods and techniques, (2003), OntoWeb Deliverable D 1 (5)
[28] Hearst, M. A., Automatic acquisition of hyponyms from large text corpora, (Proceedings of the 25th International Conference on Computational Linguistics, COLING ’92, Nantes, France, (1992)), 539-545
[29] Ponzetto, S. P.; Strube, M., Deriving a large scale taxonomy from Wikipedia, (Proceedings of the 22nd Conference on the Advancement of Artificial Intelligence, AAAI ’07, Vancouver, B.C., Canada, 22-26 July 2007, (2007)), 1440-1445 · Zbl 1182.68291
[30] Hoffart, J.; Suchanek, F. M.; Berberich, K.; Weikum, G., YAGO2: a spatially and temporally enhanced knowledge base from Wikipedia, Artif. Intell., 194, 28-61, (2013) · Zbl 1270.68303
[31] Nastase, V.; Strube, M., Transforming Wikipedia into a large scale multilingual concept network, Artif. Intell., 194, 62-85, (2013) · Zbl 1270.68305
[32] de Melo, G.; Weikum, G., MENTA: inducing multilingual taxonomies from Wikipedia, (Proceedings of the 19th ACM International Conference on Information and Knowledge Management, CIKM ’10, New York, NY, USA, (2010)), 1099-1108
[33] Kliegr, T.; Zeman, V.; Dojchinovski, M., Linked hypernyms dataset-generation framework and use cases, (3rd Workshop on Linked Data in Linguistics: Multilingual Knowledge Resources and Natural Language Processing, (2014)), 82
[34] Flati, T.; Vannella, D.; Pasini, T.; Navigli, R., Two is bigger (and better) than one: the Wikipedia bitaxonomy project, (Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, (2014), Association for Computational Linguistics Baltimore, Maryland), 945-955
[35] Navigli, R.; Velardi, P., Learning word-class lattices for definition and hypernym extraction, (Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010, (2010), Association for Computational Linguistics Uppsala, Sweden), 1318-1327
[36] Navigli, R.; Ponzetto, S. P., BabelNet: the automatic construction, evaluation and application of a wide-coverage multilingual semantic network, Artif. Intell., 193, 217-250, (2012) · Zbl 1270.68299
[37] Calzolari, N.; Pecchia, L.; Zampolli, A., Working on the Italian machine dictionary: a semantic approach, (Proceedings of the 5th Conference on Computational Linguistics, COLING ’73, Pisa, Italy, (1973)), 49-70
[38] Amsler, R. A., A taxonomy for English nouns and verbs, (Proceedings of the 19th Annual Meeting of the Association for Computational Linguistics, ACL ’81, (1981), Stanford, California, USA), 133-138
[39] Calzolari, N., Towards the organization of lexical definitions on a database structure, (Proceedings of the 9th Conference on Computational Linguistics, COLING ’82, Prague, Czechoslovakia, (1982)), 61-64
[40] Ide, N.; Véronis, J., Extracting knowledge bases from machine-readable dictionaries: have we wasted our time?, (Proceedings of the Workshop on Knowledge Bases and Knowledge Structures, KB&KS ’93, Tokyo, Japan, (1993)), 257-266
[41] Kozareva, Z.; Hovy, E. H., A semi-supervised method to learn and construct taxonomies using the web, (Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’10, Seattle, WA, USA, (2010)), 1110-1118
[42] Navigli, R.; Velardi, P.; Faralli, S., A graph-based algorithm for inducing lexical taxonomies from scratch, (Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Spain, (2011)), 1872-1877
[43] Velardi, P.; Faralli, S.; Navigli, R., OntoLearn reloaded: a graph-based algorithm for taxonomy induction, Comput. Linguist., 39, 3, 665-707, (2013)
[44] Klein, D.; Manning, C. D., Fast exact inference with a factored model for natural language parsing, (Advances in Neural Information Processing Systems, vol. 15, NIPS, Vancouver, British Columbia, Canada, (2003)), 3-10
[45] Saggion, H., Identifying definitions in text collections for question answering, (Proceedings of the 4th International Conference on Language Resources and Evaluation, LREC ’04, Lisbon, Portugal, 26-28 May 2004, (2004), European Language Resources Association), 1927-1930
[46] Ruiz-Casado, M.; Alfonseca, E.; Castells, P., Automatic assignment of Wikipedia encyclopedic entries to WordNet synsets, (Advances in Web Intelligence, Lecture Notes in Computer Science, vol. 3528, (2005), Springer Verlag), 380-386
[47] Etzioni, O.; Cafarella, M.; Downey, D.; Kok, S.; Popescu, A.; Shaked, T.; Soderland, S.; Weld, D. S.; Yates, A., Web-scale information extraction in KnowItAll, (Proceedings of the 13th International Conference on World Wide Web, WWW ’04, (2004)), 100-110
[48] Etzioni, O.; Cafarella, M.; Downey, D.; Popescu, A.-M.; Shaked, T.; Soderland, S.; Weld, D. S.; Yates, A., Unsupervised named-entity extraction from the web: an experimental study, Artif. Intell., 165, 1, 91-134, (2005)
[49] Blohm, S., Using the web to reduce data sparseness in pattern-based information extraction, (Proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD, (2007), Springer Warsaw, Poland), 18-29
[50] Pantel, P.; Ravichandran, D., Automatically labeling semantic classes, (Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT ’04, Boston, Massachusetts, 2-7 May 2004, (2004)), 321-328
[51] Snow, R.; Jurafsky, D.; Ng, A., Semantic taxonomy induction from heterogeneous evidence, (Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, COLING-ACL ’06, (2006)), 801-808
[52] Ponzetto, S. P.; Strube, M., Taxonomy induction based on a collaboratively built knowledge repository, Artif. Intell., 175, 9-10, 1737-1756, (2011)
[53] Suchanek, F. M.; Kasneci, G.; Weikum, G., YAGO: a large ontology from Wikipedia and WordNet, J. Web Semant., 6, 3, 203-217, (2008)
[54] Auer, S.; Bizer, C.; Kobilarov, G.; Lehmann, J.; Cyganiak, R.; Ives, Z., DBpedia: a nucleus for a web of open data, (Proceedings of 6th International Semantic Web Conference Joint with 2nd Asian Semantic Web Conference, ISWC+ASWC 2007, Busan, Korea, (2007)), 722-735
[55] Bollacker, K.; Evans, C.; Paritosh, P.; Sturge, T.; Taylor, J., Freebase: a collaboratively created graph database for structuring human knowledge, (Proceedings of the International Conference on Management of Data, SIGMOD ’08, New York, NY, USA, (2008)), 1247-1250
[56] Nastase, V.; Strube, M.; Boerschinger, B.; Zirn, C.; Elghafari, A., WikiNet: a very large scale multi-lingual concept network, (Proceedings of the Seventh International Conference on Language Resources and Evaluation, LREC’10, Valletta, Malta, (2010)), 1015-1022
[57] Sumida, A.; Yoshinaga, N.; Torisawa, K., Boosting precision and recall of hyponymy relation acquisition from hierarchical layouts in Wikipedia, (LREC, (2008), European Language Resources Association), 2462-2469
[58] Pilehvar, M. T.; Navigli, R., A robust approach to aligning heterogeneous lexical resources, (Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), (2014), Association for Computational Linguistics Baltimore, Maryland), 468-478
[59] Niemann, E.; Gurevych, I., The People’s web meets linguistic knowledge: automatic sense alignment of Wikipedia and WordNet, (Proceedings of the Ninth International Conference on Computational Semantics, IWCS ’11, (2011), Association for Computational Linguistics Stroudsburg, PA, USA), 205-214
[60] Gurevych, I.; Eckle-Kohler, J.; Hartmann, S.; Matuschek, M.; Meyer, C. M.; Wirth, C., UBY: a large-scale unified lexical-semantic resource based on LMF, (Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, EACL ’12, Stroudsburg, PA, USA, (2012)), 580-590
[61] Meyer, C. M.; Gurevych, I., To exhibit is not to loiter: a multilingual, sense-disambiguated Wiktionary for measuring verb similarity, (Proceedings of the 24th International Conference on Computational Linguistics, COLING 2012, vol. 4, (2012)), 1763-1780
[62] Camacho-Collados, J.; Pilehvar, M. T.; Navigli, R., NASARI: a novel approach to a semantically-aware representation of items, (Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, (2015), Association for Computational Linguistics Denver, Colorado), 567-577
[63] Camacho-Collados, J.; Pilehvar, M. T.; Navigli, R., NASARI: integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities, Artif. Intell., 240, 36-64, (2016) · Zbl 1386.68184
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.