zbMATH — the first resource for mathematics

Toward any-language zero-shot topic classification of textual documents. (English) Zbl 07099216
Summary: In this paper, we present a zero-shot approach that classifies documents in any language into topics described by English keywords. This is done by embedding both labels and documents into a shared semantic space in which a meaningful semantic similarity between a document and a candidate label can be computed. The embedding space can be created either by mapping into a Wikipedia-based semantic representation or by learning cross-lingual embeddings. However, when the Wikipedia in the target language is small, or there is too little training text to learn a good embedding space, performance on low-resource languages suffers. For such languages, we therefore use a word-level dictionary to convert documents into a high-resource language, and then perform classification in that language. This approach can be applied to thousands of languages, in contrast to machine translation, a supervision-heavy approach feasible for only about 100 languages. We also develop a ranking algorithm that uses language similarity metrics to automatically select a good pivot (bridging) high-resource language, and show that this significantly improves the classification of low-resource-language documents, performing comparably to the best bridge possible.
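The core idea of the summary, classifying a document by embedding it and each label's keywords into a shared semantic space and picking the most similar label, can be illustrated with a minimal sketch. The word vectors below are hypothetical toy values standing in for a real Wikipedia-based or cross-lingual embedding table; `classify`, `embed`, and the label keyword sets are names invented for this illustration, not the authors' implementation.

```python
# Minimal sketch of zero-shot (dataless) topic classification by embedding
# similarity. EMB is a toy stand-in for a pre-trained, language-shared
# embedding table (assumption: in practice this comes from a Wikipedia-based
# semantic representation or cross-lingual word embeddings).
import numpy as np

EMB = {
    "sports":   np.array([1.0, 0.1, 0.0]),
    "game":     np.array([0.9, 0.2, 0.1]),
    "team":     np.array([0.8, 0.0, 0.2]),
    "politics": np.array([0.0, 1.0, 0.1]),
    "election": np.array([0.1, 0.9, 0.0]),
    "vote":     np.array([0.0, 0.8, 0.2]),
}

def embed(words):
    """Represent a word list by the centroid of its known word vectors."""
    vecs = [EMB[w] for w in words if w in EMB]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(doc_words, label_keywords):
    """Return the label whose keyword centroid is closest to the document."""
    d = embed(doc_words)
    return max(label_keywords,
               key=lambda lab: cosine(d, embed(label_keywords[lab])))

# Labels are described only by English keywords; no labeled training data.
labels = {"sports":   ["sports", "game", "team"],
          "politics": ["politics", "election", "vote"]}
print(classify(["game", "team", "vote"], labels))
```

For a low-resource language, the same pipeline would first map each document word through a bilingual dictionary into a high-resource pivot language before the `embed` step, which is where the paper's pivot-ranking algorithm comes in.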
68T Artificial intelligence
[1] Dagan, I.; Karov, Y.; Roth, D., Mistake-driven learning in text categorization, (EMNLP, (1997)), 55-63
[2] Joachims, T., Text categorization with support vector machines: learning with many relevant features, (ECML, (1998)), 137-142
[3] Dumais, S.; Chen, H., Hierarchical classification of web content, (SIGIR, (2000)), 256-263
[4] Agrawal, R.; Gupta, A.; Prabhu, Y.; Varma, M., Multi-label learning with millions of labels: recommending advertiser bid phrases for web pages, (WWW, (2013)), 13-24
[5] Amini, M.; Goutte, C., A co-classification approach to learning from multilingual corpora, Mach. Learn., 79, 1-2, 105-121, (2010)
[6] Klementiev, A.; Titov, I.; Bhattarai, B., Inducing crosslingual distributed representations of words, (COLING, (2012)), 1459-1474
[7] Palatucci, M.; Pomerleau, D.; Hinton, G. E.; Mitchell, T. M., Zero-shot learning with semantic output codes, (NIPS, (2009)), 1410-1418
[8] Socher, R.; Ganjoo, M.; Manning, C. D.; Ng, A. Y., Zero-shot learning through cross-modal transfer, (NIPS, (2013)), 935-943
[9] Elhoseiny, M.; Saleh, B.; Elgammal, A., Write a classifier: zero shot learning using purely textual descriptions, (ICCV, (2013)), 1433-1441
[10] Romera-Paredes, B.; Torr, P. H.S., An embarrassingly simple approach to zero-shot learning, (ICML, (2015)), 2152-2161
[11] Li, F.; Fergus, R.; Perona, P., One-shot learning of object categories, IEEE Trans. Pattern Anal. Mach. Intell., 28, 4, 594-611, (2006)
[12] Lake, B. M.; Salakhutdinov, R.; Tenenbaum, J. B., Human-level concept learning through probabilistic program induction, Science, 350, 6266, 1332-1338, (2015) · Zbl 1355.68230
[13] Potthast, M.; Stein, B.; Anderka, M., A Wikipedia-based multilingual Retrieval model, (ECIR, (2008)), 522-530
[14] Sorg, P.; Cimiano, P., Exploiting Wikipedia for cross-lingual and multilingual information retrieval, Data Knowl. Eng., 74, 26-45, (2012)
[15] Smith, S. L.; Turban, D. H.P.; Hamblin, S.; Hammerla, N. Y., Offline bilingual word vectors, orthogonal transformations and the inverted softmax, (2017)
[16] Conneau, A.; Lample, G.; Ranzato, M.; Denoyer, L.; Jégou, H., Word translation without parallel data, (2018)
[17] Mausam; Soderland, S.; Etzioni, O.; Weld, D. S.; Reiter, K.; Skinner, M.; Sammer, M.; Bilmes, J., Panlingual lexical translation via probabilistic inference, Artif. Intell., 174, 9-10, 619-637, (2010)
[18] Paul, M.; Finch, A. M.; Sumita, E., How to choose the best pivot language for automatic translation of low-resource languages, ACM Trans. Asian Lang. Inf. Process., 12, 4, 14, (2013)
[19] Herbrich, R.; Graepel, T.; Obermayer, K., Large margin rank boundaries for ordinal regression, (2000), MIT Press: Cambridge, MA
[20] Chapelle, O.; Keerthi, S. S., Efficient algorithms for ranking with SVMs, Inf. Retr., 13, 3, 201-215, (2010)
[21] Song, Y.; Upadhyay, S.; Peng, H.; Roth, D., Cross-lingual dataless classification for many languages, (IJCAI, (2016)), 2901-2907
[22] Song, Y.; Mayhew, S.; Roth, D., Cross-lingual dataless classification for languages with small Wikipedia presence, preprint
[23] Chang, M.-W.; Ratinov, L.; Roth, D.; Srikumar, V., Importance of semantic representation: dataless classification, (AAAI, (2008)), 830-835
[24] Song, Y.; Roth, D., On dataless hierarchical text classification, (AAAI, (2014)), 1579-1585
[25] Brown, P. F.; Pietra, V. J.D.; de Souza, P. V.; Lai, J. C.; Mercer, R. L., Class-based n-gram models of natural language, Comput. Linguist., 18, 4, 467-479, (1992)
[26] Liang, P., Semi-supervised learning for natural language, (2005), Massachusetts Institute of Technology, Master’s thesis
[27] Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; Kuksa, P. P., Natural language processing (almost) from scratch, J. Mach. Learn. Res., 12, 2493-2537, (2011) · Zbl 1280.68161
[28] Turian, J.; Ratinov, L.-A.; Bengio, Y., Word representations: a simple and general method for semi-supervised learning, (ACL, (2010)), 384-394
[29] Mikolov, T.; Yih, W.-T.; Zweig, G., Linguistic regularities in continuous space word representations, (HLT-NAACL, (2013)), 746-751
[30] Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; Dean, J., Distributed representations of words and phrases and their compositionality, (NIPS, (2013)), 3111-3119
[31] Blei, D. M.; Ng, A. Y.; Jordan, M. I., Latent Dirichlet allocation, J. Mach. Learn. Res., 3, 993-1022, (2003) · Zbl 1112.68379
[32] Gabrilovich, E.; Markovitch, S., Wikipedia-based semantic interpretation for natural language processing, J. Artif. Intell. Res., 34, 1, 443-498, (2009) · Zbl 1182.68319
[33] Song, Y.; Roth, D., Unsupervised sparse vector densification for short text similarity, (NAACL-HLT, (2015)), 1275-1280
[34] Shirakawa, M.; Hara, T.; Nishio, S., MLJ: language-independent real-time search of tweets reported by media outlets and journalists, Proc. VLDB Endow., 7, 13, 1605-1608, (2014)
[35] Al-Rfou’, R.; Perozzi, B.; Skiena, S., Polyglot: distributed word representations for multilingual NLP, (CoNLL, (2013)), 183-192
[36] Xiao, M.; Guo, Y., Semi-supervised representation learning for cross-lingual text classification, (EMNLP, (2013)), 1465-1475
[37] Hermann, K. M.; Blunsom, P., Multilingual models for compositional distributed semantics, (ACL, (2014)), 58-68
[38] Faruqui, M.; Dyer, C., Improving vector space word representations using multilingual correlation, (EACL, (2014)), 462-471
[39] Lu, A.; Wang, W.; Bansal, M.; Gimpel, K.; Livescu, K., Deep multilingual correlation for improved word embeddings, (NAACL-HLT, (2015)), 250-256
[40] Upadhyay, S.; Faruqui, M.; Dyer, C.; Roth, D., Cross-lingual models of word embeddings: an empirical comparison, (ACL, (2016))
[41] Zhang, D.; Mei, Q.; Zhai, C., Cross-lingual latent topic extraction, (ACL, (2010)), 1128-1137
[42] Lang, K., Newsweeder: learning to filter netnews, (ICML, (1995)), 331-339
[43] Bojanowski, P.; Grave, E.; Joulin, A.; Mikolov, T., Enriching word vectors with subword information, TACL, 5, 135-146, (2017)
[44] Raganato, A.; Bovi, C. D.; Navigli, R., Automatic construction and evaluation of a large semantically enriched Wikipedia, (IJCAI, (2016)), 2894-2900
[45] Navigli, R.; Ponzetto, S. P., BabelNet: the automatic construction, evaluation and application of a wide-coverage multilingual semantic network, Artif. Intell., 193, 217-250, (2012) · Zbl 1270.68299
[46] (Fellbaum, C., WordNet: An Electronic Lexical Database, (1998), MIT Press) · Zbl 0913.68054
[47] Camacho-Collados, J.; Pilehvar, M. T.; Navigli, R., Nasari: integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities, Artif. Intell., 240, 36-64, (2016) · Zbl 1386.68184
[48] Iacobacci, I.; Pilehvar, M. T.; Navigli, R., SensEmbed: learning sense embeddings for word and relational similarity, (ACL, (2015)), 95-105
[49] Bovi, C. D.; Raganato, A., Sew-Embed at SemEval-2017 task 2: language-independent concept representations from a semantically enriched Wikipedia, (SemEval@ACL, (2017), Association for Computational Linguistics), 261-266
[50] Iacobacci, I.; Pilehvar, M. T.; Navigli, R., Embeddings for word sense disambiguation: an evaluation study, (ACL, (2016))
[51] Pilehvar, M. T.; Camacho-Collados, J.; Navigli, R.; Collier, N., Towards a seamless integration of word senses into downstream NLP applications, (ACL, (2017)), 1857-1869
[52] Lewis, D. D.; Yang, Y.; Rose, T. G.; Li, F., RCV1: a new benchmark collection for text categorization research, J. Mach. Learn. Res., 5, 361-397, (2004)
[53] Yang, Y., An evaluation of statistical approaches to text categorization, Inf. Retr., 1, 1-2, 69-90, (1999)
[54] Fan, R.-E.; Chang, K.-W.; Hsieh, C.-J.; Wang, X.-R.; Lin, C.-J., LIBLINEAR: a library for large linear classification, J. Mach. Learn. Res., 9, 1871-1874, (2008) · Zbl 1225.68175
[55] Koehn, P., Europarl: a parallel corpus for statistical machine translation, (Machine Translation Summit, (2005)), 79-86
[56] Howell, D. C., Statistical Methods for Psychology, (2011), Cengage Learning
[57] Pan, S. J.; Yang, Q., A survey on transfer learning, IEEE Trans. Knowl. Data Eng., 22, 10, 1345-1359, (2010)
[58] Prettenhofer, P.; Stein, B., Cross-language text classification using structural correspondence learning, (ACL, (2010)), 1118-1127
[59] Shi, L.; Mihalcea, R.; Tian, M., Cross language text classification by model translation and semi-supervised learning, (EMNLP, (2010)), 1057-1067
[60] Mann, G. S.; Yarowsky, D., Multipath translation lexicon induction via bridge languages, (NAACL, (2001))
[61] Cohn, T.; Lapata, M., Machine translation by triangulation: making effective use of multi-parallel corpora, (ACL, (2007))
[62] Utiyama, M.; Isahara, H., A comparison of pivot methods for phrase-based statistical machine translation, (NAACL-HLT, (2007)), 484-491
[63] Wu, H.; Wang, H., Revisiting pivot language approach for machine translation, (ACL/IJCNLP, (2009)), 154-162
[64] Leusch, G.; Max, A.; Crego, J. M.; Ney, H., Multi-pivot translation by system combination, (2010 International Workshop on Spoken Language Translation, IWSLT 2010, Paris, France, December 2-3, 2010), 299-306
[65] Yazdani, M.; Henderson, J., A model of zero-shot learning of spoken language understanding, (EMNLP, (2015)), 244-249
[66] Lazaridou, A.; Dinu, G.; Baroni, M., Hubness and pollution: delving into cross-space mapping for zero-shot learning, (ACL, (2015)), 270-280
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.