
Exploiting meta features for dependency parsing and part-of-speech tagging. (English) Zbl 1344.68251

Summary: In recent years, discriminative methods have made substantial progress on natural language processing tasks such as parsing, part-of-speech tagging, and word segmentation. In these methods, conventional features in a relatively high-dimensional feature space may suffer from sparseness and thus have less discriminative power on unseen data. This article presents a feature-transformation learning framework that addresses the sparseness problem by transforming sparse conventional base features into less sparse high-level features (i.e., meta features) with the help of a large amount of automatically annotated data. The meta features are derived by bucketing similar base features according to their frequency in the large data, and are used together with the base features in our final system. We apply the framework to part-of-speech tagging and dependency parsing. Experimental results show that our systems outperform the baseline systems on both tasks under standard evaluation. For dependency parsing, our parsers achieve state-of-the-art accuracy on the Chinese data and accuracy comparable to the best known systems on the English data. Further analysis indicates that the proposed approach is effective in handling unseen data and features.
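
The bucketing step described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the bucket labels (HIGH, MIDDLE, LOW, UNSEEN), and the rank cutoffs (top 10% and next 20%) are assumptions made only for illustration. Base features are represented as hypothetical (template, value) pairs extracted from automatically annotated data.

```python
from collections import Counter


def build_bucket_map(auto_features, high_frac=0.1, mid_frac=0.2):
    """Assign each base feature seen in auto-annotated data to a frequency bucket.

    The cutoffs are illustrative assumptions, not values taken from the paper.
    """
    counts = Counter(auto_features)
    ranked = [feat for feat, _ in counts.most_common()]  # most frequent first
    high_end = int(len(ranked) * high_frac)
    mid_end = int(len(ranked) * (high_frac + mid_frac))
    buckets = {}
    for rank, feat in enumerate(ranked):
        if rank < high_end:
            buckets[feat] = "HIGH"
        elif rank < mid_end:
            buckets[feat] = "MIDDLE"
        else:
            buckets[feat] = "LOW"
    return buckets


def extract(base_feature, buckets):
    """Return the base feature string together with its meta feature string."""
    template, value = base_feature
    # Base features never seen in the large data share one UNSEEN bucket.
    bucket = buckets.get(base_feature, "UNSEEN")
    return [f"{template}={value}", f"{template}:{bucket}"]


if __name__ == "__main__":
    # Toy "large data": ten head-word features with frequencies 10, 9, ..., 1.
    auto = []
    for i in range(10):
        auto += [("hw", f"w{i}")] * (10 - i)
    bucket_map = build_bucket_map(auto)
    print(extract(("hw", "w0"), bucket_map))
    print(extract(("hw", "never_seen"), bucket_map))
```

Because many distinct base features map to the same handful of bucket labels, the meta features fire far more often than any single base feature, which is the sense in which they are less sparse. The sketch emits both strings so that, as in the summary, meta features are used together with the base features rather than replacing them.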

MSC:

68T50 Natural language processing
