
zbMATH — the first resource for mathematics

Ensemble and deep learning for language-independent automatic selection of parallel data. (English) Zbl 07052225
Summary: Machine translation is used in many everyday applications. As the number of translated documents that must be sorted as useful or not (for building a translation model) grows, the automated categorization of texts (classification) has become a popular research field in machine learning, and this kind of information can be quite helpful for machine translation. Our parallel corpora (English-Greek and English-Italian) are based on educational data, which are quite difficult to translate. We apply two state-of-the-art architectures, Random Forest (RF) and Deeplearning4j (DL4J), to our data (which comprise three translation outputs). To our knowledge, this is the first time that deep learning architectures have been applied to the automatic selection of parallel data. We also propose new string-based features that appear effective for the classifier, and we investigate whether an attribute selection method can improve classification accuracy. Experimental results indicate an increase of up to 4% (compared to our previous work) using RF and rather satisfactory results using DL4J.
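The summary does not spell out the feature set or the classifier configuration. The following is a minimal sketch of the general approach under stated assumptions: the specific features shown (length ratio, character-level similarity, token-count ratio) are hypothetical stand-ins for the paper's string-based features, and scikit-learn's `RandomForestClassifier` stands in for the authors' RF setup.

```python
# Sketch: language-independent string-based features feeding a Random Forest
# to label (source, machine translation) pairs as useful (1) or not (0).
# Feature choices here are illustrative, not the paper's actual feature set.
from difflib import SequenceMatcher
from sklearn.ensemble import RandomForestClassifier

def string_features(source, translation):
    """Simple language-independent features for a sentence pair."""
    len_ratio = len(translation) / max(len(source), 1)          # char-length ratio
    char_sim = SequenceMatcher(None, source, translation).ratio()  # char overlap
    tok_ratio = len(translation.split()) / max(len(source.split()), 1)
    return [len_ratio, char_sim, tok_ratio]

# Toy labelled data standing in for annotated translation outputs.
pairs = [
    ("the cat sat", "il gatto si sedette", 1),
    ("hello world", "ciao mondo", 1),
    ("a long sentence about education", "breve", 0),
    ("short", "una frase lunghissima e ridondante senza senso", 0),
]
X = [string_features(s, t) for s, t, _ in pairs]
y = [label for _, _, label in pairs]

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
pred = clf.predict([string_features("good morning", "buongiorno")])
```

In practice the classifier would be trained on the annotated translation outputs of the parallel corpora, and an attribute selection step could be applied to the feature matrix before fitting.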
MSC:
68 Computer science
92 Biology and other natural sciences
References:
[1] Collobert, R.; Weston, J.; A unified architecture for natural language processing: Deep neural networks with multitask learning; Proceedings of the 25th International Conference on Machine Learning: New York, NY, USA 2008; .
[2] Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; Kuksa, P.; Natural language processing (almost) from scratch; JMLR: 2011; Volume 12 ,2493-2537. · Zbl 1280.68161
[3] Koehn, P.; Och, F.J.; Marcu, D.; Statistical phrase-based translation; Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Stroudsburg, USA 2003; .
[4] Bentivogli, L.; Bisazza, A.; Cettolo, M.; Federico, M.; Neural versus phrase-based machine translation quality: A case study; Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing: Stroudsburg, PA, USA 2016; .
[5] Bahdanau, D.; Cho, K.; Bengio, Y.; Neural machine translation by jointly learning to align and translate; Proceedings of the 3rd International Conference on Learning Representations: San Diego, CA, USA 2015; .
[6] Peris, Á.; Cebrián, L.; Casacuberta, F.; Online Learning for Neural Machine Translation Post-editing; arXiv: 2017; .
[7] Breiman, L.; Random forests; Mach. Learn.: 2001; Volume 45 ,5-32. · Zbl 1007.68152
[8] Mnih, A.; Hinton, G.E.; A scalable hierarchical distributed language model; Proceedings of the Advances in Neural Information Processing Systems: San Diego, CA, USA 2009; .
[9] Arora, R.; Comparative analysis of classification algorithms on different datasets using WEKA; IJCA: 2012; Volume 54 ,21-25.
[10] Mouratidis, D.; Kermanidis, K.L.; Automatic Selection of Parallel Data for Machine Translation; Proceedings of the IFIP International Conference on Artificial Intelligence Applications and Innovations: Berlin, Germany 2018; .
[11] Kalchbrenner, N.; Blunsom, P.; Recurrent continuous translation models; Proceedings of the ACL Conference on Empirical Methods in Natural Language Processing (EMNLP): Stroudsburg, PA, USA 2013; .
[12] Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y.; Learning phrase representations using RNN encoder-decoder for statistical machine translation; Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP): Stroudsburg, PA, USA 2014; .
[13] Cho, K.; Van Merriënboer, B.; Bahdanau, D.; Bengio, Y.; On the properties of neural machine translation: Encoder-decoder approaches; Proceedings of the SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation: Stroudsburg, PA, USA 2014; .
[14] Hill, F.; Cho, K.; Jean, S.; Devin, C.; Bengio, Y.; Embedding word similarity with neural machine translation; arXiv: 2015; .
[15] Sutskever, I.; Vinyals, O.; Le, Q.V.; Sequence to sequence learning with neural networks; Proceedings of the Advances in Neural Information Processing Systems: Cambridge, MA, USA 2014; .
[16] Skansi, S.; ; Introduction to Deep Learning: From Logical Calculus to Artificial Intelligence: Cham, Switzerland 2018; ,135-145. · Zbl 1398.68003
[17] Smialowski, P.; Frishman, D.; Kramer, S.; Pitfalls of supervised feature selection; Bioinformatics: 2009; Volume 26 ,440-443.
[18] Bordes, A.; Chopra, S.; Weston, J.; Question answering with subgraph embeddings; Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP): Stroudsburg, PA, USA 2014; .
[19] Bhosale, D.; Ade, R.; Feature Selection based Classification using Naive Bayes, J48 and Support Vector Machine; IJCA: 2014; Volume 99 ,14-18.
[20] Qiang, G.; An effective algorithm for improving the performance of Naive Bayes for text classification; Proceedings of the Second International Conference on Computer Research and Development: Los Alamitos, CA, USA 2010; .
[21] Mohamed, W.N.H.W.; Salleh, M.N.M.; Omar, A.H.; A comparative study of reduced error pruning method in decision tree algorithms; Proceedings of the IEEE International Conference on Control System, Computing and Engineering (ICCSCE): Piscataway, NJ, USA 2012; .
[22] Phyu, T.Z.; Oo, N.N.; Performance Comparison of Feature Selection Methods; MATEC Web Conf.: 2016; Volume 42 ,1-4.
[23] Mulay, S.A.; Devale, P.R.; Garje, G.V.; Decision tree based support vector machine for intrusion detection; Proceedings of the International Conference on Networking and Information Technology (ICNIT): Piscataway, NJ, USA 2010; .
[24] Bosch, A.; Zisserman, A.; Munoz, X.; Image classification using random forests and ferns; Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV): Rio de Janeiro, Brazil 2007; .
[25] Farabet, C.; Couprie, C.; Najman, L.; LeCun, Y.; Learning hierarchical features for scene labeling; IEEE Trans. Pattern Anal. Mach. Intell.: 2013; Volume 35 ,1915-1929.
[26] Koehn, P.; Hoang, H.; Birch, A.; Callison-Burch, C.; Federico, M.; Bertoldi, N.; Dyer, C.; Moses: Open source toolkit for statistical machine translation; Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions: Stroudsburg, PA, USA 2007; .
[27] Pal, M.; Random forest classifier for remote sensing classification; IJRS: 2005; Volume 26 ,217-222.
[28] Xu, B.; Guo, X.; Ye, Y.; Cheng, J.; An Improved Random Forest Classifier for Text Categorization; J. Comput.: 2012; Volume 7 ,2913-2920.
[29] Chan, J.C.W.; Paelinckx, D.; Evaluation of Random Forest and Adaboost tree-based ensemble classification and spectral band selection for ecotope mapping using airborne hyperspectral imagery; Remote Sens. Environ.: 2008; Volume 112 ,2999-3011.
[30] Assunção, F.; Lourenço, N.; Machado, P.; Ribeiro, B.; DENSER: Deep Evolutionary Network Structured Representation; arXiv: 2018; .
[31] Snoek, J.; Rippel, O.; Swersky, K.; Kiros, R.; Satish, N.; Sundaram, N.; Patwary, M.; Prabhat, M.; Adams, R.; Scalable Bayesian optimization using deep neural networks; Proceedings of the 32nd International Conference on Machine Learning: Lille, France 2015; .
[32] Hinton, G.; Deng, L.; Yu, D.; Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups; IEEE Signal Process. Mag.: 2012; Volume 29 ,82-97.
[33] LeCun, Y.; Bengio, Y.; Hinton, G.; Deep learning; Nature: 2015; Volume 521 ,436-444.
[34] Krizhevsky, A.; Sutskever, I.; Hinton, G.E.; ImageNet classification with deep convolutional neural networks; Proceedings of the Advances in Neural Information Processing Systems: San Diego, CA, USA 2012; .
[35] Pighin, D.; Màrquez, L.; May, J.; An Analysis (and an Annotated Corpus) of User Responses to Machine Translation Output; Proceedings of the 8th International Conference on Language Resources and Evaluation: Istanbul, Turkey 2012; .
[36] Barrón-Cedeño, A.; Màrquez-Villodre, L.; Henríquez-Quintana, C.A.; Formiga-Fanals, L.; Romero-Merino, E.; May, J.; Identifying useful human correction feedback from an on-line machine translation service; Proceedings of the 23rd International Joint Conference on Artificial Intelligence: Beijing, China 2013; .
[37] Kordoni, V.; Birch, L.; Buliga, I.; Cholakov, K.; Egg, M.; Gaspari, F.; Georgakopoulou, Y.; Gialama, M.; Hendrickx, I.H.E.; Jermol, M.; TraMOOC (Translation for Massive Open Online Courses): Providing Reliable MT for MOOCs; Proceedings of the 19th annual conference of the European Association for Machine Translation (EAMT): Riga, Latvia 2016; .
[38] Sennrich, R.; Firat, O.; Cho, K.; Birch-Mayne, A.; Haddow, B.; Hitschler, J.; Junczys-Dowmunt, M.; Läubli, S.; Miceli Barone, A.; Mokry, J.; Nematus: A toolkit for neural machine translation; Proceedings of the EACL 2017 Software Demonstrations: Stroudsburg, PA, USA 2017; .
[39] Miceli-Barone, A.V.; Haddow, B.; Germann, U.; Sennrich, R.; Regularization techniques for fine-tuning in neural machine translation; Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: Stroudsburg, PA, USA 2017; .
[40] Rama, T.; Borin, L.; Comparative evaluation of string similarity measures for automatic language classification; Sequences in Language and Text: Berlin, Germany 2015; Volume 69 ,203-231.
[41] Broder, A.Z.; On the resemblance and containment of documents; Proceedings of the Compression and Complexity of Sequences 1997: Washington, DC, USA 1997; .
[42] Pouliquen, B.; Steinberger, R.; Ignat, C.; Automatic identification of document translations in large multilingual document collections; Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP): Borovets, Bulgaria 2003; .
[43] Deeplearning4j: Deep Learning for Java; .
[44] Singhal, S.; Jena, M.; A study on WEKA tool for data preprocessing, classification and clustering; IJITEE: 2013; Volume 2 ,250-253.
[45] Daskalaki, S.; Kopanas, I.; Avouris, N.; Evaluation of classifiers for an uneven class distribution problem; Appl. Artif. Intell.: 2006; Volume 20 ,381-417.
[46] Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P.; SMOTE: Synthetic minority over-sampling technique; J. Artif. Intell. Res.: 2002; Volume 16 ,321-357. · Zbl 0994.68128
[47] Kuhn, M.; Johnson, K.; ; Applied Predictive Modeling: New York, NY, USA 2013; ,600. · Zbl 1306.62014
[48] Zhang, D.; Lee, W.S.; Extracting key-substring-group features for text classification; Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: New York, NY, USA 2006; .
[49] Šilić, A.; Chauchat, J.H.; Bašić, B.D.; Morin, A.; N-grams and morphological normalization in text classification: A comparison on a Croatian-English parallel corpus; Proceedings of the Portuguese Conference on Artificial Intelligence: Berlin, Germany 2007; .
[50] Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; Tensorflow: A system for large-scale machine learning; Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation: Savannah, GA, USA 2016; .
[51] Kovalev, V.; Kalinovsky, A.; Kovalev, S.; ; Deep Learning with Theano, Torch, Caffe, Tensorflow, and Deeplearning4j: Which One Is the Best in Speed and Accuracy?: Berlin, Germany 2016; ,181.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.