
autoBOT: evolving neuro-symbolic representations for explainable low resource text classification. (English) Zbl 07432828

Summary: Learning from texts has been widely adopted throughout industry and science. While state-of-the-art neural language models have shown very promising results for text classification, they are expensive to (pre-)train, require large amounts of data, and demand tuning of hundreds of millions (or more) of parameters. This paper explores how automatically evolved text representations can serve as the basis for a class of explainable, low-resource models with competitive performance that are subject to automated hyperparameter tuning. We present autoBOT (automatic Bags-Of-Tokens), an autoML approach suitable for low-resource learning scenarios, where both the hardware and the amount of data available for training are limited. The proposed approach consists of an evolutionary algorithm that jointly optimizes various sparse representations of a given text (including word, subword, POS tag, keyword-based, knowledge graph-based and relational features) and two types of document embeddings (non-sparse representations). The key idea of autoBOT is that, instead of evolving at the learner level, evolution is conducted at the representation level. The proposed method offers competitive classification performance on fourteen real-world classification tasks when compared against a competitive autoML approach that evolves ensemble models, as well as state-of-the-art neural language models such as BERT and RoBERTa. Moreover, the approach is explainable, as the importance of the parts of the input space is part of the final solution yielded by the proposed optimization procedure, offering potential for meta-transfer learning.
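The key idea, evolution at the representation level, can be illustrated with a minimal sketch that is not the authors' implementation: a genetic algorithm evolves one weight per feature subspace (here, two hypothetical subspaces: word counts and punctuation counts), and fitness is the leave-one-out accuracy of a simple nearest-centroid classifier on the weighted, concatenated features. All feature choices and hyperparameters below are illustrative assumptions.

```python
import random

def featurize(text):
    # Two hypothetical feature subspaces: word-level counts and character-level counts.
    words = text.lower().split()
    word_feats = [sum(w == k for w in words) for k in ("good", "bad", "great", "awful")]
    char_feats = [text.count(c) for c in "!?."]
    return [word_feats, char_feats]

def weighted_vector(subspaces, weights):
    # Concatenate subspaces, scaling each block by its evolved weight.
    vec = []
    for w, feats in zip(weights, subspaces):
        vec.extend(w * f for f in feats)
    return vec

def accuracy(weights, docs, labels):
    # Leave-one-out nearest-centroid classification as a stand-in fitness function.
    vecs = [weighted_vector(featurize(d), weights) for d in docs]
    correct = 0
    for i, v in enumerate(vecs):
        best, best_d = None, float("inf")
        for lab in set(labels):
            pts = [vecs[j] for j in range(len(vecs)) if labels[j] == lab and j != i]
            if not pts:
                continue
            cen = [sum(col) / len(pts) for col in zip(*pts)]
            d = sum((a - b) ** 2 for a, b in zip(v, cen))
            if d < best_d:
                best, best_d = lab, d
        correct += best == labels[i]
    return correct / len(docs)

def evolve(docs, labels, n_sub=2, pop=10, gens=15, seed=0):
    # Evolve subspace weights: elitist selection, averaging crossover, Gaussian mutation.
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_sub)] for _ in range(pop)]
    for _ in range(gens):
        scored = sorted(population, key=lambda w: -accuracy(w, docs, labels))
        elite = scored[: pop // 2]
        children = []
        for _ in range(pop - len(elite)):
            a, b = rng.sample(elite, 2)
            child = [(x + y) / 2 + rng.gauss(0, 0.1) for x, y in zip(a, b)]
            children.append([max(0.0, w) for w in child])  # keep weights non-negative
        population = elite + children
    return max(population, key=lambda w: accuracy(w, docs, labels))
```

The evolved weight vector directly exposes which feature subspaces matter for a task, which is what makes the final solution interpretable: a large weight on the keyword-based block, say, means keywords drive the classification.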

MSC:

68T05 Learning and adaptive systems in artificial intelligence
