×

Fitness landscape analysis of automated machine learning search spaces. (English) Zbl 1484.68193

Paquete, Luís (ed.) et al., Evolutionary computation in combinatorial optimization. 20th European conference, EvoCOP 2020, held as part of EvoStar 2020, Seville, Spain, April 15–17, 2020. Proceedings. Cham: Springer. Lect. Notes Comput. Sci. 12102, 114-130 (2020).
Summary: The field of Automated Machine Learning (AutoML) has as its main goal to automate the process of creating complete Machine Learning (ML) pipelines to any dataset without requiring deep user expertise in ML. Several AutoML methods have been proposed so far, but there is not a single one that really stands out. Furthermore, there is a lack of studies on the characteristics of the fitness landscape of AutoML search spaces. Such analysis may help to understand the performance of different optimization methods for AutoML and how to improve them. This paper adapts classic fitness landscape analysis measures to the context of AutoML. This is a challenging task, as AutoML search spaces include discrete, continuous, categorical and conditional hyperparameters. We propose an ML pipeline representation, a neighborhood definition and a distance metric between pipelines, and use them in the evaluation of the fitness distance correlation (FDC) and the neutrality ratio for a given AutoML search space. Results of FDC are counter-intuitive and require a more in-depth analysis of a range of search spaces. Results of neutrality, in turn, show a strong positive correlation between the mean neutrality ratio and the fitness value.
For the entire collection see [Zbl 1475.68024].

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68T20 Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.)
PDF BibTeX XML Cite
Full Text: DOI

References:

[1] van Aardt, W.A., Bosman, A.S., Malan, K.M.: Characterising neutrality in neural network error landscapes. In: Proceedings of the Congress on Evolutionary Computation, pp. 1374-1381. IEEE (2017)
[2] Asuncion, A., Newman, D.: UCI machine learning repository (2007)
[3] Bosman, A.S., Engelbrecht, A.P., Helbig, M.: Progressive gradient walk for neural network fitness landscape analysis. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1473-1480. ACM (2018)
[4] Bosman, A.S., Engelbrecht, A., Helbig, M.: Search space boundaries in neural network error landscape analysis. In: Proceedings of the Symposium Series on Computational Intelligence, pp. 1-8. IEEE (2016)
[5] Ekárt, A.; Németh, SZ; Poli, R.; Banzhaf, W.; Langdon, WB; Miller, J.; Nordin, P.; Fogarty, TC, A metric for genetic programs and fitness sharing, Genetic Programming, 259-270 (2000), Heidelberg: Springer, Heidelberg
[6] Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., Hutter, F.: Efficient and robust automated machine learning. In: Proceedings of the Advances in Neural Information Processing Systems, pp. 2962-2970 (2015)
[7] Garciarena, U., Santana, R., Mendiburu, A.: Analysis of the complexity of the automatic pipeline generation problem. In: Proceedings of the Congress on Evolutionary Computation, pp. 1-8. IEEE (2018)
[8] Hutter, F.; Kotthoff, L.; Vanschoren, J., Automated Machine Learning (2019), Cham: Springer, Cham
[9] Jones, T., Forrest, S.: Fitness distance correlation as a measure of problem difficulty for genetic algorithms. In: Proceedings of the 6th International Conference on Genetic Algorithms, pp. 184-192 (1995)
[10] Malan, KM; Engelbrecht, AP, Characterising the searchability of continuous optimisation problems for PSO, Swarm Intell., 8, 4, 275-302 (2014)
[11] Mckay, RI; Hoai, NX; Whigham, PA; Shan, Y.; O’Neill, M., Grammar-based genetic programming: a survey, Genet. Program Evolvable Mach., 11, 3-4, 365-396 (2010)
[12] Olson, RS; Moore, JH; Hutter, F.; Kotthoff, L.; Vanschoren, J., TPOT: a tree-based pipeline optimization tool for automating machine learning, Automated Machine Learning, 151-160 (2019), Cham: Springer, Cham
[13] Pedregosa, F., Scikit-learn: machine learning in python, JMLR, 12, Oct, 2825-2830 (2011) · Zbl 1280.68189
[14] Pitzer, E.; Affenzeller, M.; Klempous, R.; Suárez Araujo, CP, A comprehensive survey on fitness landscape analysis, Recent Advances in Intelligent Engineering Systems, 161-191 (2012), Heidelberg: Springer, Heidelberg
[15] Pushak, Y.; Hoos, H.; Auger, A.; Fonseca, CM; Lourenço, N.; Machado, P.; Paquete, L.; Whitley, D., Algorithm configuration landscapes:, Parallel Problem Solving from Nature - PPSN XV, 271-283 (2018), Cham: Springer, Cham
[16] Rakitianskaia, A., Bekker, E., Malan, K.M., Engelbrecht, A.: Analysis of error landscapes in multi-layered neural networks for classification. In: Proceedings of the 2016 IEEE Congress on Evolutionary Computation, pp. 5270-5277. IEEE (2016)
[17] Reidys, CM; Stadler, PF, Neutrality in fitness landscapes, Appl. Math. Comput., 117, 2-3, 321-350 (2001) · Zbl 1113.92314
[18] Sipser, M.: Introduction to the Theory of Computation. 3rd edn. Cengage Learning (2012) · Zbl 1169.68300
[19] Stadler, PF; Lässig, M.; Valleriani, A., Fitness landscapes, Biological Evolution and Statistical Physics, 183-204 (2002), Heidelberg: Springer, Heidelberg
[20] Thornton, C., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 847-855. ACM (2013)
[21] Vanneschi, L.; Pirola, Y.; Mauri, G.; Tomassini, M.; Collard, P.; Verel, S., A study of the neutrality of boolean function landscapes in genetic programming, Theor. Comput. Sci., 425, 34-57 (2012) · Zbl 1237.68196
[22] Vanschoren, J.; Van Rijn, JN; Bischl, B.; Torgo, L., OpenML: networked science in machine learning, ACM SIGKDD Explor. Newsl., 15, 2, 49-60 (2014)
[23] Witten, IH; Frank, E.; Hall, MA; Pal, CJ, Data mining: practical machine learning tools and techniques (2016), Burlington: Morgan Kaufmann Publishers Inc., Burlington
[24] Zöller, M.A., Huber, M.F.: Survey on automated machine learning. arXiv preprint arXiv:1904.12054 (2019)
[25] Zwillinger, D., CRC standard mathematical tables and formulae (2002), London/Boca Raton: Chapman and Hall/CRC, London/Boca Raton · Zbl 1222.00012
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.