Hyperparameter optimization in learning systems. (English) Zbl 1469.68020

Summary: While the trainable parameters of machine learning models are adapted during the training phase, the values of the hyperparameters (or meta-parameters) have to be specified before training begins. The goal is to find a set of hyperparameter values that yields the best model for the given data in a reasonable amount of time. We present an integrated view of methods used in hyperparameter optimization of learning systems, with an emphasis on computational complexity aspects. Our thesis is that a hyperparameter optimization problem should be solved using a combination of techniques for optimization, search space reduction, and training time reduction. Case studies from real-world applications illustrate the practical aspects. We create the framework for a future separation between parameters and hyperparameters in adaptive P systems.
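
As an illustration only, the three ingredients named in the summary can be sketched in a few lines of Python. This is not the authors' method: random search is used as one possible optimization technique, and the objective `validation_error`, the hyperparameter names, their ranges, and the `n_samples` budget are all hypothetical stand-ins for training and validating a real model.

```python
import random


def validation_error(learning_rate, regularization, n_samples):
    """Hypothetical stand-in for 'train a model, return its validation error'.

    The toy surface has its minimum at learning_rate=0.1, regularization=0.01.
    The 1/n_samples term mimics a penalty for training on fewer samples.
    """
    penalty = 1.0 / n_samples
    return (learning_rate - 0.1) ** 2 + (regularization - 0.01) ** 2 + penalty


def random_search(space, n_trials, n_samples, seed=0):
    """Optimization step: sample each hyperparameter uniformly, keep the best."""
    rng = random.Random(seed)
    best_params, best_error = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        error = validation_error(n_samples=n_samples, **params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


# Search space reduction (assumed): only two hyperparameters, bounded ranges.
space = {"learning_rate": (0.0, 1.0), "regularization": (0.0, 0.1)}

# Training time reduction (assumed): each trial uses a 100-sample subset.
best_params, best_error = random_search(space, n_trials=200, n_samples=100)
print(best_params, best_error)
```

In a real setting, `validation_error` would be replaced by an actual training and validation run, and the cost of each trial would grow with `n_samples`, which is why the trial budget and the training-set size have to be traded off against each other.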


68Q07 Biologically inspired models of computation (DNA computing, membrane computing, etc.)
68T05 Learning and adaptive systems in artificial intelligence

