Hyperparameter optimization via sequential uniform designs. (English) Zbl 07415092

Summary: Hyperparameter optimization (HPO) plays a central role in automated machine learning (AutoML). It is a challenging task, as the response surfaces of hyperparameters are generally unknown, making it essentially a global optimization problem. This paper reformulates HPO as a computer experiment and proposes a novel sequential uniform design (SeqUD) strategy with three-fold advantages: a) the hyperparameter space is adaptively explored with evenly spread design points, without the need for expensive meta-modeling and acquisition optimization; b) design points are generated sequentially batch by batch, with support for parallel processing; c) a new augmented uniform design algorithm is developed for efficient real-time generation of follow-up design points. Extensive experiments are conducted on both global optimization tasks and HPO applications. The numerical results show that the proposed SeqUD strategy outperforms benchmark HPO methods, and it is therefore a promising and competitive alternative to existing AutoML tools.
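The overall idea described in the summary can be illustrated with a minimal sketch: evaluate an objective on a batch of evenly spread points, then shrink the search box around the current best point and generate the next batch there. This is an assumption-laden stand-in, not the authors' SeqUD/augmented-uniform-design algorithm; in particular, a Halton sequence is used here merely as a convenient low-discrepancy substitute for a true uniform design, and the names `uniform_batch`, `sequd`, and the `shrink` parameter are hypothetical.

```python
def halton(i, base):
    # van der Corput radical inverse in the given base (element of a Halton sequence)
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def uniform_batch(n, dim, lo, hi, start=1):
    # n evenly spread points inside the box [lo, hi]; Halton points are a
    # stand-in for a proper uniform design (hypothetical helper)
    bases = [2, 3, 5, 7, 11, 13][:dim]
    return [
        [lo[d] + (hi[d] - lo[d]) * halton(i, bases[d]) for d in range(dim)]
        for i in range(start, start + n)
    ]

def sequd(f, dim, n_per_batch=10, n_stages=5, shrink=0.5):
    # sketch of sequential uniform-design-style search on the unit hypercube:
    # evaluate a batch, zoom the box around the incumbent, repeat
    lo, hi = [0.0] * dim, [1.0] * dim
    best_x, best_y, idx = None, float("inf"), 1
    for _ in range(n_stages):
        batch = uniform_batch(n_per_batch, dim, lo, hi, start=idx)
        idx += n_per_batch
        for x in batch:  # batch points are independent, so this loop could run in parallel
            y = f(x)
            if y < best_y:
                best_x, best_y = x, y
        # shrink the search box around the current best point, clipped to [0, 1]
        half = [(h - l) * shrink / 2 for l, h in zip(lo, hi)]
        lo = [max(0.0, bx - hw) for bx, hw in zip(best_x, half)]
        hi = [min(1.0, bx + hw) for bx, hw in zip(best_x, half)]
    return best_x, best_y
```

For example, minimizing `lambda x: sum((v - 0.3) ** 2 for v in x)` over the unit square with this sketch quickly concentrates the batches near (0.3, 0.3). The paper's actual contribution (the augmented uniform design) is more refined: follow-up points are constructed so that the combined design stays uniform, rather than simply restarting a sequence in the shrunken box.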


68T05 Learning and adaptive systems in artificial intelligence
Full Text: arXiv Link

