zbMATH — the first resource for mathematics

ADMMBO: Bayesian optimization with unknown constraints using ADMM. (English) Zbl 1441.62075
Summary: There exist many problems in science and engineering that involve optimization of an unknown or partially unknown objective function. Recently, Bayesian Optimization (BO) has emerged as a powerful tool for solving optimization problems whose objective functions are only available as a black box and are expensive to evaluate. Many practical problems, however, involve optimization of an unknown objective function subject to unknown constraints. This is an important yet challenging problem for which, unlike optimizing an unknown function, existing methods face several limitations. In this paper, we present a novel constrained Bayesian optimization framework to optimize an unknown objective function subject to unknown constraints. We introduce an equivalent optimization by augmenting the objective function with constraints, introducing auxiliary variables for each constraint, and forcing the new variables to be equal to the main variable. Building on the Alternating Direction Method of Multipliers (ADMM) algorithm, we propose ADMM-Bayesian Optimization (ADMMBO) to solve the problem in an iterative fashion. Our framework leads to multiple unconstrained subproblems with unknown objective functions, which we then solve via BO. Our method resolves several challenges of state-of-the-art techniques: it can start from infeasible points, is insensitive to initialization, can efficiently handle “decoupled problems” and has a concrete stopping criterion. Extensive experiments on a number of challenging BO benchmark problems show that our proposed approach outperforms the state-of-the-art methods in terms of the speed of obtaining a feasible solution and convergence to the global optimum as well as minimizing the number of total evaluations of unknown objective and constraints functions.
62F15 Bayesian inference
62D10 Missing data
90C31 Sensitivity, stability, parametric optimization
60G15 Gaussian processes
Full Text: Link
[1] Mart´ın Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. Tensorflow: A system for largescale machine learning. InOSDI, volume 16, pages 265-283, 2016.
[2] Javad Azimi, Ali Jalali, and Xiaoli Fern.Hybrid batch bayesian optimization.arXiv preprint arXiv:1202.5597, 2012.
[3] Pierre Baldi, Peter Sadowski, and Daniel Whiteson. Enhanced higgs boson toτ+τ- search with deep learning.Physical review letters, 114(11):111801, 2015.
[4] James S Bergstra, R´emi Bardenet, Yoshua Bengio, and Bal´azs K´egl. Algorithms for hyper-parameter optimization. InAdvances in neural information processing systems, pages 2546-2554, 2011.
[5] JM Bernardo, MJ Bayarri, JO Berger, AP Dawid, D Heckerman, AFM Smith, and M West. Optimization under unknown constraints.Bayesian Statistics 9, 9:229, 2011.
[6] L´eon Bottou. Large-scale machine learning with stochastic gradient descent. InProceedings of COMPSTAT’2010, pages 177-186. Springer, 2010.
[7] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers.Foundations and TrendsRin Machine Learning, 3(1):1-122, 2011.
[8] Eric Brochu, Tyson Brochu, and Nando de Freitas. A bayesian interactive optimization approach to procedural animation design.InProceedings of the 2010 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pages 103-112. Eurographics Association, 2010a.
[9] Eric Brochu, Vlad M Cora, and Nando De Freitas. A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599, 2010b.
[10] Fran¸cois Chollet et al. Keras, 2015.
[11] Dennis D Cox and Susan John. A statistical method for global optimization. InSystems, Man and Cybernetics, 1992., IEEE International Conference on, pages 1241-1246. IEEE, 1992.
[12] Daniel E Finkel. Direct optimization algorithm user guide.Center for Research in Scientific Computation, North Carolina State University, 2, 2003.
[13] Jacob R Gardner, Matt J Kusner, Zhixiang Eddie Xu, Kilian Q Weinberger, and John P Cunningham. Bayesian optimization with inequality constraints. InICML, pages 937-945, 2014.
[14] Michael A Gelbart, Jasper Snoek, and Ryan P Adams. Bayesian optimization with unknown constraints.arXiv preprint arXiv:1403.5607, 2014.
[15] Michael Adam Gelbart.Constrained Bayesian Optimization and Applications. PhD thesis, 2015.
[16] Robert B Gramacy, Genetha A Gray, S´ebastien Le Digabel, Herbert KH Lee, Pritam Ranjan, Garth Wells, and Stefan M Wild. Modeling an augmented lagrangian for blackbox constrained optimization.Technometrics, 58(1):1-11, 2016.
[17] Jos´e Miguel Hern´andez-Lobato, Matthew W Hoffman, and Zoubin Ghahramani. Predictive entropy search for efficient global optimization of black-box functions. InAdvances in neural information processing systems, pages 918-926, 2014.
[18] Jos´e Miguel Hern´andez-Lobato, Michael Gelbart, Matthew Hoffman, Ryan Adams, and Zoubin Ghahramani. Predictive entropy search for bayesian optimization with unknown constraints. In International Conference on Machine Learning, pages 1699-1707, 2015.
[19] Jos´e Miguel Hern´andez-Lobato, Michael A Gelbart, Ryan P Adams, Matthew W Hoffman, and Zoubin Ghahramani.A general framework for constrained bayesian optimization using information-based search. 2016.
[20] Matthew Hoffman, Bobak Shahriari, and Nando Freitas. On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning. InArtificial Intelligence and Statistics, pages 365-374, 2014.
[21] Mingyi Hong and Zhi-Quan Luo. On the linear convergence of the alternating direction method of multipliers.Mathematical Programming, 162(1-2):165-199, 2017.
[22] Mingyi Hong, Zhi-Quan Luo, and Meisam Razaviyayn. Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems.SIAM Journal on Optimization, 26(1): 337-364, 2016.
[23] Neil Houlsby, Ferenc Huszar, Zoubin Ghahramani, and Jose M Hern´andez-Lobato. Collaborative gaussian processes for preference learning. InAdvances in Neural Information Processing Systems, pages 2096-2104, 2012.
[24] Frank Hutter, Holger H Hoos, and Kevin Leyton-Brown.Sequential model-based optimization for general algorithm configuration.InInternational Conference on Learning and Intelligent Optimization, pages 507-523. Springer, 2011.
[25] Donald R Jones. A taxonomy of global optimization methods based on response surfaces.Journal of global optimization, 21(4):345-383, 2001.
[26] Donald R Jones, Matthias Schonlau, and William J Welch. Efficient global optimization of expensive black-box functions.Journal of Global optimization, 13(4):455-492, 1998.
[27] Aaron Klein, Stefan Falkner, Simon Bartels, Philipp Hennig, and Frank Hutter. Fast bayesian optimization of machine learning hyperparameters on large datasets.arXiv preprint arXiv:1605.07079, 2016.
[28] Harold J Kushner. A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise.Journal of Basic Engineering, 86(1):97-106, 1964.
[29] Yann LeCun. The mnist database of handwritten digits.http://yann. lecun. com/exdb/mnist/, 1998.
[30] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning.nature, 521(7553):436, 2015.
[31] Daniel James Lizotte.Practical bayesian optimization. University of Alberta, 2008.
[32] Ruben Martinez-Cantin, Nando de Freitas, Arnaud Doucet, and Jos´e A Castellanos. Active policy learning for robot planning and exploration under uncertainty. InRobotics: Science and Systems, volume 3, pages 334-341, 2007.
[33] Thomas Peter Minka.A family of algorithms for approximate Bayesian inference. PhD thesis, Massachusetts Institute of Technology, 2001.
[34] J Moˇckus.On bayesian methods for seeking the extremum.InOptimization Techniques IFIP Technical Conference, pages 400-404. Springer, 1975.
[35] Joao FC Mota, Joao MF Xavier, Pedro MQ Aguiar, and Markus Puschel.D-admm:A communication-efficient distributed algorithm for separable optimization.IEEE Transactions on Signal Processing, 61(10):2718-2723, 2013.
[36] Jorge Nocedal and Stephen J. Wright.Numerical Optimization. Springer, New York, NY, USA, second edition, 2006.
[37] Neal Parikh, Stephen Boyd, et al. Proximal algorithms.Foundations and TrendsRin Optimization, 1(3):127-239, 2014.
[38] Victor Picheny. A stepwise uncertainty reduction approach to constrained global optimization. In Artificial Intelligence and Statistics, pages 787-795, 2014.
[39] Victor Picheny, David Ginsbourger, Yann Richet, and Gregory Caplin. Quantile-based optimization of noisy computer experiments with tunable precision.Technometrics, 55(1):2-13, 2013.
[40] Victor Picheny, Robert B Gramacy, Stefan Wild, and Sebastien Le Digabel. Bayesian optimization under mixed constraints with a slack-variable augmented lagrangian.InAdvances in Neural Information Processing Systems, pages 1435-1443, 2016.
[41] Carl Edward Rasmussen and Christopher KI Williams.Gaussian processes for machine learning, volume 1. MIT press Cambridge, 2006.
[42] Matthias Schonlau, William J Welch, and Donald R Jones. Global versus local search in constrained optimization of computer models.Lecture Notes-Monograph Series, pages 11-25, 1998.
[43] Steven L Scott. A modern bayesian look at the multi-armed bandit.Applied Stochastic Models in Business and Industry, 26(6):639-658, 2010.
[44] Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P Adams, and Nando de Freitas. Taking the human out of the loop: A review of bayesian optimization.Proceedings of the IEEE, 104(1): 148-175, 2016.
[45] Jasper Snoek.Bayesian optimization and semiparametric models with applications to assistive technology. PhD thesis, Citeseer, 2013.
[46] Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical bayesian optimization of machine learning algorithms. InAdvances in neural information processing systems, pages 2951-2959, 2012.
[47] Kevin Swersky, Jasper Snoek, and Ryan P Adams. Multi-task bayesian optimization. InAdvances in neural information processing systems, pages 2004-2012, 2013.
[48] Robert Tibshirani. Regression shrinkage and selection via the lasso.Journal of the Royal Statistical Society. Series B (Methodological), pages 267-288, 1996.
[49] Aimo Torn and Antanas Zilinskas.Global optimization. Springer-Verlag New York, Inc., 1989.
[50] Yu Wang, Wotao Yin, and Jinshan Zeng. Global convergence of admm in nonconvex nonsmooth optimization.Journal of Scientific Computing, pages 1-35, 2015.
[51] Jian Wu, Matthias Poloczek, Andrew G Wilson, and Peter Frazier. Bayesian optimization with gradients. InAdvances in Neural Information Processing Systems, pages 5273-5284, 2017.
[52] Zheng Xu, Soham De, Mario Figueiredo, Christoph Studer, and Tom Goldstein.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.