×

A supermartingale approach to Gaussian process based sequential design of experiments. (English) Zbl 1428.62369

Summary: Gaussian process (GP) models have become a well-established framework for the adaptive design of costly experiments, and notably of computer experiments. GP-based sequential designs have been found practically efficient for various objectives, such as global optimization (estimating the global maximum or maximizer(s) of a function), reliability analysis (estimating a probability of failure) or the estimation of level sets and excursion sets. In this paper, we study the consistency of an important class of sequential designs, known as stepwise uncertainty reduction (SUR) strategies. Our approach relies on the key observation that the sequence of residual uncertainty measures, in SUR strategies, is generally a supermartingale with respect to the filtration generated by the observations. This observation enables us to establish generic consistency results for a broad class of SUR strategies. The consistency of several popular sequential design strategies is then obtained by means of this general result. Notably, we establish the consistency of two SUR strategies proposed by J. Bect et al. [Stat. Comput. 22, No. 3, 773–793 (2012; Zbl 1252.62081)] – to the best of our knowledge, these are the first proofs of consistency for GP-based sequential design algorithms dedicated to the estimation of excursion sets and their measure. We also establish a new, more general proof of consistency for the expected improvement algorithm for global optimization which, unlike previous results in the literature, applies to any GP with continuous sample paths.

MSC:

62L05 Sequential statistical design
68T05 Learning and adaptive systems in artificial intelligence
60G48 Generalizations of martingales
60G15 Gaussian processes

Citations:

Zbl 1252.62081

Software:

EGO
PDF BibTeX XML Cite
Full Text: DOI arXiv Euclid

References:

[1] Adler, R.J. (1981). The Geometry of Random Fields. Wiley Series in Probability and Mathematical Statistics. Chichester: Wiley. · Zbl 0478.60059
[2] Azaïs, J.-M. and Wschebor, M. (2009). Level Sets and Extrema of Random Processes and Fields. Hoboken, NJ: Wiley.
[3] Azzimonti, D., Ginsbourger, D., Chevalier, C., Bect, J. and Richet, Y. (2018). Adaptive design of experiments for conservative estimation of excursion sets. ArXiv preprint, arXiv:1611.07256v3.
[4] Bayarri, M.J., Berger, J.O., Paulo, R., Sacks, J., Cafeo, J.A., Cavendish, J., Lin, C.-H. and Tu, J. (2007). A framework for validation of computer models. Technometrics49 138-154.
[5] Bect, J., Li, L. and Vazquez, E. (2017). Bayesian subset simulation. SIAM/ASA J. Uncertain. Quantificat.5 762-786. · Zbl 06861829
[6] Bect, J., Ginsbourger, D., Li, L., Picheny, V. and Vazquez, E. (2012). Sequential design of computer experiments for the estimation of a probability of failure. Stat. Comput.22 773-793. · Zbl 1252.62081
[7] Billingsley, P. (1999). Convergence of Probability Measures, 2nd ed. Wiley Series in Probability and Statistics: Probability and Statistics. New York: Wiley. A Wiley-Interscience Publication. · Zbl 0944.60003
[8] Bogachev, V.I. (1998). Gaussian Measures. Mathematical Surveys and Monographs62. Providence, RI: Amer. Math. Soc..
[9] Brezis, H. (2011). Functional Analysis, Sobolev Spaces and Partial Differential Equations. Universitext. New York: Springer. · Zbl 1220.46002
[10] Bull, A.D. (2011). Convergence rates of efficient global optimization algorithms. J. Mach. Learn. Res.12 2879-2904. · Zbl 1280.90094
[11] Chevalier, C., Ginsbourger, D., Bect, J., Vazquez, E., Picheny, V. and Richet, Y. (2014). Fast parallel kriging-based stepwise uncertainty reduction with application to the identification of an excursion set. Technometrics56 455-465.
[12] Cohn, D.A., Ghahramani, Z. and Jordan, M.I. (1996). Active learning with statistical models. J. Artificial Intelligence Res.4 129-145. · Zbl 0900.68366
[13] Cover, T.M. and Thomas, J.A. (1991). Elements of Information Theory. Wiley Series in Telecommunications. New York: Wiley. A Wiley-Interscience Publication.
[14] Cressie, N.A.C. (1993). Statistics for Spatial Data. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. New York: Wiley. Revised reprint of the 1991 edition, A Wiley-Interscience Publication. · Zbl 1347.62005
[15] DeGroot, M.H. (1962). Uncertainty, information, and sequential experiments. Ann. Math. Stat.33 404-419. · Zbl 0151.22803
[16] DeGroot, M.H. (1984). Changes in utility as information. Theory and Decision17 287-303. · Zbl 0572.90010
[17] DeGroot, M.H. (1986). Concepts of information based on utility. In Recent Developments in the Foundations of Utility and Risk Theory (L. Daboni, A. Montesano and M. Lines, eds.) 265-275. Netherlands, Dordrecht: Springer.
[18] Diaconis, P. (1988). Bayesian numerical analysis. In Statistical Decision Theory and Related Topics, IV, Vol. 1 (West Lafayette, Ind., 1986) 163-175. New York: Springer.
[19] Emmerich, M., Giannakoglou, K. and Naujoks, B. (2006). Single-and multiobjective optimization assisted by Gaussian random field metamodels. IEEE Trans. Evol. Comput.10.
[20] Feliot, P., Bect, J. and Vazquez, E. (2017). A Bayesian approach to constrained single- and multi-objective optimization. J. Global Optim.67 97-133. · Zbl 1390.90441
[21] Forrester, A., Sobester, A. and Keane, A. (2008). Engineering Design Via Surrogate Modelling: A Practical Guide. Wiley.
[22] Frazier, P. (2009). Knowledge-Gradient Methods for Statistical Learning. Ann Arbor, MI: ProQuest LLC. Thesis (Ph.D.)-Princeton University.
[23] Frazier, P.I., Powell, W.B. and Dayanik, S. (2008). A knowledge-gradient policy for sequential information collection. SIAM J. Control Optim.47 2410-2439. · Zbl 1274.62155
[24] Frazier, P., Powell, W. and Dayanik, S. (2009). The knowledge-gradient policy for correlated normal beliefs. INFORMS J. Comput.21 599-613. · Zbl 1243.91014
[25] Geman, D. and Jedynak, B. (1996). An active testing model for tracking roads in satellite images. IEEE Trans. Pattern Anal. Mach. Intell.18 1-14.
[26] Ginsbourger, D., Roustant, O. and Durrande, N. (2016). On degeneracy and invariances of random fields paths with applications in Gaussian process modelling. J. Statist. Plann. Inference170 117-128. · Zbl 1329.60135
[27] Ginsbourger, D., Baccou, J., Chevalier, C., Perales, F., Garland, N. and Monerie, Y. (2014). Bayesian adaptive reconstruction of profile optima and optimizers. SIAM/ASA J. Uncertain. Quantificat.2 490-510. · Zbl 1308.62157
[28] Gramacy, R.B., Gray, G.A., Le Digabel, S., Lee, H.K.H., Ranjan, P., Wells, G. and Wild, S.M. (2016). Modeling an augmented Lagrangian for blackbox constrained optimization. Technometrics58 1-11.
[29] Grünewälder, S., Audibert, J.-Y., Opper, M. and Shawe-Taylor, J. (2010). Regret bounds for Gaussian process bandit problems. In International Conference on Artificial Intelligence and Statistics.
[30] Hainy, M., Müller, W.G. and Wynn, H.P. (2014). Learning functions and approximate Bayesian computation design: ABCD. Entropy16 4353-4374. · Zbl 1341.62091
[31] Hennig, P., Osborne, M.A. and Girolami, M. (2015). Probabilistic numerics and uncertainty in computations. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci.471 20150142, 17. · Zbl 1372.65010
[32] Ibragimov, I.d.A. and Rozanov, Y.A. (1978). Gaussian Random Processes. Applications of Mathematics9. New York-Berlin: Springer. Translated from the Russian by A.B. Aries.
[33] Johnson, R. (1960). An information theory approach to diagnosis. In Proceedings of the 6th Symposium on Reliability and Quality Control 102-109.
[34] Jones, D.R., Schonlau, M. and Welch, W.J. (1998). Efficient global optimization of expensive black-box functions. J. Global Optim.13 455-492. Workshop on Global Optimization (Trier, 1997). · Zbl 0917.90270
[35] Kallenberg, O. (2002). Foundations of Modern Probability, 2nd ed. Probability and Its Applications (New York). New York: Springer. · Zbl 0996.60001
[36] King-Smith, P.E., Grigsby, S.S., Vingrys, A.J., Benes, S.C. and Supowit, A. (1994). Efficient and unbiased modifications of the QUEST threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Res.34 885-912.
[37] Koehler, J.R., Puhalskii, A.A. and Simon, B. (1998). Estimating functions evaluated by simulation: A Bayesian/analytic approach. Ann. Appl. Probab.8 1184-1215. · Zbl 0928.62055
[38] Ledoux, M. and Talagrand, M. (2011). Probability in Banach Spaces: Isoperimetry and Processes. Classics in Mathematics. Berlin: Springer. Reprint of the 1991 edition. · Zbl 1226.60003
[39] Lukić, M.N. and Beder, J.H. (2001). Stochastic processes with sample paths in reproducing kernel Hilbert spaces. Trans. Amer. Math. Soc.353 3945-3969. · Zbl 0973.60036
[40] MacKay, D.J.C. (1992). Information-based objective functions for active data selection. Neural Comput.4 590-604.
[41] Mockus, J.B., Tiesis, V. and Žilinskas, A. (1978). The application of Bayesian methods for seeking the extremum. In Towards Global Optimization, Volume 2 (L.C.W. Dixon and G.P. Szegö, eds.) 117-129. New York: North Holland.
[42] Molchanov, I. (2005). Theory of Random Sets. Probability and Its Applications (New York). London: Springer London, Ltd.. · Zbl 1109.60001
[43] O’Geran, J.H., Wynn, H.P. and Zhiglyavsky, A.A. (1993). Mastermind as a test-bed for search algorithms. Chance6 31-37.
[44] O’Hagan, A. (1991). Bayes-Hermite quadrature. J. Statist. Plann. Inference29 245-260.
[45] Perlman, M.D. (1974). Jensen’s inequality for a convex vector-valued function on an infinite-dimensional space. J. Multivariate Anal.4 52-65. · Zbl 0274.28012
[46] Picheny, V. (2014). A stepwise uncertainty reduction approach to constrained global optimization. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS).
[47] Picheny, V., Ginsbourger, D., Roustant, O., Haftka, R. and Kim, N.-H. (2010). Adaptive designs of experiments for accurate approximation of target regions. J. Mech. Des.132.
[48] Ranjan, P., Bingham, D. and Michailidis, G. (2008). Sequential experiment design for contour estimation from complex computer codes. Technometrics50 527-541.
[49] Ritter, K. (2000). Average-Case Analysis of Numerical Problems. Lecture Notes in Math.1733. Berlin: Springer. · Zbl 0949.65146
[50] Sacks, J., Welch, W.J., Mitchell, T.J. and Wynn, H.P. (1989). Design and analysis of computer experiments. Statist. Sci.4 409-435. With comments and a rejoinder by the authors. · Zbl 0955.62619
[51] Santner, T.J., Williams, B.J. and Notz, W.I. (2003). The Design and Analysis of Computer Experiments. Springer Series in Statistics. New York: Springer. · Zbl 1041.62068
[52] Schönfeld, P. (1973). A note on the measurability of the pseudo-inverse. J. Econometrics1 313-314. · Zbl 0287.15004
[53] Scott, W., Frazier, P. and Powell, W. (2011). The correlated knowledge gradient for simulation optimization of continuous parameters using Gaussian process regression. SIAM J. Optim.21 996-1026. · Zbl 1229.62018
[54] Shahriari, B., Swersky, K., Wang, Z., Adams, R. and de Freitas, N. (2016). Taking the human out of the loop: A review of Bayesian optimization. Proc. IEEE104 148-175.
[55] Srinivas, N., Krause, A., Kakade, S.M. and Seeger, M.W. (2012). Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Trans. Inform. Theory58 3250-3265. · Zbl 1365.94131
[56] Stein, M.L. (1999). Interpolation of Spatial Data: Some Theory for Kriging. Springer Series in Statistics. New York: Springer.
[57] Stroock, D.W. (2011). Probability Theory: An Analytic View, 2nd ed. Cambridge: Cambridge Univ. Press. · Zbl 1223.60001
[58] Vakhania, N.N., Tarieladze, V.I. and Chobanyan, S.A. (1987). Probability Distributions on Banach Spaces. Mathematics and Its Applications (Soviet Series) 14. Dordrecht: D. Reidel Publishing Co. Translated from the Russian and with a preface by Wojbor A. Woyczynski.
[59] van der Vaart, A.W. and van Zanten, J.H. (2008). Reproducing kernel Hilbert spaces of Gaussian priors. In Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh. Inst. Math. Stat. (IMS) Collect.3 200-222. Beachwood, OH: IMS.
[60] Vazquez, E. and Bect, J. (2009). A sequential Bayesian algorithm to estimate a probability of failure. In 15th IFAC Symposium on System Identification (SYSID 2009).
[61] Vazquez, E. and Bect, J. (2010). Convergence properties of the expected improvement algorithm with fixed mean and covariance functions. J. Statist. Plann. Inference140 3088-3095. · Zbl 1419.62200
[62] Vazquez, E. and Bect, J. (2010). Pointwise consistency of the kriging predictor with known mean and covariance functions. In MODa 9-Advances in Model-Oriented Design and Analysis 221-228. Springer. · Zbl 1419.62200
[63] Vazquez, E. and Bect, J. (2012). Sequential search based on kriging: Convergence analysis of some algorithms. In Bulletin of the ISI 58th World Statistics Congress of the International Statistical Institute, 2011, The Hague, The Netherlands International Statistical Institute.
[64] Villemonteix, J., Vazquez, E. and Walter, E. (2009). An informational approach to the global optimization of expensive-to-evaluate functions. J. Global Optim.44 509-534. · Zbl 1180.90253
[65] Wang, H., Lin, G. and Li, J. (2016). Gaussian process surrogates for failure detection: A Bayesian experimental design approach. J. Comput. Phys.313 247-259. · Zbl 1349.62360
[66] Williams, B.J., Santner, T.J. and Notz, W.I. (2000). Sequential design of computer experiments to minimize integrated response functions. Statist. Sinica10 1133-1152. · Zbl 0961.62069
[67] Yarotsky, D. (2013). Examples of inconsistency in optimization by expected improvement. J. Global Optim.56 1773-1790. · Zbl 1291.90156
[68] Yarotsky, D. (2013). Univariate interpolation by exponential functions and Gaussian RBFs for generic sets of nodes. J. Approx. Theory166 163-175. · Zbl 1264.41006
[69] Zuluaga, M., Krause, A., Sergent, G. and Püschel, M. (2013). Active learning for level set estimation. In International Joint Conference on Artificial Intelligence (IJCAI).
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.