
zbMATH — the first resource for mathematics

Multiresolution functional ANOVA for large-scale, many-input computer experiments. (English) Zbl 1445.62189
Summary: The Gaussian process is a standard tool for building emulators for both deterministic and stochastic computer experiments. However, application of Gaussian process models is greatly limited in practice, particularly for large-scale and many-input computer experiments that have become typical. We propose a multiresolution functional ANOVA (MRFA) model as a computationally feasible emulation alternative. More generally, this model can be used for large-scale and many-input nonlinear regression problems. An overlapping group lasso approach is used for estimation, ensuring computational feasibility in a large-scale and many-input setting. New results on consistency and inference for the (potentially overlapping) group lasso in a high-dimensional setting are developed and applied to the proposed MRFA model. Importantly, these results allow us to quantify the uncertainty in our predictions. Numerical examples demonstrate that the proposed model enjoys marked computational advantages. Its data capabilities, in terms of both sample size and dimension, meet or exceed those of the best available emulation tools, and its emulation accuracy meets or exceeds theirs.
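The summary names an overlapping group lasso as the estimation approach but gives no algorithmic detail. As a rough illustration only, and not the authors' implementation, the sketch below shows the block soft-thresholding step that underlies many group lasso solvers, together with a common device for overlap (duplicating shared coordinates so that the groups become disjoint). The function names `group_soft_threshold` and `duplicate_overlaps`, the penalty weight `lam`, and the toy numbers are all assumptions made for this example.

```python
import math


def group_soft_threshold(beta, groups, lam):
    """Proximal map of lam * sum_g ||beta_g||_2 for disjoint groups.

    beta   : list of floats (coefficient vector)
    groups : list of index lists, assumed non-overlapping
    lam    : non-negative penalty weight (hypothetical tuning parameter)
    """
    out = list(beta)
    for idx in groups:
        norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        # A whole group is set to zero when its norm is at most lam;
        # otherwise it is shrunk toward zero by the factor 1 - lam / norm.
        scale = 0.0 if norm <= lam else 1.0 - lam / norm
        for i in idx:
            out[i] = scale * beta[i]
    return out


def duplicate_overlaps(groups):
    """Turn overlapping groups into disjoint groups of duplicated coordinates.

    Returns (latent_groups, source), where source[j] is the original
    coordinate that latent coordinate j copies.
    """
    latent_groups, source = [], []
    for idx in groups:
        start = len(source)
        source.extend(idx)
        latent_groups.append(list(range(start, start + len(idx))))
    return latent_groups, source


# Toy usage: the second group has norm 0.5 <= lam, so it is zeroed out.
shrunk = group_soft_threshold([3.0, 4.0, 0.3, 0.4], [[0, 1], [2, 3]], lam=1.0)
latent_groups, source = duplicate_overlaps([[0, 1], [1, 2]])
```

In the toy call, the first group (norm 5) is scaled by 0.8 and the second is removed entirely, which is the group-level sparsity the penalty is meant to induce. A full solver would alternate this step with gradient steps on the least-squares loss over the duplicated design; that part is omitted here.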
MSC:
62J10 Analysis of variance and covariance (ANOVA)
62R10 Functional data analysis
62R07 Statistical aspects of big data and data science
68T09 Computational aspects of data analysis and big data
Software:
laGP; R; gss; mlegp; GPfit