Emulation engines: choice and quantification of uncertainty for complex hydrological models. (English) Zbl 1391.62272

Summary: Complex, mechanistic hydrological models can be computationally expensive, have large numbers of input parameters, and generate multivariate output. Model emulators can be constructed to approximate these complex models with substantial computational savings, making activities such as sensitivity analysis, calibration and uncertainty analysis feasible. Success in the use of an emulator relies on it making accurate and precise predictions of the model output. However, it is often unclear what type of emulation approach will be suitable. We present a comparison of reduced-rank, multivariate emulators built upon different ‘emulation engines’ and apply them to the Australian Water Resource Assessment System model. We examine first-order and second-order approaches which focus on specifying the mean and covariance, respectively. We also introduce a nonparametric approach for quantifying the uncertainty associated with the emulated prediction where this has bounded support. Our results demonstrate that emulation engines based on second-order approaches, such as Gaussian processes, can be computationally burdensome and may be comparable in performance to computationally efficient, first-order methods such as random forests.
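The reduced-rank idea in the summary can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy two-parameter `simulator` in place of the hydrological model, a rank-2 basis found by power iteration, and a k-nearest-neighbour average as a simple stand-in for a first-order (mean-specifying) engine such as a random forest. All function and variable names are hypothetical.

```python
import math
import random


def simulator(theta, n_out=20):
    """Toy stand-in (assumption) for an expensive model: 2 inputs -> n_out outputs."""
    a, b = theta
    return [2.0 * a * math.sin(0.3 * t) + b ** 2 * math.cos(0.3 * t)
            for t in range(n_out)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def leading_direction(rows, rng, n_iter=100):
    """Approximate the leading right-singular vector of `rows` by power iteration."""
    dim = len(rows[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(n_iter):
        scores = [dot(row, v) for row in rows]             # rows @ v
        w = [sum(s * row[j] for s, row in zip(scores, rows))
             for j in range(dim)]                          # rows.T @ (rows @ v)
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return v


def knn_predict(train_x, train_y, x, k=5):
    """Average the targets of the k nearest design points (first-order stand-in)."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], x))[:k]
    return [sum(train_y[i][j] for i in nearest) / k
            for j in range(len(train_y[0]))]


rng = random.Random(0)

# 1. Run the simulator on a design of input parameters.
design = [(rng.random(), rng.random()) for _ in range(150)]
runs = [simulator(theta) for theta in design]

# 2. Centre the multivariate output and extract a low-rank basis.
mean = [sum(col) / len(runs) for col in zip(*runs)]
centred = [[y - m for y, m in zip(run, mean)] for run in runs]

basis, residual = [], centred
for _ in range(2):  # rank 2 is an assumption that suits this toy simulator
    v = leading_direction(residual, rng)
    basis.append(v)
    # Deflate: remove the component along v from every row.
    residual = [[r - dot(row, v) * vj for r, vj in zip(row, v)]
                for row in residual]

# 3. Project each run onto the basis; the engine then maps inputs to coefficients.
coeffs = [[dot(row, v) for v in basis] for row in centred]


def emulate(theta):
    """Predict the basis coefficients, then reconstruct the full output vector."""
    c = knn_predict(design, coeffs, theta)
    return [m + sum(ck * v[j] for ck, v in zip(c, basis))
            for j, m in enumerate(mean)]


theta_new = (0.5, 0.5)
truth = simulator(theta_new)
pred = emulate(theta_new)
rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))
print(f"RMSE at theta_new: {rmse:.3f}")
```

Swapping `knn_predict` for another regressor changes the emulation engine without touching the dimension-reduction step, which is the kind of comparison the summary describes. A second-order engine would additionally model the covariance of the coefficients rather than only their mean.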

MSC:

62P12 Applications of statistics to environmental and related topics
62F15 Bayesian inference
68T05 Learning and adaptive systems in artificial intelligence

Software:

laGP; gamair; R; tgp
