A semiautomatic method for history matching using sequential Monte Carlo. (English) Zbl 1468.62261

Summary: The aim of the history matching method is to locate nonimplausible regions of the parameter space of complex deterministic or stochastic models by matching model outputs with data. It does this via a series of waves, where at each wave an emulator is fitted to a small number of training samples. An implausibility measure is defined that takes into account the closeness of simulated and observed outputs as well as emulator uncertainty. As the waves progress, the emulator becomes more accurate, so that training samples are more concentrated on promising regions of the space and poorer parts of the space are rejected with more confidence. While history matching has proved to be useful, existing implementations are not fully automated: some ad hoc choices are made during the process, which therefore requires user intervention and is time-consuming. This is especially so when the nonimplausible region becomes small and it is difficult to sample this space uniformly to generate new training points. In this article we develop a sequential Monte Carlo (SMC) algorithm for implementing history matching that is semiautomated. Our novel SMC approach reveals that the history matching method yields a nonimplausible region that can be multimodal, highly irregular, and very difficult to sample uniformly. Our SMC approach offers a much more reliable sampling of the nonimplausible space, at the cost of additional computation compared to other approaches used in the literature.
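
The summary does not state the implausibility measure explicitly. As a minimal sketch, assuming the single-output form commonly used in the history matching literature, I(x) = |z - E[f(x)]| / sqrt(Var_em(x) + Var_obs + Var_disc), one filtering step of a wave might look as follows. The function names, the toy emulator, the variance values, and the cutoff of 3 are all illustrative assumptions, not the authors' implementation, and the sketch does not include the SMC sampler itself.

```python
import math


def implausibility(z, em_mean, em_var, obs_var=0.0, disc_var=0.0):
    """Standardised distance between the observation z and the emulator mean.

    The denominator pools emulator variance, observation-error variance and
    model-discrepancy variance (assumed form, see the lead-in).
    """
    total_var = em_var + obs_var + disc_var
    if total_var <= 0.0:
        raise ValueError("total variance must be positive")
    return abs(z - em_mean) / math.sqrt(total_var)


def nonimplausible(points, z, emulator, cutoff=3.0, obs_var=0.0, disc_var=0.0):
    """Keep the points whose implausibility does not exceed the cutoff."""
    kept = []
    for x in points:
        mean, var = emulator(x)  # emulator returns (predictive mean, variance)
        if implausibility(z, mean, var, obs_var, disc_var) <= cutoff:
            kept.append(x)
    return kept


def toy_emulator(x):
    """Hypothetical emulator: predicts x**2 with a constant variance of 0.01."""
    return x * x, 0.01


# Candidate parameter values on a grid over [-3, 3].
candidates = [i / 10 for i in range(-30, 31)]

# Observed output z = 1.0 with observation-error variance 0.01 (made-up values).
kept = nonimplausible(candidates, z=1.0, emulator=toy_emulator, obs_var=0.01)
print(kept)
```

With this toy emulator the retained points fall into two separate clusters, one near x = -1 and one near x = 1, which illustrates on a small scale how a nonimplausible region can be multimodal.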

MSC:

62F35 Robustness and adaptive procedures (parametric inference)
62F15 Bayesian inference
62K20 Response surface designs
62L12 Sequential estimation
62M05 Markov processes: estimation; hidden Markov models
62-08 Computational methods for problems pertaining to statistics

Software:

DREAM
