Multi-scale process modelling and distributed computation for spatial data. (English) Zbl 1452.62364
Summary: Recent years have seen major developments in spatial modelling and prediction methodology, driven by the increased availability of remote-sensing data and the reduced cost of distributed-processing technology. It is well known that modelling and prediction with infinite-dimensional process models is computationally infeasible for large data sets, and that approximate models, and often approximate-inference methods, are needed. The problem of fitting simple global spatial models to large data sets has been solved through approaches such as multi-resolution approximations and nearest-neighbour techniques. Here we tackle the next challenge: fitting complex, nonstationary, multi-scale models to large data sets. We propose doing this through superpositions of spatial processes with increasing spatial scale and increasing degrees of nonstationarity. Computation is facilitated by Gaussian Markov random fields and parallel Markov chain Monte Carlo based on graph colouring. The resulting model allows for both distributed computing and distributed data. Importantly, it provides opportunities for genuine model and data scalability while still borrowing strength across large spatial scales. We illustrate a two-scale version on a sea-surface temperature data set containing on the order of one million observations, and compare our approach with state-of-the-art spatial modelling and prediction methods.
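The summary names parallel MCMC based on graph colouring but does not spell out the sampler, so the following is only a minimal sketch under our own assumptions, not the authors' two-scale model or code. It assumes a single proper conditional autoregressive (CAR) Gaussian Markov random field on a regular lattice with precision Q = τ(D − ρW), where W is the 4-neighbour adjacency matrix and D its diagonal degree matrix, optionally observed with Gaussian noise of variance σ². The idea it illustrates: colour the graph so no two neighbours share a colour; nodes of one colour are then conditionally independent given all the others, so each colour class can be updated in a single parallel step of the Gibbs sampler. The function names (`lattice_neighbours`, `greedy_colouring`, `chromatic_gibbs`) and parameter values are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple

Node = Tuple[int, int]


def lattice_neighbours(nrow: int, ncol: int) -> Dict[Node, List[Node]]:
    """First-order (4-neighbour) adjacency on an nrow x ncol grid."""
    nbrs: Dict[Node, List[Node]] = {}
    for r in range(nrow):
        for c in range(ncol):
            cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs[(r, c)] = [(i, j) for i, j in cand if 0 <= i < nrow and 0 <= j < ncol]
    return nbrs


def greedy_colouring(nbrs: Dict[Node, List[Node]]) -> Dict[Node, int]:
    """Give each node the smallest colour not used by an already-coloured neighbour."""
    colour: Dict[Node, int] = {}
    for v in sorted(nbrs):  # row-major order on a lattice
        used = {colour[u] for u in nbrs[v] if u in colour}
        k = 0
        while k in used:
            k += 1
        colour[v] = k
    return colour


def chromatic_gibbs(
    nbrs: Dict[Node, List[Node]],
    colour: Dict[Node, int],
    y: Optional[Dict[Node, float]] = None,
    tau: float = 1.0,
    rho: float = 0.9,       # |rho| < 1 keeps Q positive definite (assumption)
    sigma2: float = 1.0,    # observation-noise variance (assumption)
    n_iter: int = 100,
    seed: int = 0,
) -> Dict[Node, float]:
    """Colour-by-colour Gibbs sampler for x ~ N(0, Q^{-1}), Q = tau * (D - rho * W),
    with optional data y_v = x_v + e_v, e_v ~ N(0, sigma2).
    Assumes every node has at least one neighbour or an observation."""
    rng = random.Random(seed)
    x = {v: 0.0 for v in nbrs}
    classes: Dict[int, List[Node]] = {}
    for v, k in colour.items():
        classes.setdefault(k, []).append(v)
    for _ in range(n_iter):
        for k in sorted(classes):
            # No two nodes in this class are neighbours, so every full conditional
            # below depends only on nodes of other colours: the class could be
            # split across workers and updated simultaneously (here a plain loop).
            new: Dict[Node, float] = {}
            for v in classes[k]:
                prec = tau * len(nbrs[v])                      # Q_vv
                lin = tau * rho * sum(x[u] for u in nbrs[v])   # -sum_u Q_vu x_u
                if y is not None and v in y:
                    prec += 1.0 / sigma2
                    lin += y[v] / sigma2
                new[v] = rng.gauss(lin / prec, (1.0 / prec) ** 0.5)
            x.update(new)
    return x


if __name__ == "__main__":
    nbrs = lattice_neighbours(20, 20)
    colour = greedy_colouring(nbrs)
    print(len(set(colour.values())))  # 2: the chequerboard colouring
    sample = chromatic_gibbs(nbrs, colour, n_iter=50)
```

On a 4-neighbour lattice the greedy colouring reproduces the chequerboard, so one full sweep takes two parallel steps however large the grid is; irregular or multi-scale graphs will generally need more colours, and hence more sequential steps per sweep.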
MSC:
62H11 Directional data; spatial statistics
05C15 Coloring of graphs and hypergraphs
62-08 Computational methods for problems pertaining to statistics
62M20 Inference from stochastic processes and prediction
62P12 Applications of statistics to environmental and related topics
65C05 Monte Carlo methods