Practical Bayesian modeling and inference for massive spatial data sets on modest computing environments.

*(English)* Zbl 07260630

Summary: With continued advances in Geographic Information Systems and related computational technologies, statisticians are often required to analyze very large spatial data sets. Over the last decade this has generated substantial interest in scalable methodologies for analyzing large spatial data sets, and the literature is already too vast to be summarized here. Scalable spatial process models have been found especially attractive due to their richness and flexibility and, particularly in the Bayesian paradigm, due to their use within hierarchical model settings. However, the vast majority of research articles in this domain have been geared toward innovative theory or more complex model development. Very limited attention has been accorded to easily implementable scalable hierarchical models for the practicing scientist or spatial analyst. This article devises massively scalable Bayesian approaches that can rapidly deliver inference on spatial processes that is practically indistinguishable from inference obtained using more expensive alternatives. A key emphasis is on implementation within very standard (modest) computing environments (e.g., a standard desktop or laptop) using easily available statistical software packages. Key insights are offered regarding the assumptions and approximations that bear on practical efficiency.

##### Keywords:

Bayesian inference; Gaussian processes; latent spatial processes; nearest-neighbor Gaussian processes
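The summary does not spell out the models, but the keyword *nearest-neighbor Gaussian processes* refers to a standard scaling device: the joint Gaussian process density is replaced by a product of univariate conditionals, each conditioning on at most `m` nearest previously ordered locations (a Vecchia-type factorization). The following is a minimal Python/NumPy sketch of that idea, not the authors' implementation. It assumes a zero-mean process, an exponential covariance `sigma2 * exp(-phi * d)`, and locations taken in their given order; the helper names `exp_cov`, `nngp_logpdf`, and `gp_logpdf` are hypothetical. With `m = n - 1` the factorization is exact, which gives a simple check against the dense Cholesky density.

```python
import numpy as np


def exp_cov(a, b, sigma2, phi):
    """Exponential covariance sigma2 * exp(-phi * ||a_i - b_j||) between rows of a and b."""
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    return sigma2 * np.exp(-phi * d)


def gp_logpdf(w, coords, sigma2, phi):
    """Exact zero-mean GP log-density via a dense Cholesky factor: O(n^3)."""
    C = exp_cov(coords, coords, sigma2, phi)
    L = np.linalg.cholesky(C)
    z = np.linalg.solve(L, w)  # z = L^{-1} w, so z @ z = w' C^{-1} w
    n = len(w)
    return -0.5 * (n * np.log(2 * np.pi) + 2 * np.log(np.diag(L)).sum() + z @ z)


def nngp_logpdf(w, coords, sigma2, phi, m):
    """Nearest-neighbor (Vecchia-type) log-density of a zero-mean GP realization.

    Location i conditions only on its m nearest neighbors among locations
    0..i-1 in the given ordering, so the density factorizes into n
    univariate normals N(w_i | b_i' w_N(i), f_i). Requires m >= 1.
    """
    n = len(w)
    logp = 0.0
    for i in range(n):
        if i == 0:
            mean, var = 0.0, sigma2  # first location: marginal N(0, sigma2)
        else:
            # Brute-force search for the m nearest earlier locations (for clarity only).
            d = np.linalg.norm(coords[:i] - coords[i], axis=1)
            nbr = np.argsort(d)[:m]
            C_nn = exp_cov(coords[nbr], coords[nbr], sigma2, phi)
            c_in = exp_cov(coords[i:i + 1], coords[nbr], sigma2, phi)[0]
            b = np.linalg.solve(C_nn, c_in)  # kriging weights b_i
            mean = b @ w[nbr]                # conditional mean
            var = sigma2 - b @ c_in          # conditional variance f_i
        logp += -0.5 * (np.log(2 * np.pi * var) + (w[i] - mean) ** 2 / var)
    return logp


# Usage: simulate one GP realization and compare the two log-densities.
rng = np.random.default_rng(0)
coords = rng.uniform(size=(200, 2))
C = exp_cov(coords, coords, 1.0, 6.0)
w = np.linalg.cholesky(C) @ rng.standard_normal(200)
print("NNGP, m = 10:", nngp_logpdf(w, coords, 1.0, 6.0, m=10))
print("dense GP:    ", gp_logpdf(w, coords, 1.0, 6.0))
```

Each location costs one `m x m` solve, so the nearest-neighbor likelihood scales as O(n m^3) instead of the O(n^3) dense Cholesky; the brute-force neighbor search above costs O(n^2 log n) and would be replaced by a spatial index in practice.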

*L. Zhang* et al., Stat. Anal. Data Min. 12, No. 3, 197–209 (2019; Zbl 07260630)

