Guhaniyogi, Rajarshi; Li, Cheng; Savitsky, Terrance; Srivastava, Sanvesh. Distributed Bayesian inference in massive spatial data. (English) Zbl 07708431. Stat. Sci. 38, No. 2, 262-284 (2023).

Summary: Gaussian process (GP) regression is computationally expensive in spatial applications involving massive data. Various methods address this limitation, including a small number of Bayesian methods based on distributed computation (the divide-and-conquer strategy). Focusing on the latter literature, we achieve three main goals. First, we develop an extensible Bayesian framework for distributed spatial GP regression that embeds many popular methods. The proposed framework has three steps: partition the entire data into many subsets, apply a readily available Bayesian spatial process model in parallel on all the subsets, and combine the posterior distributions estimated on all the subsets into a pseudo posterior distribution that conditions on the entire data. The combined pseudo posterior distribution replaces the full-data posterior distribution in prediction and inference problems. Demonstrating our framework's generality, we extend posterior computations for (nondistributed) spatial process models with stationary full-rank and nonstationary low-rank GP priors to the distributed setting. Second, we contrast the empirical performance of popular distributed approaches with some widely used, nondistributed alternatives and highlight their relative advantages and shortcomings. Third, we provide theoretical support for our numerical observations and show that the Bayes \(L_2\)-risks of the combined posterior distributions obtained from a subclass of the divide-and-conquer methods achieve the near-optimal convergence rate in estimating the true spatial surface with various types of covariance functions. Additionally, we provide upper bounds on the number of subsets needed to achieve these near-optimal rates.
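The three-step recipe described in the summary (partition, fit in parallel, combine via a Wasserstein barycenter) can be sketched on a deliberately simplified model: a Gaussian mean with known noise variance, where each subset posterior is available in closed form and the Wasserstein-2 barycenter of one-dimensional Gaussians has an explicit formula. This is a toy illustration of the general divide-and-conquer strategy, not the paper's spatial GP implementation; the choice of model, the stochastic-approximation power equal to the number of subsets, and the closed-form barycenter are assumptions standard in this literature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy observations of a constant mean (a stand-in for the
# spatial surface; the paper fits full spatial GP models on each subset).
n, k = 10_000, 10                       # total points, number of subsets
y = rng.normal(loc=2.0, scale=1.0, size=n)

# Step 1: partition the entire data into k subsets.
subsets = np.array_split(rng.permutation(y), k)

# Step 2: fit a Bayesian model on each subset (in parallel, in principle).
# With a flat prior and known unit noise variance, each subset posterior
# for the mean is Gaussian: N(ybar_j, 1/n_j).
means = np.array([s.mean() for s in subsets])
sds = np.array([1.0 / np.sqrt(len(s)) for s in subsets])

# Stochastic approximation: raise each subset likelihood to the power k so
# every subset posterior carries full-data scale; for this Gaussian model
# that shrinks each subset posterior sd by sqrt(k).
sds /= np.sqrt(k)

# Step 3: combine via the Wasserstein-2 barycenter. For one-dimensional
# Gaussians with equal weights it has a closed form:
#   N(mean of the means, (mean of the sds)^2).
combined_mean = means.mean()
combined_sd = sds.mean()

print(combined_mean, combined_sd)
```

The combined pseudo posterior concentrates at the full-data rate (its sd is close to \(1/\sqrt{n}\)), which is the behavior the paper's theory establishes for the spatial GP setting under conditions on the number of subsets.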
Cited in 2 Documents. MSC: 62-XX Statistics. Keywords: distributed Bayesian inference; Gaussian process; low-rank Gaussian process; massive spatial data; Wasserstein barycenter. Software: LatticeKrig; GPvecchia; fields; INLA; FRK; spBayes.