A memory-free spatial additive mixed modeling for big spatial data. (English) Zbl 1447.62058
Summary: This study develops a spatial additive mixed modeling (AMM) approach that estimates spatial and non-spatial effects from very large samples, on the order of millions of observations. Although fast AMM approaches are already well established, they are restrictive in that they assume a known spatial dependence structure. To overcome this limitation, this study develops a fast AMM that estimates the spatial structure in the residuals and regression coefficients together with the non-spatial effects. We rely on a Moran coefficient-based approach to estimate the spatial structure. The proposed approach pre-compresses the large matrices whose size grows with the sample size \(N\) before model estimation; thus, the computational complexity of the estimation is independent of the sample size. Furthermore, the pre-compression is done through a block-wise procedure that makes the memory consumption independent of \(N\). As a result, the spatial AMM is memory free and fast even for millions of observations. The developed approach is compared with alternatives through Monte Carlo simulation experiments. The results confirm the estimation accuracy of the spatially varying coefficients and group coefficients, as well as the computational efficiency of the developed approach. Finally, we apply our approach to an income analysis using United States (US) data from 2015.
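The block-wise pre-compression idea in the summary can be illustrated with a minimal sketch. It is not the authors' AMM procedure: it uses ordinary least squares as a stand-in and only shows how the cross-products \(X^\top X\) and \(X^\top y\) can be accumulated block by block, so that the stored quantities have sizes that depend on the number of columns rather than on \(N\). The function names `blockwise_crossproducts` and `simulate_blocks`, the simulated data, and the block size are assumptions made for illustration.

```python
import numpy as np


def blockwise_crossproducts(blocks):
    """Accumulate X'X, X'y, y'y and the row count over (X_block, y_block) pairs.

    Only one block is held in memory at a time; the accumulated matrices
    are k x k and k x 1, where k is the number of columns of X.
    """
    xtx = None
    xty = None
    yty = 0.0
    n = 0
    for X, y in blocks:
        if xtx is None:
            k = X.shape[1]
            xtx = np.zeros((k, k))
            xty = np.zeros(k)
        xtx += X.T @ X
        xty += X.T @ y
        yty += float(y @ y)
        n += X.shape[0]
    return xtx, xty, yty, n


def simulate_blocks(n_total, block_size, beta, seed=0):
    """Hypothetical data source: yields simulated blocks instead of reading a file."""
    rng = np.random.default_rng(seed)
    for start in range(0, n_total, block_size):
        m = min(block_size, n_total - start)
        X = np.column_stack([np.ones(m), rng.normal(size=(m, len(beta) - 1))])
        y = X @ beta + rng.normal(scale=0.1, size=m)
        yield X, y


beta_true = np.array([1.0, -2.0, 0.5])
xtx, xty, yty, n = blockwise_crossproducts(
    simulate_blocks(n_total=100_000, block_size=10_000, beta=beta_true)
)

# Estimation uses only the compressed quantities, never the full N x k matrix.
beta_hat = np.linalg.solve(xtx, xty)
rss = yty - 2.0 * beta_hat @ xty + beta_hat @ xtx @ beta_hat
print(beta_hat, rss / n)
```

After the single pass over the blocks, every later computation works on the small accumulated matrices, which is why the cost of the estimation step in this sketch does not grow with \(N\).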
62H11 Directional data; spatial statistics
62R07 Statistical aspects of big data and data science
62J05 Linear regression; mixed models
62P20 Applications of statistics to economics
Software: Arc_Mat; FRK; gamair; glasso; lme4