
Fixed rank Kriging for very large spatial data sets. (English) Zbl 05563351

Summary: Spatial statistics for very large spatial data sets is challenging. The size of the data set, \(n\), causes problems in computing optimal spatial predictors such as kriging, since its computational cost is of order \(n^3\). In addition, a large data set is often defined on a large spatial domain, so the spatial process of interest typically exhibits non-stationary behaviour over that domain. A flexible family of non-stationary covariance functions is defined by using a set of basis functions that is fixed in number, which leads to a spatial prediction method that we call fixed rank kriging. Specifically, fixed rank kriging is kriging within this class of non-stationary covariance functions. It relies on computational simplifications when \(n\) is very large, for obtaining the spatial best linear unbiased predictor and its mean-squared prediction error for a hidden spatial process. A method based on minimizing a weighted Frobenius norm yields best estimators of the covariance function parameters, which are then substituted into the fixed rank kriging equations. The new methodology is applied to a very large data set of total column ozone data, observed over the entire globe, where \(n\) is of the order of hundreds of thousands.
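The computational simplification mentioned in the summary can be illustrated with a minimal sketch. It assumes, beyond what the summary states, that the data covariance has the low-rank-plus-diagonal form \(\Sigma = S K S^{\top} + \sigma^2 I\), where \(S\) is the \(n \times r\) matrix of \(r\) basis functions evaluated at the data locations and \(K\) is an \(r \times r\) positive-definite matrix. Under that assumption the Sherman-Morrison-Woodbury identity replaces the order \(n^3\) solve with \(r \times r\) solves. The one-dimensional domain, the Gaussian basis functions, the zero mean, and the function names `gaussian_basis`, `woodbury_solve` and `frk_predict` are hypothetical choices for illustration only, not the authors' implementation.

```python
import numpy as np


def gaussian_basis(locs, centers, scale):
    """Evaluate r Gaussian bumps at n 1-D locations; returns an (n, r) matrix.

    The basis choice is an illustrative assumption, not taken from the paper.
    """
    d = locs[:, None] - centers[None, :]
    return np.exp(-0.5 * (d / scale) ** 2)


def woodbury_solve(S, K, sigma2, b):
    """Solve (S K S^T + sigma2 * I) x = b using only r x r linear algebra."""
    # Sherman-Morrison-Woodbury identity with D = sigma2 * I:
    # Sigma^{-1} = I/sigma2 - S (K^{-1} + S^T S / sigma2)^{-1} S^T / sigma2^2
    inner = np.linalg.inv(K) + S.T @ S / sigma2  # r x r matrix
    return b / sigma2 - S @ np.linalg.solve(inner, S.T @ b) / sigma2**2


def frk_predict(S_obs, S_new, K, sigma2, z):
    """Zero-mean kriging predictor of the hidden process at new locations."""
    # cov(Y(s0), Z) = S_new K S_obs^T, so the predictor is
    # S_new K S_obs^T Sigma^{-1} z, evaluated right to left to stay O(n r^2).
    return S_new @ (K @ (S_obs.T @ woodbury_solve(S_obs, K, sigma2, z)))


# Small synthetic check: compare the fast solve with a dense n x n solve.
rng = np.random.default_rng(0)
n, r = 2000, 10
locs = np.sort(rng.uniform(0.0, 1.0, n))
centers = np.linspace(0.0, 1.0, r)
S = gaussian_basis(locs, centers, scale=0.1)

A = rng.normal(size=(r, r))
K = A @ A.T + r * np.eye(r)  # arbitrary positive-definite K (assumed known)
sigma2 = 0.25

eta = rng.multivariate_normal(np.zeros(r), K)
z = S @ eta + rng.normal(scale=np.sqrt(sigma2), size=n)

fast = woodbury_solve(S, K, sigma2, z)
dense = np.linalg.solve(S @ K @ S.T + sigma2 * np.eye(n), z)
print("max abs difference:", np.max(np.abs(fast - dense)))

new_locs = np.linspace(0.0, 1.0, 5)
pred = frk_predict(S, gaussian_basis(new_locs, centers, 0.1), K, sigma2, z)
print("predictions at new locations:", pred)
```

In this sketch \(K\) and \(\sigma^2\) are treated as known. In the method described above they would instead be estimated, for example by the weighted Frobenius norm criterion, and then substituted into the prediction equations.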

MSC:

62-XX Statistics
