Backfitting in smoothing spline ANOVA. (English) Zbl 0929.62043

Summary: A computational scheme for fitting smoothing spline ANOVA models to large data sets with a (near) tensor product design is proposed. Such data sets are common in spatial-temporal analyses. The proposed scheme uses the backfitting algorithm to take advantage of the tensor product design to save both computational memory and time.
Several ways to further speed up the backfitting algorithm, such as collapsing component functions and successive over-relaxation, are discussed. An iterative imputation procedure is used to handle the cases of near tensor product designs. An application to a global historical surface air temperature data set, which motivated this work, is used to illustrate the scheme proposed.
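The backfitting iteration and the successive over-relaxation variant mentioned in the summary can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes two additive components, and it uses ridge-penalized polynomial smoothers as a stand-in for smoothing spline smoothers. The names `backfit`, `ridge_smoother` and `centered_poly_basis` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def centered_poly_basis(x, degree=3):
    """Columns x, x^2, ..., x^degree, each centered to mean zero (illustrative basis)."""
    X = np.vander(x, degree + 1, increasing=True)[:, 1:]
    return X - X.mean(axis=0)


def ridge_smoother(X, lam):
    """Linear smoother matrix S = X (X'X + lam I)^{-1} X' (a stand-in for a spline smoother)."""
    p = X.shape[1]
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)


def backfit(y, smoothers, omega=1.0, tol=1e-10, max_iter=5000):
    """Fit y ~ sum_j f_j by cycling over the components.

    Each component is updated by smoothing the partial residual
    y - sum_{k != j} f_k. With omega = 1 this is plain backfitting
    (block Gauss-Seidel); 1 < omega < 2 gives successive over-relaxation.
    Returns the list of fitted components and the number of sweeps used.
    """
    f = [np.zeros_like(y, dtype=float) for _ in smoothers]
    for sweep in range(1, max_iter + 1):
        change = 0.0
        for j, S in enumerate(smoothers):
            partial_resid = y - sum(f[k] for k in range(len(f)) if k != j)
            f_new = (1.0 - omega) * f[j] + omega * (S @ partial_resid)
            change = max(change, float(np.max(np.abs(f_new - f[j]))))
            f[j] = f_new
        if change < tol:
            return f, sweep
    return f, max_iter


# Small synthetic example (made-up data, for illustration only).
rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.uniform(size=n), rng.uniform(size=n)
y = np.sin(2 * np.pi * x1) + x2**2 + 0.1 * rng.standard_normal(n)
y = y - y.mean()

lam = 1.0
S1 = ridge_smoother(centered_poly_basis(x1), lam)
S2 = ridge_smoother(centered_poly_basis(x2), lam)

components, sweeps = backfit(y, [S1, S2])
components_sor, sweeps_sor = backfit(y, [S1, S2], omega=1.2)
print("sweeps (plain, SOR):", sweeps, sweeps_sor)
```

For these ridge smoothers the fixed point of the iteration coincides with the jointly penalized least squares fit, so the sketch can be checked against a direct solve. The memory savings the summary attributes to the tensor product design are not reproduced here, since the sketch stores dense n-by-n smoother matrices.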

MSC:

62G07 Density estimation
65C60 Computational problems in statistics (MSC2010)
62H12 Estimation in multivariate analysis
65D10 Numerical smoothing, curve fitting
65F10 Iterative numerical methods for linear systems
86A32 Geostatistics

Software:

gss
Full Text: DOI

References:

[1] Ansley, C. F. and Kohn, R. (1994). Convergence of the backfitting algorithm for additive models. J. Austral. Math. Soc. Ser. A 57 316-329. · Zbl 0813.62030
[2] Aronszajn, N. (1950). Theory of reproducing kernels. Trans. Amer. Math. Soc. 68 337-404. · Zbl 0037.20701 · doi:10.2307/1990404
[3] Buja, A., Hastie, T. and Tibshirani, R. (1989). Linear smoothers and additive models (with discussion). Ann. Statist. 17 453-555. · Zbl 0689.62029 · doi:10.1214/aos/1176347115
[4] Chen, Z., Gu, C. and Wahba, G. (1989). Discussion of “Linear smoothers and additive models” by Buja, Hastie and Tibshirani. Ann. Statist. 17 515-517.
[5] Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statist. Soc. Ser. B 39 1-38. · Zbl 0364.62022
[6] Girard, D. (1989). A fast “Monte Carlo cross-validation” procedure for large least squares problems with noisy data. Numer. Math. 56 1-23. · Zbl 0665.65010 · doi:10.1007/BF01395775
[7] Golub, G. H. and Van Loan, C. F. (1989). Matrix Computations, 2nd ed. Johns Hopkins Univ. Press. · Zbl 0733.65016
[8] Green, P. J. (1990). On use of the EM Algorithm for Penalized Likelihood Estimation. J. Royal Statist. Soc. Ser. B 52 443-452. · Zbl 0706.62022
[9] Gu, C. (1989). RKPACK and its applications: Fitting smoothing spline models. Technical Report 857, Dept. Statistics, Univ. Wisconsin-Madison. Gu, C. and Wahba, G. (1993a). Semiparametric analysis of variance with tensor product thin plate splines. J. Royal Statist. Soc. Ser. B 55 353-368. Gu, C. and Wahba, G. (1993b). Smoothing spline ANOVA with component-wise Bayesian confidence intervals. J. Comput. Graph. Statist. 2 97-117. · Zbl 0786.62048
[10] Hansen, J. and Lebedeff, S. (1987). Global trends of measured surface air temperature. J. Geophysical Research 92 13,345-13,372.
[11] Jones, P. D., Raper, S. C. B., Cherry, B. S. G., Goodess, C. M., Wigley, T. M. L., Santer, B., Kelly, P. M., Bradley, R. S. and Diaz, H. F. (1991). An updated global grid point surface air temperature anomaly data set: 1851-1988. Environmental Sciences Division Publication 3520, U.S. Dept. Energy, Washington, DC.
[12] Jones, P. D., Raper, S. C. B., Bradley, R. S., Diaz, H. F., Kelly, P. M. and Wigley, T. M. L. (1986). Northern hemisphere surface air temperature variations: 1851-1984. J. Climate and Applied Meteorology 25 161-179.
[13] Karl, T. R., Knight, R. W. and Christy, J. R. (1994). Global and hemispheric temperature trends: uncertainties related to inadequate spatial sampling. J. Climate 7 1144-1163.
[14] Liu, J. S. (1994). The collapsed Gibbs sampler in Bayesian computations with applications to a gene regulation problem. J. Amer. Statist. Assoc. 89 958-966. · Zbl 0804.62033 · doi:10.2307/2290921
[15] Luo, Z. (1996). Backfitting in smoothing spline ANOVA with application to historical global temperature data (thesis). Technical Report 964, Dept. Statistics, Univ. Wisconsin, Madison.
[16] Luo, Z., Wahba, G. and Johnson, D. R. (1998). Spatial-temporal analysis of temperature using smoothing spline ANOVA. J. Climate 11 18-28.
[17] Madden, R. A., Shea, D. J., Branstator, G. W., Tribbia, J. J. and Weber, R. O. (1993). The effects of imperfect spatial and temporal sampling on estimates of the global mean temperature: Experiments with model data. J. Climate 6 1057-1066.
[18] O’Sullivan, F. (1985). Discussion of “Some aspects of the spline smoothing approach to nonparametric regression curve fitting” by Silverman. J. Roy. Statist. Soc. Ser. B 47 39-40. · Zbl 0606.62038
[19] Roberts, G. O. and Sahu, S. K. (1997). Updating schemes, correlation structure, blocking and parameterization for the Gibbs sampler. J. Roy. Statist. Soc. Ser. B 59 291-317. · Zbl 0886.62083
[20] Stein, M. (1990). Uniform asymptotic optimality of linear predictions of a random field using an incorrect second-order structure. Ann. Statist. 18 850-872. · Zbl 0716.62099 · doi:10.1214/aos/1176347629
[21] Tapia, R. A. and Thompson, J. R. (1978). Nonparametric Probability Density Estimation. Johns Hopkins Univ. Press. · Zbl 0449.62029
[22] Varga, R. S. (1962). Matrix Iterative Analysis. Prentice-Hall, Englewood Cliffs, NJ.
[23] Vinnikov, K. Ya., Groisman, P. Ya. and Lugina, K. M. (1990). Empirical data on contemporary global climate changes (temperature and precipitation). J. Climate 3 662-677.
[24] Wahba, G. (1981). Spline interpolation and smoothing on the sphere. SIAM J. Sci. Statist. Comput. 2 5-16. [Erratum (1982) 3 385-386.] · Zbl 0537.65008
[25] Wahba, G. (1990). Spline Models for Observational Data. SIAM, Philadelphia. · Zbl 0813.62001
[26] Wahba, G. and Luo, Z. (1997). Smoothing spline ANOVA fits for very large, nearly regular data sets, with application to historical global climate data. Ann. Numer. Math. 4 579-597. · Zbl 0885.65013
[27] Wahba, G., Wang, Y., Gu, C., Klein, R. and Klein, B. (1995). Smoothing spline ANOVA for exponential families, with application to the Wisconsin epidemiological study of diabetic retinopathy. Ann. Statist. 23 1865-1895. · Zbl 0854.62042 · doi:10.1214/aos/1034713638
[28] Wu, C. F. J. (1983). On the convergence properties of the EM algorithm. Ann. Statist. 11 95-103. · Zbl 0517.62035 · doi:10.1214/aos/1176346060
[29] Yates, F. (1933). The analysis of replicated experiments when the field results are incomplete. Empire J. Experimental Agriculture 1 129-142.
[30] Young, D. M. (1971). Iterative Solution of Large Linear Systems. Academic Press, New York. · Zbl 0231.65034