zbMATH — the first resource for mathematics

Smoothing spline ANOVA for exponential families, with application to the Wisconsin epidemiological study of diabetic retinopathy. (The 1994 Neyman Memorial Lecture). (English) Zbl 0854.62042
Summary: Let \(y_i\), \(i = 1, \dots, n\), be independent observations with the density of \(y_i\) of the form \(h(y_i, f_i) = \exp [y_if_i - b(f_i) + c(y_i)]\), where \(b\) and \(c\) are given functions and \(b\) is twice continuously differentiable and bounded away from 0. Let \(f_i = f(t(i))\), where \(t = (t_1, \dots, t_d) \in {\mathcal T}^{(1)} \otimes \cdots \otimes {\mathcal T}^{(d)} = {\mathcal T}\), the \({\mathcal T}^{(\alpha)}\) are measurable spaces of rather general form and \(f\) is an unknown function on \({\mathcal T}\) with some assumed “smoothness” properties. Given \(\{y_i, t(i), i = 1, \dots, n\}\), it is desired to estimate \(f(t)\) for \(t\) in some region of interest contained in \({\mathcal T}\).
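As a concrete instance of this density (the Bernoulli case, worked out here for illustration rather than quoted from the summary), take \(y_i \in \{0, 1\}\) with \(p_i = P(y_i = 1)\) and let \(f_i\) be the logit of \(p_i\): \[ h(y_i, f_i) = p_i^{y_i} (1 - p_i)^{1 - y_i} = \exp [y_i f_i - \log (1 + e^{f_i})], \qquad f_i = \log \frac{p_i}{1 - p_i}, \] so that \(b(f) = \log (1 + e^f)\) and \(c(y) = 0\). In this case \(b''(f) = p (1 - p)\) with \(p = e^f / (1 + e^f)\), which is strictly positive for every finite \(f\).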
We develop the fitting to these data of smoothing spline ANOVA models of the form \[ f(t) = C + \sum_\alpha f_\alpha (t_\alpha) + \sum_{\alpha < \beta} f_{\alpha \beta} (t_\alpha, t_\beta) + \cdots. \] The components of the decomposition satisfy side conditions which generalize the usual side conditions for parametric ANOVA. The estimate of \(f\) is obtained as the minimizer, in an appropriate function space, of \[ {\mathcal L} (y,f) + \sum_\alpha \lambda_\alpha J_\alpha (f_\alpha) + \sum_{\alpha < \beta} \lambda_{\alpha \beta} J_{\alpha \beta} (f_{\alpha \beta}) + \cdots, \] where \({\mathcal L} (y,f)\) is the negative log likelihood of \(y = (y_1, \dots, y_n)'\) given \(f\), the \(J_\alpha\), \(J_{\alpha \beta}, \dots\) are quadratic penalty functionals and the ANOVA decomposition is terminated in some manner. There are five major parts required to turn this program into a practical data analysis tool:
(1) methods for deciding which terms in the ANOVA decomposition to include (model selection), (2) methods for choosing good values of the smoothing parameters \(\lambda_\alpha\), \(\lambda_{\alpha \beta}, \dots\), (3) methods for making confidence statements concerning the estimate, (4) numerical algorithms for the calculations and, finally, (5) public software.
In this paper we carry out this program, relying on earlier work and filling in important gaps. The overall scheme is applied to Bernoulli data from the Wisconsin Epidemiologic Study of Diabetic Retinopathy to model the risk of progression of diabetic retinopathy as a function of glycosylated hemoglobin, duration of diabetes and body mass index. It is believed that the results have wide practical application to the analysis of data from large epidemiologic studies.
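The penalized likelihood criterion in the summary can be illustrated with a deliberately simplified sketch. It is not the paper's algorithm or software. The assumptions are: Bernoulli data with \(f\) the logit, a single component evaluated on an equally spaced grid, a discrete stand-in \(J(f) = \tfrac12 \|Df\|^2\) for the quadratic penalty (with \(D\) the second-difference matrix), a fixed smoothing parameter `lam` rather than a data-driven choice, and simulated responses. All function and variable names are hypothetical.

```python
import numpy as np


def second_difference_penalty(n):
    """Return P = D'D, where D takes second differences of a length-n vector."""
    D = np.diff(np.eye(n), n=2, axis=0)  # shape (n - 2, n)
    return D.T @ D


def penalized_objective(f, y, lam, P):
    """Bernoulli negative log likelihood plus a quadratic roughness penalty."""
    # For Bernoulli data with f the logit, b(f) = log(1 + exp(f)) and c(y) = 0.
    neg_log_lik = np.sum(np.logaddexp(0.0, f) - y * f)
    return neg_log_lik + 0.5 * lam * (f @ P @ f)


def fit_penalized_bernoulli(y, lam, max_iter=100, tol=1e-10):
    """Minimize the penalized objective over f by damped Newton iteration."""
    y = np.asarray(y, dtype=float)
    P = second_difference_penalty(y.size)
    f = np.zeros(y.size)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-f))             # fitted probabilities
        grad = (p - y) + lam * (P @ f)           # gradient of the objective
        hess = np.diag(p * (1.0 - p)) + lam * P  # Hessian, positive definite while 0 < p < 1
        step = np.linalg.solve(hess, grad)
        # Halve the step until the objective does not increase.
        scale = 1.0
        current = penalized_objective(f, y, lam, P)
        while scale > 1e-8 and penalized_objective(f - scale * step, y, lam, P) > current:
            scale *= 0.5
        f = f - scale * step
        if np.max(np.abs(scale * step)) < tol:
            break
    return f


# Simulated Bernoulli responses on an equally spaced grid (illustrative only).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 60)
true_f = 2.0 * np.sin(2.0 * np.pi * t)
y = (rng.random(t.size) < 1.0 / (1.0 + np.exp(-true_f))).astype(float)

f_rough = fit_penalized_bernoulli(y, lam=1.0)     # lighter smoothing
f_smooth = fit_penalized_bernoulli(y, lam=100.0)  # heavier smoothing
```

A larger `lam` pulls the fitted logit toward a straight line in the grid index, since linear sequences are exactly the ones the second-difference penalty leaves unpenalized. Choosing `lam` from the data, item (2) in the list above, is not attempted in this sketch.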

MSC:
62G07 Density estimation
41A15 Spline approximation
62P10 Applications of statistics to biology and medical sciences; meta analysis
65D10 Numerical smoothing, curve fitting
41A63 Multidimensional problems