Component selection and smoothing in multivariate nonparametric regression. (English) Zbl 1106.62041

Summary: We propose a new method for model selection and model fitting in multivariate nonparametric regression models, in the framework of smoothing spline ANOVA. The “COSSO” is a method of regularization with the penalty functional being the sum of component norms, instead of the squared norm employed in the traditional smoothing spline method. The COSSO provides a unified framework for several recent proposals for model selection in linear models and smoothing spline ANOVA models. Theoretical properties, such as the existence and the rate of convergence of the COSSO estimator, are studied. In the special case of a tensor product design with periodic functions, a detailed analysis reveals that the COSSO does model selection by applying a novel soft thresholding type operation to the function components.
We give an equivalent formulation of the COSSO estimator which leads naturally to an iterative algorithm. We compare the COSSO with MARS, a popular method that builds functional ANOVA models, in simulations and real examples. The COSSO method can be extended to classification problems and we compare its performance with those of a number of machine learning algorithms on real datasets. The COSSO gives very competitive performance in these studies.
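
The contrast between a sum-of-norms penalty and a squared-norm penalty can be illustrated with a small finite-dimensional sketch. It is not the COSSO operator from the paper, which acts on function components in a reproducing kernel Hilbert space. It is the standard group soft-thresholding rule that arises under the simplifying assumption of an orthonormal design, where each component has coefficient vector `z` and the problem separates by component. The function names, the component labels and the value of `lam` are hypothetical choices made for illustration.

```python
import numpy as np


def shrink_norm_penalty(z, lam):
    """Minimizer over b of 0.5*||z - b||^2 + lam*||b|| (unsquared norm).

    This is group soft thresholding: the whole component is set to zero
    when its norm is at most lam, otherwise it is shrunk toward zero.
    """
    norm = np.linalg.norm(z)
    if norm <= lam:
        return np.zeros_like(z)
    return (1.0 - lam / norm) * z


def shrink_squared_penalty(z, lam):
    """Minimizer over b of 0.5*||z - b||^2 + lam*||b||^2 (squared norm).

    This is proportional shrinkage: a nonzero component never becomes
    exactly zero.
    """
    return z / (1.0 + 2.0 * lam)


# Two hypothetical components: one with a large norm, one with a small norm.
components = {
    "strong": np.array([3.0, 4.0]),  # norm 5.0
    "weak": np.array([0.3, 0.4]),    # norm 0.5
}
lam = 1.0

norm_fit = {name: shrink_norm_penalty(z, lam) for name, z in components.items()}
squared_fit = {name: shrink_squared_penalty(z, lam) for name, z in components.items()}

for name in components:
    print(name, "norm penalty:", norm_fit[name], "squared penalty:", squared_fit[name])
```

In this toy setting the unsquared-norm penalty removes the weak component entirely while only shrinking the strong one, whereas the squared-norm penalty scales both by the same factor and keeps both in the model. That is the sense in which a penalty built from component norms performs selection and smoothing together.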

MSC:

62G08 Nonparametric regression and quantile regression
62J10 Analysis of variance and covariance (ANOVA)
65C60 Computational problems in statistics (MSC2010)
62G20 Asymptotic properties of nonparametric inference

Software:

gss; PDCO
