Lafferty, John; Wasserman, Larry
Rodeo: Sparse, greedy nonparametric regression. (English) Zbl 1132.62026
Ann. Stat. 36, No. 1, 28-63 (2008).

Summary: We present a greedy method for simultaneously performing local bandwidth selection and variable selection in nonparametric regression. The method starts with a local linear estimator with large bandwidths and incrementally decreases the bandwidth of variables for which the gradient of the estimator with respect to bandwidth is large. The method, called rodeo (regularization of derivative expectation operator), conducts a sequence of hypothesis tests to threshold derivatives and is easy to implement. Under certain assumptions on the regression function and sampling density, it is shown that the rodeo applied to local linear smoothing avoids the curse of dimensionality, achieving near-optimal minimax rates of convergence in the number of relevant variables, as if these variables were isolated in advance.

Cited in 2 Reviews · Cited in 36 Documents

MSC:
62G08 Nonparametric regression and quantile regression
62G20 Asymptotic properties of nonparametric inference
62G07 Density estimation
62G10 Nonparametric hypothesis testing
62G05 Nonparametric estimation

Keywords: nonparametric regression; sparsity; local linear smoothing; bandwidth estimation; variable selection; minimax rates of convergence

Software: ElemStatLearn

Full Text: DOI arXiv Euclid
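To make the procedure concrete, the following is a minimal sketch of the hard-thresholding rodeo at a single test point. It is not the authors' implementation: the product Gaussian kernel, the known noise variance sigma2, the constants h0 and beta, and the finite-difference computation of the bandwidth derivative are all illustrative assumptions.

    # A minimal sketch of the (hard) rodeo at a single test point x.
    # Assumptions beyond the summary: product Gaussian kernel, known
    # noise variance sigma2, illustrative constants h0 and beta.
    import numpy as np

    def loclin_weights(X, x, h):
        """Weight vector w with m_hat(x) = w @ Y for local linear regression."""
        n, d = X.shape
        Z = np.hstack([np.ones((n, 1)), X - x])              # local design
        K = np.exp(-0.5 * (((X - x) / h) ** 2).sum(axis=1))  # product kernel
        W = Z * K[:, None]
        # e1' (Z'KZ)^{-1} Z'K : the row selecting the intercept estimate
        return np.linalg.solve(Z.T @ W, W.T)[0]

    def rodeo(X, Y, x, h0=1.0, beta=0.9, sigma2=1.0):
        n, d = X.shape
        h = np.full(d, h0)            # start with large bandwidths
        active = list(range(d))
        eps = 1e-4
        while active:
            for j in list(active):
                w = loclin_weights(X, x, h)
                # Derivative of the estimator w.r.t. h_j: the fit is linear
                # in Y, so the derivative is g @ Y for a weight vector g,
                # here approximated by a finite difference.
                hp = h.copy()
                hp[j] += eps
                g = (loclin_weights(X, x, hp) - w) / eps
                Zj = g @ Y
                # Threshold s_j * sqrt(2 log n), with s_j^2 = sigma2 * ||g||^2
                lam = np.sqrt(2.0 * sigma2 * (g @ g) * np.log(n))
                if abs(Zj) > lam and h[j] > 1e-3 * h0:
                    h[j] *= beta      # derivative still large: shrink h_j
                else:
                    active.remove(j)  # derivative small: freeze h_j
        return loclin_weights(X, x, h) @ Y, h

Because the local linear fit is linear in Y, the derivative statistic Zj is also linear in Y, which justifies the Gaussian-style threshold above. Bandwidths of irrelevant variables stay near h0, so those variables are smoothed out; relevant variables end with small bandwidths, which is the variable-selection effect described in the summary.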
References:
[1] Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA. · Zbl 0541.62042
[2] Bühlmann, P. and Yu, B. (2006). Sparse boosting. J. Mach. Learn. Res. 7 1001-1024. · Zbl 1222.68155
[3] Donoho, D. (2004). For most large underdetermined systems of equations, the minimal \(\ell_1\)-norm near-solution approximates the sparsest near-solution. Comm. Pure Appl. Math. 59 797-829. · Zbl 1113.15004 · doi:10.1002/cpa.20132
[4] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression (with discussion). Ann. Statist. 32 407-499. · Zbl 1091.62054 · doi:10.1214/009053604000000067
[5] Fan, J. (1992). Design-adaptive nonparametric regression. J. Amer. Statist. Assoc. 87 998-1004. · Zbl 0850.62354 · doi:10.2307/2290637
[6] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348-1360. · Zbl 1073.62547 · doi:10.1198/016214501753382273
[7] Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928-961. · Zbl 1092.62031 · doi:10.1214/009053604000000256
[8] Friedman, J. H. (1991). Multivariate adaptive regression splines (with discussion). Ann. Statist. 19 1-141. · Zbl 0765.62064 · doi:10.1214/aos/1176347963
[9] Fu, W. and Knight, K. (2000). Asymptotics for lasso-type estimators. Ann. Statist. 28 1356-1378. · Zbl 1105.62357 · doi:10.1214/aos/1015957397
[10] George, E. I. and McCulloch, R. E. (1997). Approaches for Bayesian variable selection. Statist. Sinica 7 339-373. · Zbl 0884.62031
[11] Girosi, F. (1997). An equivalence between sparse approximation and support vector machines. Neural Comput. 10 1455-1480.
[12] Györfi, L., Kohler, M., Krzyżak, A. and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer, Berlin. · Zbl 1021.62024
[13] Hastie, T. and Loader, C. (1993). Local regression: Automatic kernel carpentry. Statist. Sci. 8 120-129.
[14] Hastie, T., Tibshirani, R. and Friedman, J. H. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, Berlin. · Zbl 0973.62007
[15] Hristache, M., Juditsky, A., Polzehl, J. and Spokoiny, V. (2001). Structure adaptive approach for dimension reduction. Ann. Statist. 29 1537-1566. · Zbl 1043.62052
[16] Kerkyacharian, G., Lepski, O. and Picard, D. (2001). Nonlinear estimation in anisotropic multi-index denoising. Probab. Theory Related Fields 121 137-170. · Zbl 1010.62029 · doi:10.1007/s004400100148
[17] Lawrence, N. D., Seeger, M. and Herbrich, R. (2003). Fast sparse Gaussian process methods: The informative vector machine. In Advances in Neural Information Processing Systems 15 625-632. · Zbl 1157.68431
[18] Lepski, O. V., Mammen, E. and Spokoiny, V. G. (1997). Optimal spatial adaptation to inhomogeneous smoothness: An approach based on kernel estimates with variable bandwidth selectors. Ann. Statist. 25 929-947. · Zbl 0885.62044 · doi:10.1214/aos/1069362731
[19] Li, L., Cook, R. D. and Nachtsheim, C. (2005). Model-free variable selection. J. Roy. Statist. Soc. Ser. B 67 285-299. · Zbl 1069.62053 · doi:10.1111/j.1467-9868.2005.00502.x
[20] Rice, J. (1984). Bandwidth choice for nonparametric regression. Ann. Statist. 12 1215-1230. · Zbl 0554.62035 · doi:10.1214/aos/1176346788
[21] Ruppert, D. (1997). Empirical-bias bandwidths for local polynomial nonparametric regression and density estimation. J. Amer. Statist. Assoc. 92 1049-1062. · Zbl 1067.62531 · doi:10.2307/2965570
[22] Ruppert, D. and Wand, M. P. (1994). Multivariate locally weighted least squares regression. Ann. Statist. 22 1346-1370. · Zbl 0821.62020 · doi:10.1214/aos/1176325632
[23] Samarov, A., Spokoiny, V. and Vial, C. (2005). Component identification and estimation in nonlinear high-dimensional regression models by structural adaptation. J. Amer. Statist. Assoc. 100 429-445. · Zbl 1117.62419 · doi:10.1198/016214504000001529
[24] Smola, A. and Bartlett, P. (2001). Sparse greedy Gaussian process regression. In Advances in Neural Information Processing Systems 13 619-625.
[25] Stone, C. J., Hansen, M. H., Kooperberg, C. and Truong, Y. K. (1997). Polynomial splines and their tensor products in extended linear modeling (with discussion). Ann. Statist. 25 1371-1470. · Zbl 0924.62036 · doi:10.1214/aos/1031594728
[26] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[27] Tipping, M. (2001). Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 1 211-244. · Zbl 0997.68109 · doi:10.1162/15324430152748236
[28] Tropp, J. A. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inform. Theory 50 2231-2241. · Zbl 1288.94019 · doi:10.1109/TIT.2004.834793
[29] Tropp, J. A. (2006). Just relax: Convex programming methods for identifying sparse signals. IEEE Trans. Inform. Theory 52 1030-1051. · Zbl 1288.94025
[30] Turlach, B. (2004). Discussion of "Least angle regression" by Efron, Hastie, Johnstone and Tibshirani. Ann. Statist. 32 494-499. · Zbl 1091.62054 · doi:10.1214/009053604000000067
[31] Zhang, H., Wahba, G., Lin, Y., Voelker, M., Ferris, M., Klein, R. and Klein, B. (2005). Variable selection and model building via likelihood basis pursuit. J. Amer. Statist. Assoc. 99 659-672. · Zbl 1117.62459 · doi:10.1198/016214504000000593

This reference list is based on information provided by the publisher or from digital mathematics libraries.
Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references of the original paper as accurately as possible, without claiming completeness or perfect matching.