High-dimensional additive modeling. (English) Zbl 1360.62186
Summary: We propose a new sparsity-smoothness penalty for high-dimensional generalized additive models. The combination of sparsity and smoothness is crucial both for the mathematical theory and for finite-sample performance. We present a computationally efficient algorithm, with provable numerical convergence properties, for optimizing the penalized likelihood. Furthermore, we provide oracle results which yield asymptotic optimality of our estimator for high-dimensional but sparse additive models. Finally, an adaptive version of our sparsity-smoothness penalized approach yields large additional performance gains.
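
The summary does not spell the penalty out, so the following is a minimal, illustrative Python sketch of a sparsity-smoothness penalized fit. It assumes a penalized least-squares criterion of the form

    (1/n) * ||y - sum_j B_j beta_j||^2 + lam1 * sum_j sqrt(beta_j' M_j beta_j),
    with M_j = B_j' B_j / n + lam2 * Omega_j,

where B_j is a spline design matrix for covariate j and Omega_j approximates the integrated squared second derivative of the j-th component: the square root zeroes out whole components (sparsity) while Omega_j controls their wiggliness (smoothness). The basis choice, the finite-difference curvature matrix, and all names (spline_basis, fit_sparse_smooth_am) are illustrative assumptions, not the API of the hgam package.

import numpy as np

def spline_basis(x, n_knots=6):
    # Cubic truncated-power spline basis for one covariate; columns are
    # centered so each additive component has empirical mean zero.
    knots = np.quantile(x, np.linspace(0.1, 0.9, n_knots))
    cols = [x, x**2, x**3] + [np.clip(x - k, 0.0, None)**3 for k in knots]
    B = np.column_stack(cols)
    return B - B.mean(axis=0)

def curvature_penalty(x, B):
    # Crude finite-difference stand-in for int f''(t)^2 dt: squared second
    # differences of the fitted curve at the sorted design points.
    Bs = B[np.argsort(x)]
    D2 = np.diff(np.eye(len(x)), n=2, axis=0)   # (n-2, n) second-difference operator
    A = D2 @ Bs
    return A.T @ A / len(x)

def fit_sparse_smooth_am(X, y, lam1=0.1, lam2=1.0, n_iter=50):
    # Block coordinate descent. With the Cholesky factorization M_j = R_j' R_j
    # and the change of variables gamma_j = R_j beta_j, the penalty term
    # sqrt(beta_j' M_j beta_j) becomes ||gamma_j||_2, so each block update is
    # a standard group-lasso subproblem.
    n, p = X.shape
    y = y - y.mean()
    T, R = [], []                                # whitened designs, Cholesky factors
    for j in range(p):
        B = spline_basis(X[:, j])
        M = B.T @ B / n + lam2 * curvature_penalty(X[:, j], B)
        Rj = np.linalg.cholesky(M + 1e-10 * np.eye(len(M))).T
        R.append(Rj)
        T.append(B @ np.linalg.inv(Rj))          # B_j R_j^{-1}
    gammas = [np.zeros(Tj.shape[1]) for Tj in T]
    fit = np.zeros(n)
    for _ in range(n_iter):
        for j in range(p):
            r = y - (fit - T[j] @ gammas[j])     # partial residual for block j
            if np.linalg.norm(2.0 / n * T[j].T @ r) <= lam1:
                new = np.zeros_like(gammas[j])   # KKT condition: drop component j
            else:
                new = gammas[j].copy()
                L = 2.0 / n * np.linalg.norm(T[j], 2) ** 2   # gradient Lipschitz constant
                for _ in range(25):              # proximal-gradient inner loop
                    grad = -2.0 / n * T[j].T @ (r - T[j] @ new)
                    v = new - grad / L
                    new = max(0.0, 1.0 - lam1 / (L * np.linalg.norm(v) + 1e-12)) * v
            fit += T[j] @ (new - gammas[j])
            gammas[j] = new
    # Back to spline coefficients; all-zero blocks are de-selected covariates.
    return [np.linalg.solve(Rj, g) for Rj, g in zip(R, gammas)]

# Example use: only the first two of 20 covariates matter.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(120, 20))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.3 * rng.standard_normal(120)
betas = fit_sparse_smooth_am(X, y, lam1=0.15, lam2=0.5)
print([j for j, b in enumerate(betas) if np.linalg.norm(b) > 1e-8])  # selected components

In this sketch, lam1 governs how many components are set exactly to zero and lam2 how smooth the selected ones are; the adaptive variant mentioned in the summary would rerun the fit with component-specific weights on lam1 derived from an initial estimate.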

MSC:
62G08 Nonparametric regression and quantile regression
62J07 Ridge regression; shrinkage estimators (Lasso)
Software:
hgam