Greenshtein, Eitan
Best subset selection, persistence in high-dimensional statistical learning and optimization under \(l_1\) constraint. (English) Zbl 1106.62022
Ann. Stat. 34, No. 5, 2367-2386 (2006).

Summary: Let \((Y, X_1, \dots, X_m)\) be a random vector. It is desired to predict \(Y\) based on \((X_1, \dots, X_m)\). Examples of prediction methods are regression, classification using logistic regression or separating hyperplanes, and so on. We consider the problem of best subset selection and study it in the context \(m = n^\alpha\), \(\alpha > 1\), where \(n\) is the number of observations. We investigate procedures based on empirical risk minimization. It is shown that, in common cases, we should aim to find the best subset among those of size of order \(o(n/\log(n))\). It is also shown that, in some “asymptotic sense,” when assuming a certain sparsity condition, there is no loss in letting \(m\) be much larger than \(n\), for example \(m = n^\alpha\), \(\alpha > 1\). This is in comparison to starting with the “best” subset of size smaller than \(n\), regardless of the value of \(\alpha\). We then study conditions under which empirical risk minimization subject to \(l_1\) constraints yields nearly the best subset. These results extend some recent results obtained by the author and Y. Ritov [Bernoulli 10, No. 6, 971–988 (2004; Zbl 1055.62078)]. Finally, we present a high-dimensional simulation study of a “boosting type” classification procedure.

Cited in 42 Documents

MSC:
62F07 Statistical ranking and selection procedures
62C99 Statistical decision theory

Keywords: Lasso procedure
Citations: Zbl 1055.62078
Software: PDCO

Full Text: DOI arXiv
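Reviewer’s note: to fix ideas, the \(l_1\)-constrained empirical risk minimization discussed in the summary can be written, for squared-error loss, as
\[
\hat{\beta}_n = \arg\min_{\|\beta\|_1 \le b_n} \frac{1}{n} \sum_{i=1}^n \bigl(Y_i - \beta^\top X_i\bigr)^2,
\]
and a sequence of such procedures is persistent when \(L(\hat{\beta}_n) - \inf_{\|\beta\|_1 \le b_n} L(\beta) \to 0\) in probability, where \(L(\beta) = E(Y - \beta^\top X)^2\) is the predictive risk. This display is a reconstruction from the summary and from Greenshtein and Ritov [13], not a formula quoted from the paper; the question studied is when the constrained minimizer nearly attains the risk of the best subset of size \(o(n/\log(n))\), even for \(m = n^\alpha\), \(\alpha > 1\).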
References:
[1] Bickel, P. and Levina, E. (2004). Some theory for Fisher’s linear discriminant function, “naive Bayes,” and some alternatives when there are many more variables than observations. Bernoulli 10 989–1010. · Zbl 1064.62073 · doi:10.3150/bj/1106314847
[2] Breiman, L. (2001). Statistical modeling: The two cultures (with discussion). Statist. Sci. 16 199–231. · Zbl 1059.62505 · doi:10.1214/ss/1009213726
[3] Breiman, L. (2004). Population theory for boosting ensembles. Ann. Statist. 32 1–11. · Zbl 1105.62308 · doi:10.1214/aos/1079120126
[4] Bühlmann, P. and Yu, B. (2004). Discussion of boosting papers. Ann. Statist. 32 96–101. · Zbl 1105.62310
[5] Chen, S., Donoho, D. and Saunders, M. (2001). Atomic decomposition by basis pursuit. SIAM Rev. 43 129–159. · Zbl 0979.94010 · doi:10.1137/S003614450037906X
[6] Donoho, D. (2004). For most large underdetermined systems of linear equations the minimal \(l^1\)-norm solution is also the sparsest solution. Technical Report 2004-9, Dept. Statistics, Stanford Univ. · Zbl 1113.15004
[7] Donoho, D. (2004). For most large underdetermined systems of equations, the minimal \(l^1\)-norm near-solution approximates the sparsest near-solution. Technical Report 2004-10, Dept. Statistics, Stanford Univ.
[8] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression (with discussion). Ann. Statist. 32 407–499. · Zbl 1091.62054 · doi:10.1214/009053604000000067
[9] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360. · Zbl 1073.62547 · doi:10.1198/016214501753382273
[10] Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928–961. · Zbl 1092.62031 · doi:10.1214/009053604000000256
[11] Friedman, J., Hastie, T., Rosset, S., Tibshirani, R. and Zhu, J. (2004). Discussion of boosting papers. Ann. Statist. 32 102–107. · Zbl 1105.62314
[12] Greenshtein, E. (2005). Prediction, model selection and random dimension penalties. Sankhyā 67 46–73. · Zbl 1192.62165
[13] Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971–988. · Zbl 1055.62078 · doi:10.3150/bj/1106314846
[14] Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York. · Zbl 0973.62007
[15] Huber, P. (1973). Robust regression: Asymptotics, conjectures, and Monte Carlo. Ann. Statist. 1 799–821. · Zbl 0289.62033 · doi:10.1214/aos/1176342503
[16] Juditsky, A. and Nemirovski, A. (2000). Functional aggregation for nonparametric regression. Ann. Statist. 28 681–712. · Zbl 1105.62338 · doi:10.1214/aos/1015951994
[17] Lee, W. S., Bartlett, P. L. and Williamson, R. C. (1996). Efficient agnostic learning of neural networks with bounded fan-in. IEEE Trans. Inform. Theory 42 2118–2132. · Zbl 0874.68253 · doi:10.1109/18.556601
[18] Lugosi, G. and Vayatis, N. (2004). On the Bayes-risk consistency of regularized boosting methods. Ann. Statist. 32 30–55. · Zbl 1105.62319 · doi:10.1214/aos/1079120129
[19] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 1436–1462. · Zbl 1113.62082 · doi:10.1214/009053606000000281
[20] Nemirovski, A. and Yudin, D. (1983). Problem Complexity and Method Efficiency in Optimization. Wiley, New York. · Zbl 0501.90062
[21] Nguyen, D. V., Arpat, A. B., Wang, N. and Carroll, R. J. (2002). DNA microarray experiments: Biological and technological aspects. Biometrics 58 701–717. · Zbl 1210.62197 · doi:10.1111/j.0006-341X.2002.00701.x
[22] Pisier, G. (1981). Remarques sur un résultat non publié de B. Maurey. In Séminaire d’Analyse Fonctionnelle 112. École Polytechnique, Palaiseau. · Zbl 0491.46017
[23] Portnoy, S. (1984). Asymptotic behavior of M-estimators of \(p\) regression parameters when \(p^2/n\) is large. I. Consistency. Ann. Statist. 12 1298–1309. · Zbl 0584.62050 · doi:10.1214/aos/1176346793
[24] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288. · Zbl 0850.62538
[25] Vapnik, V. N. (1998). Statistical Learning Theory. Wiley, New York. · Zbl 0935.62007
[26] Yohai, V. J. and Maronna, R. A. (1979). Asymptotic behavior of M-estimators for the linear model. Ann. Statist. 7 258–268. · Zbl 0408.62027 · doi:10.1214/aos/1176344610

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data-conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references of the original paper as accurately as possible, without claiming completeness or a perfect matching.