
\(\ell_1\)-penalized quantile regression in high-dimensional sparse models. (English) Zbl 1209.62064

Summary: We consider median regression and, more generally, a possibly infinite collection of quantile regressions in high-dimensional sparse models. In these models, the number of regressors \(p\) is very large, possibly larger than the sample size \(n\), but at most \(s\) regressors have a non-zero impact on each conditional quantile of the response variable, where \(s\) grows more slowly than \(n\). Since ordinary quantile regression is not consistent in this setting, we consider \(\ell _{1}\)-penalized quantile regression (\(\ell _{1}\)-QR), which penalizes the \(\ell _{1}\)-norm of the regression coefficients, as well as the post-penalized QR estimator (post-\(\ell _{1}\)-QR), which applies ordinary QR to the model selected by \(\ell _{1}\)-QR. First, we show that under general conditions \(\ell _{1}\)-QR is consistent at the near-oracle rate \(\sqrt{s/n}\sqrt{\log(p \vee n)}\), uniformly in a compact set \(\mathcal U \subset (0,1)\) of quantile indices. In deriving this result, we propose a partly pivotal, data-driven choice of the penalty level and show that it satisfies the requirements for achieving this rate. Second, we show that under similar conditions post-\(\ell _{1}\)-QR is consistent at the same near-oracle rate \(\sqrt{s/n}\sqrt{\log(p \vee n)}\), uniformly over \(\mathcal U\), even if the models selected by \(\ell _{1}\)-QR miss some components of the true models, and at a rate even closer to the oracle rate otherwise. Third, we characterize conditions under which the model selected by \(\ell _{1}\)-QR contains the true model as a submodel, and derive bounds on the dimension of the selected model, uniformly over \(\mathcal U\); we also give conditions under which hard thresholding selects the minimal true model, uniformly over \(\mathcal U\).
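The two-step procedure described above admits a compact illustration. The following is a minimal sketch in Python using scikit-learn's QuantileRegressor, whose alpha parameter weights an \(\ell_1\) penalty on the coefficients; the penalty level lam below is a crude placeholder motivated by the \(\sqrt{\log(p \vee n)/n}\) rate, not the paper's pivotal, data-driven choice, and the hard-thresholding cutoff is likewise an illustrative assumption.

```python
# Minimal sketch of l1-QR, post-l1-QR, and hard thresholding.
# Assumptions: scikit-learn >= 1.0 (QuantileRegressor); the penalty
# level `lam` is a rough rate-motivated placeholder, NOT the paper's
# pivotal data-driven choice; the threshold 0.25 is illustrative.
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)
n, p, s = 200, 400, 5                     # p > n, but only s regressors matter
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:s] = 1.0
y = X @ beta_true + rng.standard_normal(n)

u = 0.5                                   # quantile index (median regression)
lam = np.sqrt(np.log(max(p, n)) / n)      # placeholder penalty level (assumption)

# Step 1: l1-QR -- quantile loss plus an l1 penalty on the coefficients.
l1_qr = QuantileRegressor(quantile=u, alpha=lam, solver="highs").fit(X, y)
support = np.flatnonzero(np.abs(l1_qr.coef_) > 1e-8)

# Step 2: post-l1-QR -- ordinary (unpenalized) QR on the selected model.
post_qr = QuantileRegressor(quantile=u, alpha=0.0, solver="highs")
post_qr.fit(X[:, support], y)

# Hard thresholding of the refitted coefficients (cutoff is illustrative).
kept = support[np.abs(post_qr.coef_) > 0.25]
print("selected:", support.tolist())
print("after thresholding:", kept.tolist())
```

The refit in Step 2 mirrors the role of post-\(\ell_1\)-QR in the paper: it removes the shrinkage bias that the penalty imposes on the coefficients of the selected support. In the paper, the penalty level is instead calibrated by the partly pivotal, data-driven rule mentioned in the summary.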

MSC:

62G08 Nonparametric regression and quantile regression
62H12 Estimation in multivariate analysis
62J99 Linear inference, regression
62J07 Ridge regression; shrinkage estimators (Lasso)
62M99 Inference from stochastic processes

References:

[1] Belloni, A. and Chernozhukov, V. (2009). Computational complexity of MCMC-based estimators in large samples. Ann. Statist. 37 2011-2055. · Zbl 1175.65015 · doi:10.1214/08-AOS634
[2] Belloni, A. and Chernozhukov, V. (2010). Supplement to “\(\ell_1\)-penalized quantile regression in high-dimensional sparse models.” DOI: . · Zbl 1209.62064
[3] Belloni, A. and Chernozhukov, V. (2009). \(\ell_1\)-penalized quantile regression in high-dimensional sparse models. Available at .
[4] Belloni, A. and Chernozhukov, V. (2008). Conditional quantile processes under increasing dimension. Technical report, Duke and MIT.
[5] Belloni, A. and Chernozhukov, V. (2009). Post-\(\ell_1\)-penalized estimators in high-dimensional linear regression models. Available at .
[6] Bertsimas, D. and Tsitsiklis, J. (1997). Introduction to Linear Optimization. Athena Scientific, Belmont, MA.
[7] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37 1705-1732. · Zbl 1173.62022 · doi:10.1214/08-AOS620
[8] Buchinsky, M. (1994). Changes in the U.S. wage structure 1963-1987: Application of quantile regression. Econometrica 62 405-458. · Zbl 0800.90235 · doi:10.2307/2951618
[9] Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2006). Aggregation and sparsity via \(\ell_1\) penalized least squares. In Proceedings of 19th Annual Conference on Learning Theory (COLT 2006) (G. Lugosi and H. U. Simon, eds.). Lecture Notes in Artificial Intelligence 4005 379-391. Springer, Berlin. · Zbl 1143.62319 · doi:10.1007/11776420_29
[10] Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2007). Aggregation for Gaussian regression. Ann. Statist. 35 1674-1697. · Zbl 1209.62065 · doi:10.1214/009053606000001587
[11] Bunea, F., Tsybakov, A. and Wegkamp, M. H. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1 169-194. · Zbl 1146.62028 · doi:10.1214/07-EJS008
[12] Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Statist. 35 2313-2351. · Zbl 1139.62019 · doi:10.1214/009053606000001523
[13] Chernozhukov, V. (2005). Extremal quantile regression. Ann. Statist. 33 806-839. · Zbl 1068.62063 · doi:10.1214/009053604000001165
[14] Fan, J. and Lv, J. (2008). Sure independence screening for ultra-high dimensional feature space. J. R. Stat. Soc. Ser. B Stat. Methodol. 70 849-911.
[15] Gutenbrunner, C. and Jurečková, J. (1992). Regression rank scores and regression quantiles. Ann. Statist. 20 305-330. · Zbl 0759.62015 · doi:10.1214/aos/1176348524
[16] He, X. and Shao, Q.-M. (2000). On parameters of increasing dimensions. J. Multivariate Anal. 73 120-135. · Zbl 0948.62013 · doi:10.1006/jmva.1999.1873
[17] Knight, K. (1998). Limiting distributions for \(L_1\) regression estimators under general conditions. Ann. Statist. 26 755-770. · Zbl 0929.62021 · doi:10.1214/aos/1028144858
[18] Knight, K. and Fu, W. J. (2000). Asymptotics for Lasso-type estimators. Ann. Statist. 28 1356-1378. · Zbl 1105.62357 · doi:10.1214/aos/1015957397
[19] Koenker, R. (2005). Quantile Regression. Cambridge Univ. Press, Cambridge. · Zbl 1111.62037
[20] Koenker, R. (2010). Additive models for quantile regression: Model selection and confidence bandaids. Working paper. Available at .
[21] Koenker, R. and Bassett, G. (1978). Regression quantiles. Econometrica 46 33-50. · Zbl 0373.62038 · doi:10.2307/1913643
[22] Koltchinskii, V. (2009). Sparsity in penalized empirical risk minimization. Ann. Inst. H. Poincaré Probab. Statist. 45 7-57. · Zbl 1168.62044 · doi:10.1214/07-AIHP146
[23] Laplace, P.-S. (1818). Théorie Analytique des Probabilités. Éditions Jacques Gabay (1995), Paris.
[24] Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry and Processes. Ergebnisse der Mathematik und ihrer Grenzgebiete 23. Springer, Berlin. · Zbl 0748.60004
[25] Lounici, K., Pontil, M., Tsybakov, A. B. and van de Geer, S. A. (2009). Taking advantage of sparsity in multi-task learning. In COLT’09. Omnipress, Madison, WI.
[26] Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist. 37 246-270. · Zbl 1155.62050 · doi:10.1214/07-AOS582
[27] Portnoy, S. (1991). Asymptotic behavior of regression quantiles in nonstationary, dependent cases. J. Multivariate Anal. 38 100-113. · Zbl 0737.62078 · doi:10.1016/0047-259X(91)90034-Y
[28] Portnoy, S. and Koenker, R. (1997). The Gaussian hare and the Laplacian tortoise: Computability of squared-error versus absolute-error estimators. Statist. Sci. 12 279-300. · Zbl 0955.62608 · doi:10.1214/ss/1030037960
[29] Rosenbaum, M. and Tsybakov, A. B. (2010). Sparse recovery under matrix uncertainty. Ann. Statist. 38 2620-2651. · Zbl 1373.62357 · doi:10.1214/10-AOS793
[30] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[31] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Univ. Press, Cambridge. · Zbl 0910.62001 · doi:10.1017/CBO9780511802256
[32] van de Geer, S. A. (2008). High-dimensional generalized linear models and the Lasso. Ann. Statist. 36 614-645. · Zbl 1138.62323 · doi:10.1214/009053607000000929
[33] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer, New York. · Zbl 0862.60002
[34] Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann. Statist. 36 1567-1594. · Zbl 1142.62044 · doi:10.1214/07-AOS520
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.