Pivotal estimation via square-root lasso in nonparametric regression. (English) Zbl 1321.62030

Summary: We propose a self-tuning \(\sqrt{\mathrm {Lasso}} \) method that simultaneously resolves three important practical problems in high-dimensional regression analysis: it handles the unknown scale, heteroscedasticity and (drastic) non-Gaussianity of the noise. In addition, our analysis allows for badly behaved designs, for example, perfectly collinear regressors, and generates sharp bounds even in extreme cases, such as the infinite variance case and the noiseless case, in contrast to Lasso. We establish various nonasymptotic bounds for \(\sqrt{\mathrm {Lasso}} \), including prediction norm rate and sparsity. Our analysis is based on new impact factors that are tailored for bounding prediction norm. In order to cover heteroscedastic non-Gaussian noise, we rely on moderate deviation theory for self-normalized sums to achieve Gaussian-like results under weak conditions. Moreover, we derive bounds on the performance of ordinary least squares (OLS) applied to the model selected by \(\sqrt{\mathrm {Lasso}} \), accounting for possible misspecification of the selected model. Under mild conditions, the rate of convergence of OLS post \(\sqrt{\mathrm {Lasso}} \) is as good as \(\sqrt{\mathrm {Lasso}} \)'s rate. As an application, we consider the use of \(\sqrt{\mathrm {Lasso}} \) and OLS post \(\sqrt{\mathrm {Lasso}} \) as estimators of nuisance parameters in a generic semiparametric problem (nonlinear moment condition or \(Z\)-problem), resulting in a construction of \(\sqrt{n}\)-consistent and asymptotically normal estimators of the main parameters.
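
To make the "self-tuning" idea concrete, here is a minimal sketch of a square-root Lasso fit, written under several assumptions that are not taken from the paper. It uses one common normalization of the objective, \(\sqrt{\tfrac{1}{n}\|y - X\beta\|_2^2} + \mu \|\beta\|_1\), and it does not use the authors' algorithm: it alternates between updating a noise-scale estimate and running coordinate descent on an ordinary Lasso whose penalty is multiplied by that scale. The function names, the parameter `penalty` (playing the role of \(\mu\)), and the fixed iteration counts are all illustrative choices; penalty loadings for heteroscedastic noise are omitted.

```python
import math


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sqrt_lasso(X, y, penalty, outer_iters=20, inner_sweeps=20):
    """Sketch of a square-root Lasso fit by alternating minimization.

    Assumed objective: sqrt(mean((y - X b)^2)) + penalty * sum(|b_j|).
    For a fixed scale sigma this reduces to a Lasso with penalty
    penalty * sigma; for fixed b the best sigma is the residual RMS.
    X is a list of rows, y a list of responses.
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_ms = [sum(v * v for v in c) / n for c in cols]  # mean squares
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, kept up to date
    for _ in range(outer_iters):
        # Scale step: sigma is the root mean square of the residuals.
        sigma = math.sqrt(sum(r * r for r in resid) / n)
        thresh = penalty * sigma
        # Lasso step: coordinate descent with the rescaled penalty.
        for _ in range(inner_sweeps):
            for j in range(p):
                if col_ms[j] == 0.0:
                    continue  # an all-zero column carries no information
                c = cols[j]
                rho = sum(c[i] * resid[i] for i in range(n)) / n
                rho += col_ms[j] * beta[j]  # add back coordinate j
                new = soft_threshold(rho, thresh) / col_ms[j]
                delta = new - beta[j]
                if delta != 0.0:
                    for i in range(n):
                        resid[i] -= delta * c[i]
                    beta[j] = new
    return beta


# Small deterministic demo (made-up data, for illustration only).
n_obs, n_feat = 30, 6
X_demo = [[math.sin((i + 1) * (j + 1)) for j in range(n_feat)]
          for i in range(n_obs)]
y_demo = [2.0 * row[0] - row[1] + 0.1 * math.cos(7.0 * (i + 1))
          for i, row in enumerate(X_demo)]
beta_demo = sqrt_lasso(X_demo, y_demo, penalty=0.05)
beta_scaled = sqrt_lasso(X_demo, [4.0 * v for v in y_demo], penalty=0.05)
```

In this sketch, multiplying the response by a constant multiplies the fitted coefficients by the same constant while `penalty` stays unchanged, so `beta_scaled` is four times `beta_demo`. That is the sense in which the penalty level does not need to know the noise scale.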


62G05 Nonparametric estimation
62G08 Nonparametric regression and quantile regression
62H12 Estimation in multivariate analysis
62G20 Asymptotic properties of nonparametric inference
62G35 Nonparametric robustness



