On constrained and regularized high-dimensional regression. (English) Zbl 1329.62307

Summary: High-dimensional feature selection has become increasingly crucial for seeking parsimonious models in estimation. For selection consistency, we derive a necessary and sufficient condition formulated in terms of the degree of separation. The minimal degree of separation is necessary for any method to be selection consistent. At a level slightly higher than the minimal degree of separation, selection consistency is achieved by a constrained \(L_0\)-method and its computational surrogate, the constrained truncated \(L_1\)-method. This permits the number of features to grow up to exponentially in the sample size; in other words, these methods are optimal in feature selection against any selection method. By contrast, their regularization counterparts, the \(L_0\)-regularization and truncated \(L_1\)-regularization methods, achieve the same only under slightly stronger assumptions. More importantly, such selection yields sharper parameter estimation and prediction, leading to minimax parameter estimation, which is otherwise impossible in the absence of a good selection method for high-dimensional analysis.
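The truncated \(L_1\)-regularization described above can be illustrated with a minimal sketch. The penalty \(\lambda \sum_j \min(|\beta_j|, \tau)\) is a difference of convex functions, so one standard computational strategy (in the spirit of the difference-of-convex approach of Shen, Pan and Zhu [17]) linearizes the concave part and solves a sequence of weighted Lasso problems. The function name, the solver (proximal gradient), and the values of `lam` and `tau` below are illustrative assumptions, not the authors' exact algorithm:

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding with per-coordinate thresholds t >= 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def truncated_l1_regression(X, y, lam=0.1, tau=0.5, n_outer=10, n_inner=200):
    """Illustrative difference-of-convex sketch for
        min_beta (1/2n) ||y - X beta||^2 + lam * sum_j min(|beta_j|, tau).
    Each outer step freezes the truncation pattern, leaving a weighted Lasso
    that is solved here by proximal gradient (ISTA). lam and tau are
    hypothetical tuning values, not recommendations from the paper."""
    n, p = X.shape
    beta = np.zeros(p)
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)  # 1 / Lipschitz constant of the gradient
    for _ in range(n_outer):
        # Coordinates already past the truncation level tau incur no L1 pull,
        # which is what removes the Lasso's shrinkage bias on large signals.
        weights = lam * (np.abs(beta) < tau)
        for _ in range(n_inner):
            grad = X.T @ (X @ beta - y) / n
            beta = soft_threshold(beta - step * grad, step * weights)
    return beta

# Small simulated example: a strong 3-sparse signal among p = 20 features.
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = 3.0
y = X @ beta_true + 0.1 * rng.standard_normal(n)
beta_hat = truncated_l1_regression(X, y)
```

With a well-separated signal such as this, the recovered support coincides with the true one, while the unpenalized second stage on selected coordinates mimics the sharper estimation the summary attributes to a good selection method.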


62J02 General nonlinear regression
62F07 Statistical ranking and selection procedures
62F30 Parametric inference under constraints
62G08 Nonparametric regression and quantile regression


Full Text: DOI Link


[1] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov, F. Csáki (Eds.), Second international symposium on information theory (pp. 267-281). Budapest: Akadémiai Kiadó. · Zbl 0283.62006
[2] Chen, J., Chen, Z. (2008). Extended Bayesian information criterion for model selection with large model space. Biometrika, 95, 759-771. · Zbl 1437.62415
[3] Chen, S.S., Donoho, D., Saunders, M.A. (2001). Atomic decomposition by basis pursuit. SIAM Review, 43, 129-159. · Zbl 0979.94010
[4] Fan, J., Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360. · Zbl 1073.62547
[5] Gu, C. (1998). Model indexing and smoothing parameter in nonparametric function estimation. Statistica Sinica, 8, 607-646. · Zbl 0901.62058
[6] Ibragimov, I., Has’minskii, R. (1981). Statistical estimation. New York: Springer.
[7] Judge, G.G., Bock, M.E. (1978). The statistical implications of pretest and Stein-rule estimators in econometrics. Amsterdam: North-Holland. · Zbl 0395.62078
[8] Kim, Y., Choi, H., Oh, H.-S. (2008). Smoothly clipped absolute deviation on high dimensions. Journal of the American Statistical Association, 103, 1665-1673. · Zbl 1286.62062
[9] Liu, W., Yang, Y. (2010). Consistency for BIC selection. Manuscript.
[10] Lv, J., Fan, Y. (2009). A unified approach to model selection and sparse recovery using regularized least squares. The Annals of Statistics, 37, 3498-3528. · Zbl 1369.62156
[11] Meinshausen, N., Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. The Annals of Statistics, 34, 1436-1462. · Zbl 1113.62082
[12] Osborne, M.R., Presnell, B., Turlach, B.A. (2000). On the Lasso and its dual. Journal of Computational and Graphical Statistics, 9, 319-337.
[13] Rinaldo, A. (2007). A note on the uniqueness of the Lasso solution. Technical Report, Department of Statistics, Carnegie Mellon University.
[14] Raskutti, G., Wainwright, M., Yu, B. (2009). Minimax rates of estimation for high-dimensional linear regression over \(\ell_q\) balls. Technical Report, UC Berkeley. · Zbl 1365.62276
[15] Rockafellar, R.T., Wets, R.J.B. (2011). Variational analysis (Vol. 317). New York: Springer. · Zbl 0888.49001
[16] Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461-464. · Zbl 0379.62005
[17] Shen, X., Pan, W., Zhu, Y. (2012). Likelihood-based selection and sharp parameter estimation. Journal of the American Statistical Association, 107, 223-232. · Zbl 1261.62020
[18] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58, 267-288. · Zbl 0850.62538
[19] Wainwright, M. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using \(\ell_1\)-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory, 55, 2183-2202. · Zbl 1367.62220
[20] Yang, Y., Barron, A. (1998). An asymptotic property of model selection criteria. IEEE Transactions on Information Theory, 44, 95-116. · Zbl 0949.62041
[21] Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38, 894-942. · Zbl 1183.62120
[22] Zhao, P., Yu, B. (2006). On model selection consistency of Lasso. Journal of Machine Learning Research, 7, 2541-2563. · Zbl 1222.62008
[23] Zhou, S., Shen, X., Wolfe, D. (1998). Local asymptotics for regression splines and confidence regions. The Annals of Statistics, 26, 1760-1782. · Zbl 0929.62052
[24] Zou, H. (2006). The adaptive Lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418-1429. · Zbl 1171.62326
[25] Zou, H., Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics, 36, 1509-1533. · Zbl 1142.62027