
Non-negative least squares for high-dimensional linear models: consistency and sparse recovery without regularization. (English) Zbl 1280.62086

Summary: Least squares fitting is in general not useful for high-dimensional linear models, in which the number of predictors is of the same or even larger order of magnitude than the number of samples. Theory developed in recent years has established a paradigm according to which sparsity-promoting regularization is regarded as a necessity in such a setting. Deviating from this paradigm, we show that non-negativity constraints on the regression coefficients may be as effective as explicit regularization if the design matrix has additional properties, which are met in several applications of non-negative least squares (NNLS). We show that for these designs, the performance of NNLS with regard to prediction and estimation is comparable to that of the lasso. We argue further that in specific cases, NNLS may have a better \(\ell_\infty\)-rate in estimation and hence also advantages with respect to support recovery when combined with thresholding. From a practical point of view, NNLS does not depend on a regularization parameter and is hence easier to use.
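To make the estimator concrete, here is a minimal sketch (not taken from the paper) of thresholded NNLS for sparse recovery, using scipy.optimize.nnls; the design, noise level, and threshold below are illustrative assumptions rather than the paper's settings.

```python
# Illustrative sketch of thresholded non-negative least squares (NNLS);
# the design, noise level, and threshold are assumptions for demonstration.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n, p, s = 100, 300, 5                 # samples, predictors (p > n), sparsity

# Design with non-negative entries, loosely mimicking the kind of
# "self-regularizing" designs for which the paper shows NNLS succeeds.
X = np.abs(rng.standard_normal((n, p)))
X /= np.linalg.norm(X, axis=0)        # normalize columns

beta = np.zeros(p)
beta[:s] = 1.0                        # sparse, non-negative coefficients
y = X @ beta + 0.05 * rng.standard_normal(n)

# NNLS: minimize ||y - X b||_2 subject to b >= 0 -- no tuning parameter.
beta_hat, _ = nnls(X, y)

# Hard thresholding for support recovery (threshold chosen ad hoc here;
# the paper derives principled choices).
tau = 0.1
support_hat = np.flatnonzero(beta_hat > tau)
print("true support:     ", np.flatnonzero(beta > 0))
print("estimated support:", support_hat)
```

Note that, in contrast to the lasso, no regularization path has to be computed: the only post-hoc choice is the threshold applied to the NNLS solution.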

MSC:

62J05 Linear regression; mixed models
62H12 Estimation in multivariate analysis
62F12 Asymptotic properties of parametric estimators

Software:

NNLS

References:

[1] Szlam, A., Guo, Z., and Osher, S. A split Bregman method for non-negative sparsity penalized least squares with applications to hyperspectral demixing. In IEEE International Conference on Image Processing , 2010.
[2] Bardsley, J. and Nagy, J. Covariance-preconditioned iterative methods for nonnegatively constrained astronomical imaging. SIAM Journal on Matrix Analysis and Applications , 27:1184-1198, 2006. · Zbl 1102.65044 · doi:10.1137/040615043
[3] Bartlett, P., Mendelson, S., and Neeman, J. \(\ell_{1}\)-regularized linear regression: Persistence and oracle inequalities. Probability Theory and Related Fields , 154:193-224, 2012. · Zbl 1395.62207
[4] Belloni, A., Chernozhukov, V., and Wang, L. Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika , 98:791-806, 2011. · Zbl 1228.62083 · doi:10.1093/biomet/asr043
[5] Bickel, P., Ritov, Y., and Tsybakov, A. Simultaneous analysis of lasso and Dantzig selector. The Annals of Statistics , 37:1705-1732, 2009. · Zbl 1173.62022 · doi:10.1214/08-AOS620
[6] Boyd, S. and Vandenberghe, L. Convex Optimization . Cambridge University Press, 2004. · Zbl 1058.90049
[7] Bruckstein, A., Elad, M., and Zibulevsky, M. On the uniqueness of nonnegative sparse solutions to underdetermined systems of equations. IEEE Transactions on Information Theory , 54:4813-4820, 2008. · Zbl 1319.15007
[8] Bühlmann, P. and van de Geer, S. Statistics for High-Dimensional Data . Springer, 2011. · Zbl 1273.62015
[9] Bunea, F. Honest variable selection in linear and logistic regression models via \(\ell_{1}\) and \(\ell_{1}+\ell_{2}\) penalization. The Electronic Journal of Statistics , 2:1153-1194, 2008. · Zbl 1320.62170
[10] Bunea, F., Tsybakov, A., and Wegkamp, M. Sparsity oracle inequalities for the lasso. The Electronic Journal of Statistics , 1:169-194, 2007. · Zbl 1146.62028 · doi:10.1214/07-EJS008
[11] Bunea, F., Tsybakov, A., Wegkamp, M., and Barbu, A. SPADES and mixture models. The Annals of Statistics , 38:2525-2558, 2010. · Zbl 1198.62025 · doi:10.1214/09-AOS790
[12] Candes, E. and Plan, Y. Near-ideal model selection by \(\ell_{1}\)-minimization. The Annals of Statistics , 37:2145-2177, 2009. · Zbl 1173.62053 · doi:10.1214/08-AOS653
[13] Candes, E. and Tao, T. The Dantzig selector: Statistical estimation when \(p\) is much larger than \(n\). The Annals of Statistics , 35:2313-2351, 2007. · Zbl 1139.62019 · doi:10.1214/009053606000001523
[14] Chen, D. and Plemmons, R. Nonnegativity constraints in numerical analysis. In Symposium on the Birth of Numerical Analysis , 2007. · Zbl 1192.65041
[15] Donoho, D. Compressed sensing. IEEE Transactions on Information Theory , 52:1289-1306, 2006. · Zbl 1288.94016
[16] Donoho, D., Johnstone, I., Hoch, J., and Stern, A. Maximum entropy and the nearly black object. Journal of the Royal Statistical Society Series B , 54:41-81, 1992. · Zbl 0788.62103
[17] Donoho, D. and Tanner, J. Neighbourliness of randomly projected simplices in high dimensions. Proceedings of the National Academy of Sciences , 102:9452-9457, 2005a. · Zbl 1135.60300 · doi:10.1073/pnas.0502258102
[18] Donoho, D. and Tanner, J. Sparse nonnegative solution of underdetermined linear equations by linear programming. Proceedings of the National Academy of Sciences , 102:9446-9451, 2005b. · Zbl 1135.90368 · doi:10.1073/pnas.0502269102
[19] Donoho, D. and Tanner, J. Counting the faces of randomly-projected hypercubes and orthants, with applications. Discrete and Computational Geometry , 43:522-541, 2010. · Zbl 1191.52004 · doi:10.1007/s00454-009-9221-z
[20] Dorfman, R. The detection of defective members of large populations. The Annals of Mathematical Statistics , 14:436-440, 1943.
[21] Du, D. and Hwang, F. Pooling Designs and Nonadaptive Group Testing . World Scientific, 2006. · Zbl 1284.62009
[22] Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. Least angle regression. The Annals of Statistics , 32:407-499, 2004. · Zbl 1091.62054 · doi:10.1214/009053604000000067
[23] Genovese, C., Jin, J., Wasserman, L., and Yao, Z. A comparison of the lasso and marginal regression. Journal of Machine Learning Research , 13:2107-2143, 2012. · Zbl 1435.62270
[24] Greenshtein, E. and Ritov, Y. Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli , 10:971-988, 2004. · Zbl 1055.62078 · doi:10.3150/bj/1106314846
[25] Hebiri, M. and Lederer, J. How correlations influence lasso prediction. IEEE Transactions on Information Theory , 59:1846-1854, 2013. · Zbl 1364.62186
[26] Huet, S., Giraud, C., and Verzelen, N. High-dimensional regression with unknown variance. Statistical Science , 27:500-518, 2012. · Zbl 1296.92110
[27] Kim, D., Sra, S., and Dhillon, I. Tackling box-constrained optimization via a new projected quasi-Newton approach. SIAM Journal on Scientific Computing , 32:3548-3563, 2010. · Zbl 1220.93085 · doi:10.1137/08073812X
[28] Latala, R., Mankiewicz, P., Oleszkiewicz, K., and Tomczak-Jaegermann, N. Banach-Mazur distances and projections on random subgaussian polytopes. Discrete and Computational Geometry , 38:29-50, 2007. · Zbl 1134.52016 · doi:10.1007/s00454-007-1326-7
[29] Lawson, C. and Hanson, R. Solving Least Squares Problems . SIAM Classics in Applied Mathematics, 1995. · Zbl 0860.65028
[30] Li, L. and Speed, T. Parametric deconvolution of positive spike trains. The Annals of Statistics , 28:1279-1301, 2000. · Zbl 1105.62382 · doi:10.1214/aos/1015957394
[31] Lin, Y., Lee, D., and Saul, L. Nonnegative deconvolution for time of arrival estimation. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) , 2004.
[32] Litvak, A., Pajor, A., Rudelson, M., and Tomczak-Jaegermann, N. Smallest singular value of random matrices and geometry of random polytopes. Advances in Mathematics , 195:491-523, 2005. · Zbl 1077.15021 · doi:10.1016/j.aim.2004.08.004
[33] Lounici, K. Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. The Electronic Journal of Statistics , 2:90-102, 2008. · Zbl 1306.62155
[34] Meinshausen, N. Sign-constrained least squares estimation for high-dimensional regression. The Electronic Journal of Statistics , 7:1607-1631, 2013. · Zbl 1327.62422
[35] Meinshausen, N. and Bühlmann, P. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics , 34:1436-1462, 2006. · Zbl 1113.62082 · doi:10.1214/009053606000000281
[36] Meinshausen, N. and Yu, B. Lasso-type recovery of sparse representations for high-dimensional data. The Annals of Statistics , 37:246-270, 2009. · Zbl 1155.62050 · doi:10.1214/07-AOS582
[37] Raskutti, G., Wainwright, M., and Yu, B. Restricted nullspace and eigenvalue properties for correlated Gaussian designs. Journal of Machine Learning Research , 11:2241-2259, 2010. · Zbl 1242.62071
[38] Rudelson, M. and Vershynin, R. Non-asymptotic theory of random matrices: Extreme singular values. In Proceedings of the International Congress of Mathematicians , 2010. · Zbl 1227.60011
[39] Rudelson, M. and Zhou, S. Reconstruction from anisotropic random measurements. IEEE Transactions on Information Theory , 59:3434-3447, 2013. · Zbl 1364.94158
[40] Schneider, R. and Weil, W. Stochastic and Integral Geometry . Springer, 2008. · Zbl 1175.60003
[41] Schölkopf, B. and Smola, A. Learning with Kernels . MIT Press, Cambridge, Massachusetts, 2002. · Zbl 1019.68094
[42] Slawski, M. and Hein, M. Sparse recovery for protein mass spectrometry data. In NIPS workshop on practical applications of sparse modelling , 2010.
[43] Slawski, M. and Hein, M. Sparse recovery by thresholded non-negative least squares. In Advances in Neural Information Processing Systems 24 , pages 1926-1934, 2011.
[44] Slawski, M. and Hein, M. Non-negative least squares for high-dimensional linear models: Consistency and sparse recovery without regularization. · Zbl 1280.62086
[45] Slawski, M. and Hein, M. Supplementary material to ‘Non-negative least squares for high-dimensional linear models: Consistency and sparse recovery without regularization’. , 2013. · Zbl 1280.62086
[46] Sun, T. and Zhang, C.-H. Scaled sparse linear regression. Biometrika , 99:879-898, 2012. · Zbl 1452.62515
[47] Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B , 58:267-288, 1996. · Zbl 0850.62538
[48] Tibshirani, R. Regression shrinkage and selection via the lasso: A retrospective (with discussion). Journal of the Royal Statistical Society Series B , 73:273-282, 2011.
[49] Tropp, J. Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory , 50:2231-2242, 2004. · Zbl 1288.94019
[50] Tsybakov, A. Discussion of ‘Stability Selection’ by N. Meinshausen and P. Bühlmann. Journal of the Royal Statistical Society Series B , 72:467, 2010.
[51] van de Geer, S. High-dimensional generalized linear models and the lasso. The Annals of Statistics , 36:614-645, 2008. · Zbl 1138.62323 · doi:10.1214/009053607000000929
[52] van de Geer, S. and Bühlmann, P. On the conditions used to prove oracle results for the lasso. The Electronic Journal of Statistics , 3:1360-1392, 2009. · Zbl 1327.62425
[53] van de Geer, S. and Lederer, J. The lasso, correlated design, and improved oracle inequalities. In From Probability to Statistics and Back: High-Dimensional Models and Processes. A Festschrift in Honor of Jon Wellner , IMS Collections, 2012.
[54] Wainwright, M. Sharp thresholds for noisy and high-dimensional recovery of sparsity using \(\ell_{1}\)-constrained quadratic programming (lasso). IEEE Transactions on Information Theory , 55:2183-2202, 2009. · Zbl 1367.62220
[55] Wang, M. and Tang, A. Conditions for a unique non-negative solution to an underdetermined system. In Proceedings of Allerton Conference on Communication, Control, and Computing , 2009.
[56] Wang, M., Xu, W., and Tang, A. A unique nonnegative solution to an underdetermined system: From vectors to matrices. IEEE Transactions on Signal Processing , 59:1007-1016, 2011. · Zbl 1391.15004
[57] Wendel, J. G. A problem in geometric probability. Mathematica Scandinavica , 11:109-111, 1962. · Zbl 0108.31603
[58] Zhang, C.-H. and Huang, J. The sparsity and bias of the lasso selection in high-dimensional linear regression. The Annals of Statistics , 36:1567-1594, 2008. · Zbl 1142.62044 · doi:10.1214/07-AOS520
[59] Zhang, T. On the consistency of feature selection using greedy least squares regression. Journal of Machine Learning Research , 10:555-568, 2009a. · Zbl 1235.62096
[60] Zhang, T. Some sharp performance bounds for least squares regression with \(L_{1}\) regularization. The Annals of Statistics , 37:2109-2144, 2009b. · Zbl 1173.62029 · doi:10.1214/08-AOS659
[61] Zhao, P. and Yu, B. On model selection consistency of the lasso. Journal of Machine Learning Research , 7:2541-2567, 2006. · Zbl 1222.62008
[62] Zhao, Y.-B. Sparsest and least \(\ell_{1}\)-norm nonnegative solutions to linear systems with applications to compressive sensing and linear programs. Technical report, University of Birmingham, 2012.
[63] Zhou, S. Thresholding procedures for high dimensional variable selection and statistical estimation. In Advances in Neural Information Processing Systems 22 , 2009.
[64] Zou, H. The adaptive lasso and its oracle properties. Journal of the American Statistical Association , 101:1418-1429, 2006. · Zbl 1171.62326