Joint variable and rank selection for parsimonious estimation of high-dimensional matrices. (English) Zbl 1373.62246

Summary: We propose dimension-reduction methods for sparse, high-dimensional multivariate response regression models, in which both the number of responses and the number of predictors may exceed the sample size. Predictor selection and rank reduction, sometimes viewed as complementary, are the most popular strategies for obtaining lower-dimensional approximations of the parameter matrix in such models. We show in this article that important gains in prediction accuracy can be obtained by considering them jointly. We motivate a new class of sparse multivariate regression models, in which the coefficient matrix either has low rank and zero rows or can be well approximated by such a matrix. We then introduce estimators based on penalized least squares, with novel penalties that impose simultaneous row and rank restrictions on the coefficient matrix. We prove that these estimators adapt to the unknown matrix sparsity and have fast rates of convergence. We support our theoretical results with an extensive simulation study and two data analyses.
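The idea of combining row selection with rank reduction can be sketched numerically as a proximal-gradient loop that alternates a row-wise (group) soft-threshold with a hard rank truncation. This is an illustrative stand-in under assumed penalty forms, not the paper's exact estimator or algorithm; the function name and the tuning parameters `lam` and `rank` are hypothetical.

```python
import numpy as np

def joint_row_rank_ls(X, Y, lam, rank, n_iter=300):
    """Illustrative sketch for
        min_B  0.5 * ||Y - X B||_F^2 + lam * sum_j ||row_j(B)||_2,
    with a hard rank-`rank` truncation after each proximal step.
    Not the paper's exact estimator -- a simplified stand-in."""
    p, q = X.shape[1], Y.shape[1]
    B = np.zeros((p, q))
    step = 1.0 / np.linalg.norm(X, 2) ** 2          # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        B -= step * (X.T @ (X @ B - Y))             # gradient step on the least-squares loss
        # row-wise group soft-threshold: zeroes out weak predictor rows
        norms = np.linalg.norm(B, axis=1, keepdims=True)
        B *= np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        # hard rank restriction: keep only the top-`rank` singular directions
        U, s, Vt = np.linalg.svd(B, full_matrices=False)
        B = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return B
```

In practice `lam` and `rank` would be chosen data-adaptively (e.g., by cross-validation), which is the role the paper's penalties play without requiring the rank to be fixed in advance.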


62H12 Estimation in multivariate analysis
62J07 Ridge regression; shrinkage estimators (Lasso)

