zbMATH — the first resource for mathematics

Support union recovery in high-dimensional multivariate regression. (English) Zbl 1373.62372
Summary: In multivariate regression, a \(K\)-dimensional response vector is regressed upon a common set of \(p\) covariates, with a matrix \(B^\ast\in\mathbb R^{p\times K}\) of regression coefficients. We study the behavior of the multivariate group Lasso, in which block regularization based on the \(\ell_1/\ell_2\) norm is used for support union recovery, or recovery of the set of \(s\) rows for which \(B^\ast\) is nonzero. Under high-dimensional scaling, we show that the multivariate group Lasso exhibits a threshold for the recovery of the exact row pattern, with high probability over the random design and noise, that is specified by the sample complexity parameter \(\theta(n,p,s):= n/[2\psi(B^\ast)\log(p-s)]\). Here \(n\) is the sample size, and \(\psi(B^\ast)\) is a sparsity-overlap function measuring a combination of the sparsities and overlaps of the \(K\) regression coefficient vectors that constitute the model. We prove that the multivariate group Lasso succeeds for problem sequences \((n,p,s)\) such that \(\theta(n,p,s)\) exceeds a critical level \(\theta_u\), and fails for sequences such that \(\theta(n,p,s)\) lies below a critical level \(\theta_\ell\). For the special case of the standard Gaussian ensemble, we show that \(\theta_\ell=\theta_u\), so that the characterization is sharp. The sparsity-overlap function \(\psi(B^\ast)\) reveals that, if the design is uncorrelated on the active rows, \(\ell_1/\ell_2\) regularization for multivariate regression never harms performance relative to an ordinary Lasso approach and can yield substantial improvements in sample complexity (up to a factor of \(K\)) when the coefficient vectors are suitably orthogonal. For more general designs, it is possible for the ordinary Lasso to outperform the multivariate group Lasso. We complement our analysis with simulations that demonstrate the sharpness of our theoretical results, even for relatively small problems.
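The estimator summarized above can be sketched in code: a generic proximal-gradient solver for the \(\ell_1/\ell_2\)-penalized multivariate regression objective, whose proximal step shrinks each row of \(B\) toward zero as a block. This is a minimal illustration, not the authors' algorithm. The `sample_complexity` helper evaluates \(\theta(n,p,s)\) under the assumption that \(\psi(B^\ast)\) is the largest eigenvalue of \(Z^\top\Sigma_{SS}Z\), where the rows of \(Z\) are the active rows of \(B^\ast\) normalized to unit \(\ell_2\) norm; this matches the role \(\psi\) plays in the summary (ranging from \(s/K\) for orthogonal coefficient vectors to \(s\) for identical ones), but the precise definition is in the paper.

```python
import numpy as np

def group_soft_threshold(B, tau):
    """Proximal operator of tau * sum_i ||B_i||_2: shrink each row as a block."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return B * scale

def multivariate_group_lasso(X, Y, lam, n_iter=500):
    """Proximal-gradient sketch for
        min_B (1/(2n)) ||Y - X B||_F^2 + lam * sum_i ||B_i||_2.
    Rows of the returned B that are exactly zero are excluded from the
    estimated support union."""
    n, p = X.shape
    B = np.zeros((p, Y.shape[1]))
    # Step size 1/L, with L the Lipschitz constant of the smooth part's gradient.
    step = n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y) / n
        B = group_soft_threshold(B - step * grad, step * lam)
    return B

def sample_complexity(B_active, n, p, s, Sigma_SS=None):
    """Evaluate psi(B*) and theta(n, p, s) = n / (2 psi(B*) log(p - s)).

    Assumption (hypothetical reading of the summary): psi(B*) is the largest
    eigenvalue of Z^T Sigma_SS Z, with Z the row-normalized active block of B*.
    """
    Z = B_active / np.linalg.norm(B_active, axis=1, keepdims=True)
    if Sigma_SS is None:
        Sigma_SS = np.eye(Z.shape[0])  # uncorrelated design on the active rows
    psi = np.linalg.eigvalsh(Z.T @ Sigma_SS @ Z).max()
    theta = n / (2.0 * psi * np.log(p - s))
    return psi, theta
```

For identical coefficient vectors across the \(K\) tasks, this \(\psi\) equals \(s\) (no benefit over the ordinary Lasso), while orthogonal coefficient vectors drive it down toward \(s/K\), reflecting the factor-of-\(K\) improvement mentioned in the summary.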

62J07 Ridge regression; shrinkage estimators (Lasso)
62F07 Statistical ranking and selection procedures
Full Text: DOI arXiv
[1] Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis . Wiley, New York. · Zbl 0651.62041
[2] Argyriou, A., Evgeniou, T. and Pontil, M. (2006). Multi-task feature learning. In Advances in Neural Information Processing Systems 19 41-48. MIT Press, Cambridge, MA.
[3] Bach, F. (2008). Consistency of the group Lasso and multiple kernel learning. J. Mach. Learn. Res. 9 1179-1225. · Zbl 1225.68147 · www.jmlr.org
[4] Bach, F., Lanckriet, G. and Jordan, M. I. (2004). Multiple kernel learning, conic duality, and the SMO algorithm. In Proceedings of the 21st International Conference in Machine Learning 41-48. ACM, New York.
[5] Bertsekas, D. P. (1995). Nonlinear Programming . Athena Scientific, Belmont, MA. · Zbl 0935.90037
[6] Bickel, P., Ritov, Y. and Tsybakov, A. (2009). Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37 1705-1732. · Zbl 1173.62022 · doi:10.1214/08-AOS620
[7] Boyd, S. and Vandenberghe, L. (2004). Convex Optimization . Cambridge Univ. Press, Cambridge, UK. · Zbl 1058.90049
[8] Chen, S., Donoho, D. L. and Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20 33-61. · Zbl 0919.94002 · doi:10.1137/S1064827596304010
[9] Davidson, K. R. and Szarek, S. J. (2001). Local operator theory, random matrices, and Banach spaces. In Handbook of Banach Spaces 1 317-336. Elsevier, Amsterdam. · Zbl 1067.46008 · doi:10.1016/S1874-5849(01)80010-3
[10] Donoho, D. and Huo, X. (2001). Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inform. Theory 47 2845-2862. · Zbl 1019.94503 · doi:10.1109/18.959265
[11] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407-499. · Zbl 1091.62054 · doi:10.1214/009053604000000067
[12] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348-1361. JSTOR: · Zbl 1073.62547 · doi:10.1198/016214501753382273 · links.jstor.org
[13] Frank, I. E. and Friedman, J. H. (1993). A statistical view of some chemometrics regression tools. Technometrics 35 109-135. · Zbl 0775.62288 · doi:10.2307/1269656
[14] Huang, J., Horowitz, J. L. and Ma, S. (2008). Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Ann. Statist. 36 587-613. · Zbl 1133.62048 · doi:10.1214/009053607000000875
[15] Huang, J. and Zhang, T. (2009). The benefit of group sparsity. Technical report, Rutgers University. Available at . · Zbl 1202.62052 · doi:10.1214/09-AOS778 · arxiv.org
[16] Knight, K. and Fu, W. J. (2000). Asymptotics for Lasso-type estimators. Ann. Statist. 28 1356-1378. · Zbl 1105.62357 · doi:10.1214/aos/1015957397
[17] Laurent, B. and Massart, P. (2000). Adaptive estimation of a quadratic functional by model selection. Ann. Statist. 28 1303-1338. · Zbl 1105.62328 · doi:10.1214/aos/1015957395
[18] Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry and Processes . Springer, New York. · Zbl 0748.60004
[19] Liu, H. and Zhang, J. (2008). On the \(\ell_1\)-\(\ell_q\) regularized regression. Technical report, Carnegie Mellon University. Available at . · arxiv.org
[20] Lounici, K., Tsybakov, A. B., Pontil, M. and van de Geer, S. A. (2009). Taking advantage of sparsity in multi-task learning. In Proceedings of the 22nd Conference on Learning Theory . Montreal.
[21] Massart, P. (2003). Concentration Inequalities and Model Selection. Lecture Notes in Math. 1896 . Springer, New York. · Zbl 1170.60006 · doi:10.1007/978-3-540-48503-2
[22] Meier, L., van de Geer, S. and Bühlmann, P. (2008). The group Lasso for logistic regression. J. Roy. Statist. Soc. Ser. B 70 53-71. · Zbl 1400.62276
[23] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 1436-1462. · Zbl 1113.62082 · doi:10.1214/009053606000000281
[24] Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist. 37 246-270. · Zbl 1155.62050 · doi:10.1214/07-AOS582 · www.projecteuclid.org
[25] Negahban, S. and Wainwright, M. (2008). Joint support recovery under high-dimensional scaling: Benefits and perils of \(\ell_{1,\infty}\) regularization. In Advances in Neural Information Processing Systems 21 1161-1168. MIT Press, Cambridge, MA.
[26] Obozinski, G., Taskar, B. and Jordan, M. I. (2010). Joint covariate selection and joint subspace selection for multiple classification problems. Statist. Comput. 20 231-252.
[27] Osborne, M. R., Presnell, B. and Turlach, B. A. (2000). A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 389-403. · Zbl 0962.65036 · doi:10.1093/imanum/20.3.389
[28] Ravikumar, P., Liu, H., Lafferty, J. and Wasserman, L. (2009). SpAM: Sparse additive models. J. Roy. Statist. Soc. Ser. B 71 1009-1030.
[29] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. JSTOR: · Zbl 0850.62538 · links.jstor.org
[30] Tropp, J. A. (2006). Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Trans. Inform. Theory 52 1030-1051. · Zbl 1288.94025 · doi:10.1109/TIT.2005.864420
[31] Turlach, B., Venables, W. and Wright, S. (2005). Simultaneous variable selection. Technometrics 47 349-363. · doi:10.1198/004017005000000139
[32] Wainwright, M. J. (2009a). Information-theoretic bounds on sparsity recovery in the high-dimensional and noisy setting. IEEE Trans. Inform. Theory 55 5728-5741. · Zbl 1367.94106
[33] Wainwright, M. J. (2009b). Sharp thresholds for high-dimensional and noisy sparsity recovery using \(\ell_1\)-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183-2202. · Zbl 1367.62220 · doi:10.1109/TIT.2009.2016018
[34] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. Roy. Statist. Soc. Ser. B 68 49-67. JSTOR: · Zbl 1141.62030 · doi:10.1111/j.1467-9868.2005.00532.x · links.jstor.org
[35] Zhang, H., Liu, H., Wu, Y. and Zhu, J. (2008). Variable selection for the multi-category SVM via adaptive sup-norm regularization. Electron. J. Statist. 2 1149-1167. · Zbl 1135.62056 · doi:10.1214/08-EJS122
[36] Zhao, P., Rocha, G. and Yu, B. (2009). The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Statist. 37 3468-3497. · Zbl 1369.62164 · doi:10.1214/07-AOS584
[37] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541-2567. · Zbl 1222.62008 · www.jmlr.org