Estimation of high-dimensional low-rank matrices. (English) Zbl 1215.62056

Summary: Suppose that we observe entries or, more generally, linear combinations of entries of an unknown \(m\times T\)-matrix \(A\) corrupted by noise. We are particularly interested in the high-dimensional setting where the number \(mT\) of unknown entries can be much larger than the sample size \(N\). Motivated by several applications, we consider estimation of a matrix \(A\) under the assumption that it has small rank. This can be viewed as a dimension reduction or sparsity assumption. In order to shrink toward a low-rank representation, we investigate penalized least squares estimators with a Schatten-\(p\) quasi-norm penalty term, \(p\leq 1\). We study these estimators under two possible assumptions: a modified version of the restricted isometry condition and a uniform bound on the ratio "empirical norm induced by the sampling operator/Frobenius norm." The main results are stated as non-asymptotic upper bounds on the prediction risk and on the Schatten-\(q\) risk of the estimators, where \(q\in [p, 2]\). The rates that we obtain for the prediction risk are of the form \(rm/N\) (for \(m=T\)), up to logarithmic factors, where \(r\) is the rank of \(A\). The particular examples of multi-task learning and matrix completion are worked out in detail. The proofs are based on tools from the theory of empirical processes. As a by-product, we derive bounds for the \(k\)th entropy numbers of the quasi-convex Schatten class embeddings \(S_{p}^{M}\hookrightarrow S_{2}^{M}\), \(p<1\), which are of independent interest.
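For the convex case \(p=1\) (nuclear-norm penalty), the penalized least squares estimator described above can be computed by proximal gradient descent, since the prox operator of the nuclear norm is soft-thresholding of singular values. The following is a minimal sketch for the matrix completion example, not the paper's own algorithm; the function names (`svt`, `complete`), the step size, and all parameter values are illustrative assumptions.

```python
import numpy as np

def svt(M, tau):
    """Singular value soft-thresholding: the prox operator of tau * (nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)   # shrink each singular value toward zero
    return (U * s) @ Vt

def complete(Y, mask, lam=0.1, step=1.0, iters=200):
    """Proximal gradient for  min_X 0.5*||mask*(X - Y)||_F^2 + lam*||X||_*,
    i.e. the Schatten-1 penalized least squares estimator under entrywise sampling."""
    X = np.zeros_like(Y)
    for _ in range(iters):
        grad = mask * (X - Y)             # gradient of the quadratic data-fit term
        X = svt(X - step * grad, step * lam)
    return X

# Toy example: a rank-1 matrix observed on roughly 60% of its entries.
rng = np.random.default_rng(0)
u, v = rng.standard_normal(20), rng.standard_normal(15)
A = np.outer(u, v)                        # true low-rank matrix (r = 1)
mask = rng.random(A.shape) < 0.6          # sampling pattern
Ahat = complete(mask * A, mask, lam=0.1)
```

The step size 1.0 is valid here because the gradient of the masked quadratic term is 1-Lipschitz; for a general sampling operator it would have to be scaled by the operator norm.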


62H12 Estimation in multivariate analysis
62G05 Nonparametric estimation
62F10 Point estimation


[1] Abernethy, J., Bach, F., Evgeniou, T. and Vert, J.-P. (2009). A new approach to collaborative filtering: Operator estimation with spectral regularization. J. Mach. Learn. Res. 10 803-826. · Zbl 1235.68122
[2] Amini, A. and Wainwright, M. (2009). High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann. Statist. 37 2877-2921. · Zbl 1173.62049
[3] Argyriou, A., Evgeniou, T. and Pontil, M. (2008). Convex multi-task feature learning. Mach. Learn. 73 243-272.
[4] Argyriou, A., Micchelli, C. A. and Pontil, M. (2010). On spectral learning. J. Mach. Learn. Res. · Zbl 1242.68201
[5] Argyriou, A., Micchelli, C. A., Pontil, M. and Ying, Y. (2008). A spectral regularization framework for multi-task structure learning. In Advances in Neural Information Processing Systems 20 (J. C. Platt et al., eds.) 25-32. MIT Press, Cambridge, MA.
[6] Bach, F. R. (2008). Consistency of trace norm minimization. J. Mach. Learn. Res. 9 1019-1048. · Zbl 1225.68146
[7] Bickel, P. J. and Levina, E. (2008). Regularized estimation of large covariance matrices. Ann. Statist. 36 199-227. · Zbl 1132.62040
[8] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705-1732. · Zbl 1173.62022
[9] Bunea, F., She, Y. and Wegkamp, M. H. (2010). Optimal selection of reduced rank estimators of high-dimensional matrices. Available at . · Zbl 1216.62086
[10] Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2007). Aggregation for Gaussian regression. Ann. Statist. 35 1674-1697. · Zbl 1209.62065
[11] Cai, T., Zhang, C.-H. and Zhou, H. H. (2010). Optimal rates of convergence for covariance matrix estimation. Ann. Statist. 38 2118-2144. · Zbl 1202.62073
[12] Candès, E. J. and Plan, Y. (2010a). Matrix completion with noise. Proc. IEEE 98 925-936.
[13] Candès, E. J. and Plan, Y. (2010b). Tight oracle bounds for low-rank matrix recovery from a minimal number of noisy random measurements. Available at . · Zbl 1366.90160
[14] Candès, E. J. and Recht, B. (2009). Exact matrix completion via convex optimization. Found. Comput. Math. 9 717-772. · Zbl 1219.90124
[15] Candès, E. J. and Tao, T. (2005). Decoding by linear programming. IEEE Trans. Inform. Theory 51 4203-4215. · Zbl 1264.94121
[16] Candès, E. J. and Tao, T. (2009). The power of convex relaxation: Near-optimal matrix completion. Unpublished manuscript. · Zbl 1366.15021
[17] McCarthy, C. A. (1967). \(c_p\). Israel J. Math. 5 249-272. · Zbl 0156.37902
[18] Dalalyan, A. and Tsybakov, A. (2008). Aggregation by exponential weighting, sharp oracle inequalities and sparsity. Mach. Learn. 72 39-61.
[19] Donoho, D. L., Johnstone, I. M., Hoch, J. C. and Stern, A. S. (1992). Maximum entropy and the nearly black object. J. Roy. Statist. Soc. Ser. B 54 41-81. JSTOR: · Zbl 0788.62103
[20] Edmunds, D. E. and Triebel, H. (1996). Function Spaces, Entropy Numbers, Differential Operators. Cambridge Univ. Press, Cambridge. · Zbl 0865.46020
[21] Edmunds, D. E. and Triebel, H. (1989). Entropy numbers and approximation numbers in function spaces. Proc. London Math. Soc. 58 137-152. · Zbl 0629.46034
[22] Foster, D. P. and George, E. I. (1994). The risk inflation criterion for multiple regression. Ann. Statist. 22 1947-1975. · Zbl 0829.62066
[23] Guédon, O. and Litvak, A. E. (2000). Euclidean projections of a p -convex body. In Geometric Aspects of Functional Analysis, Israel Seminar (GAFA) 1996-2000 (V. D. Milman and G. Schechtman, eds.). Lecture Notes in Mathematics 1745 95-108. Springer, Berlin. · Zbl 0988.52010
[24] Gross, D. (2009). Recovering low-rank matrices from few coefficients in any basis. Available at . · Zbl 1366.94103
[25] Keshavan, R. H., Montanari, A. and Oh, S. (2009). Matrix completion from noisy entries. Available at . · Zbl 1242.62069
[26] Kolmogorov, A. N. and Tikhomirov, V. M. (1959). The \(\epsilon\)-entropy and \(\epsilon\)-capacity of sets in function spaces. Uspekhi Matem. Nauk 14 3-86. · Zbl 0090.33503
[27] Koltchinskii, V. (2008). Oracle inequalities in empirical risk minimization and sparse recovery problems. Ecole d’Eté de Probabilités de Saint-Flour, Lecture Notes .
[28] Lounici, K., Pontil, M., Tsybakov, A. B. and van de Geer, S. (2009). Taking advantage of sparsity in multi-task learning. In Proceedings of COLT-2009 .
[29] Lounici, K., Pontil, M., Tsybakov, A. B. and van de Geer, S. (2010). Oracle inequalities and optimal inference under group sparsity. Available at . · Zbl 1306.62156
[30] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436-1462. · Zbl 1113.62082
[31] Mendelson, S., Pajor, A. and Tomczak-Jaegermann, N. (2007). Reconstruction and subgaussian operators in asymptotic geometric analysis. Geom. Funct. Anal. 17 1248-1282. · Zbl 1163.46008
[32] Negahban, S., Ravikumar, P., Wainwright, M. J. and Yu, B. (2009). A unified framework for high-dimensional analysis of \(M\)-estimators with decomposable regularizers. In Advances in Neural Information Processing Systems, NIPS-2009. · Zbl 1331.62350
[33] Negahban, S. and Wainwright, M. J. (2011). Estimation of (near) low rank matrices with noise and high-dimensional scaling. Ann. Statist. To appear. Available at . · Zbl 1216.62090
[34] Nemirovski, A. (2004). Regular Banach spaces and large deviations of random sums. Unpublished manuscript. · Zbl 1106.90059
[35] Pajor, A. (1998). Metric entropy of the Grassmann manifold. Convex Geom. Anal. 34 181-188. · Zbl 0942.46013
[36] Paulsen, V. I. (1986). Completely Bounded Maps and Dilations. Pitman Research Notes in Mathematics 146. Longman, New York. · Zbl 0614.47006
[37] Pietsch, A. (1980). Operator Ideals. Elsevier, Amsterdam. · Zbl 0434.47030
[38] Pinelis, I. F. and Sakhanenko, A. I. (1985). Remarks on inequalities for the probabilities of large deviations. Theory Probab. Appl. 30 143-148. · Zbl 0583.60023
[39] Ravikumar, P., Wainwright, M., Raskutti, G. and Yu, B. (2008). High-dimensional covariance estimation by minimizing \(\ell_1\)-penalized log-determinant divergence. Unpublished manuscript. · Zbl 1274.62190
[40] Recht, B. (2009). A simpler approach to matrix completion. Available at . · Zbl 1280.68141
[41] Recht, B., Fazel, M. and Parrilo, P. A. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52 471-501. · Zbl 1198.90321
[42] Rigollet, P. and Tsybakov, A. B. (2010). Exponential screening and optimal rates of sparse estimation. Available at . · Zbl 1215.62043
[43] Rotfeld, S. Y. (1969). The singular numbers of the sum of completely continuous operators. In Topics in Mathematical Physics (M. S. Birman, ed.). Spectral Theory 3 73-78. English version published by Consultants Bureau, New York. · Zbl 0195.13701
[44] Srebro, N., Rennie, J. and Jaakkola, T. (2005). Maximum margin matrix factorization. In Advances in Neural Information Processing Systems 17 (L. Saul, Y. Weiss and L. Bottou, eds.) 1329-1336. MIT Press, Cambridge, MA.
[45] Srebro, N. and Shraibman, A. (2005). Rank, trace-norm and max-norm. In Learning Theory, Proceedings of COLT-2005. Lecture Notes in Comput. Sci. 3559 545-560. Springer, Berlin. · Zbl 1137.68563
[46] Tropp, J. A. (2010). User-friendly tail bounds for sums of random matrices. Available at . · Zbl 1259.60008
[47] Tsybakov, A. (2009). Introduction to Nonparametric Estimation. Springer, New York. · Zbl 1176.62032
[48] Tsybakov, A. and van de Geer, S. (2005). Square root penalty: Adaptation to the margin in classification and in edge estimation. Ann. Statist. 33 1203-1224. · Zbl 1080.62047
[49] van de Geer, S. (2000). Empirical Processes in M-Estimation. Cambridge Univ. Press, Cambridge. · Zbl 1179.62073
[50] Vershynin, R. (2007). Some problems in asymptotic convex geometry and random matrices motivated by numerical algorithms. In Banach Spaces and Their Applications in Analysis (B. Randrianantoanina and N. Randrianantoanina, eds.) 209-218. de Gruyter, Berlin. · Zbl 1173.90476
[51] Zhao, P. and Yu, B. (2006). On model selection consistency of lasso. J. Mach. Learn. Res. 7 2541-2563. · Zbl 1222.62008
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.