# zbMATH — the first resource for mathematics

Recovery of simultaneous low rank and two-way sparse coefficient matrices, a nonconvex approach. (English) Zbl 1434.90161
Summary: We study the problem of recovering matrices that are simultaneously low rank and row- and/or column-sparse. Such matrices appear in recent applications in cognitive neuroscience, imaging, computer vision, macroeconomics, and genetics. We propose a GDT (Gradient Descent with hard Thresholding) algorithm that efficiently recovers matrices with this structure by minimizing a bi-convex function over a nonconvex set of constraints. We show linear convergence of the GDT iterates to a region within statistical error of an optimal solution. As an application of our method, we consider multi-task learning and show that the statistical error rate achieved by GDT is near-optimal with respect to the minimax rate. Experiments on both simulated and real data sets demonstrate competitive accuracy and much faster running times than existing methods.
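The summary describes GDT as gradient descent on a factored (bi-convex) objective with hard thresholding to enforce the nonconvex sparsity constraints. Below is a minimal numpy sketch of that idea for multi-task regression $Y \approx X \Theta$ with $\Theta = U V^\top$: alternate gradient steps on the factors, then keep only the largest rows of each factor. All dimensions, the step-size rule, and the SVD initialization are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def hard_threshold_rows(M, s):
    """Keep the s rows of M with largest Euclidean norm; zero the rest."""
    out = np.zeros_like(M)
    keep = np.argsort(np.linalg.norm(M, axis=1))[-s:]
    out[keep] = M[keep]
    return out

def gdt(X, Y, r, s1, s2, iters=500):
    """Sketch of gradient descent with hard thresholding for Y ~ X @ (U @ V.T).

    U (p x r) is kept s1-row-sparse (row sparsity of Theta) and V (k x r)
    s2-row-sparse (column sparsity of Theta). Loss: (1/2n)||X U V' - Y||_F^2.
    """
    n = X.shape[0]
    Lx = np.linalg.norm(X, 2) ** 2 / n        # smoothness constant of the loss
    # initialize from a rank-r SVD of the least-squares surrogate X'Y/n
    Uf, sv, Vt = np.linalg.svd(X.T @ Y / n, full_matrices=False)
    U = hard_threshold_rows(Uf[:, :r] * np.sqrt(sv[:r]), s1)
    V = hard_threshold_rows(Vt[:r].T * np.sqrt(sv[:r]), s2)
    for _ in range(iters):
        R = X @ U @ V.T - Y                   # residual, n x k
        eta_u = 1.0 / (Lx * np.linalg.norm(V, 2) ** 2 + 1e-12)
        U = hard_threshold_rows(U - eta_u * (X.T @ R @ V) / n, s1)
        R = X @ U @ V.T - Y
        eta_v = 1.0 / (Lx * np.linalg.norm(U, 2) ** 2 + 1e-12)
        V = hard_threshold_rows(V - eta_v * (R.T @ X @ U) / n, s2)
    return U, V
```

Because the thresholding is reapplied to the full gradient update at every iteration, a row dropped early can re-enter the support later, which is what lets the iterates converge to within statistical error of the truth in the noiseless, well-conditioned regime.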
##### MSC:
 90C26 Nonconvex programming, global optimization
 90C90 Applications of mathematical programming