×

Practical sketching algorithms for low-rank matrix approximation. (English) Zbl 1379.65026

Summary: This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image, or sketch, of the matrix. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by numerical experiments with real and synthetic data.

MSC:

65F30 Other matrix algorithms (MSC2010)

Software:

Algorithm 971
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] N. Ailon and B. Chazelle, {\it Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform}, in STOC’06: Proceedings of the 38th Annual ACM Symposium on Theory of Computing, ACM, New York, 2006, pp. 557-563, . · Zbl 1301.68232
[2] N. Ailon and B. Chazelle, {\it The fast Johnson-Lindenstrauss transform and approximate nearest neighbors}, SIAM J. Comput., 39 (2009), pp. 302-322, . · Zbl 1185.68327
[3] J. Bourgain, S. Dirksen, and J. Nelson, {\it Toward a unified theory of sparse dimensionality reduction in Euclidean space}, Geom. Funct. Anal., 25 (2015), pp. 1009-1088, . · Zbl 1341.46007
[4] C. Boutsidis, D. Garber, Z. Karnin, and E. Liberty, {\it Online principal components analysis}, in Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, Philadelphia, 2015, pp. 887-901. · Zbl 1373.62291
[5] C. Boutsidis and A. Gittens, {\it Improved matrix algorithms via the subsampled randomized Hadamard transform}, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 1301-1340, . · Zbl 1286.65054
[6] C. Boutsidis, D. Woodruff, and P. Zhong, {\it Optimal principal component analysis in distributed and streaming models}, in Proceedings of the 48th Annual ACM Symposium on Theory of Computing (STOC 2016), Cambridge, MA, 2016. · Zbl 1381.62140
[7] S. Boyd and L. Vandenberghe, {\it Convex optimization}, Cambridge University Press, Cambridge, 2004, . · Zbl 1058.90049
[8] K. Clarkson and D. Woodruff, {\it Low-rank PSD approximation in input-sparsity time}, Unpublished, Jan. 2017. · Zbl 1411.68186
[9] K. L. Clarkson and D. P. Woodruff, {\it Numerical linear algebra in the streaming model}, in Proceedings of the 41st ACM Symposium on Theory of Computing (STOC), ACM, New York, 2009, pp. 205-214. · Zbl 1304.65138
[10] K. L. Clarkson and D. P. Woodruff, {\it Low rank approximation and regression in input sparsity time}, in STOC’13–Proceedings of the 2013 ACM Symposium on Theory of Computing, ACM, New York, 2013, pp. 81-90, . · Zbl 1293.65069
[11] M. Cohen, {\it Nearly tight oblivious subspace embeddings by trace inequalities}, in Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), SIAM, Philadelphia, ACM, New York, 2016, pp. 278-287. · Zbl 1415.65106
[12] M. B. Cohen, S. Elder, C. Musco, C. Musco, and M. Persu, {\it Dimensionality reduction for k-means clustering and low rank approximation}, in Proceedings of the 47th Annual ACM Symposium on Theory of Computing, ACM, New York, 2015, pp. 163-172. · Zbl 1321.68398
[13] M. B. Cohen, J. Nelson, and D. P. Woodruff, {\it Optimal approximate matrix product in terms of stable rank}, in the 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016), I. Chatzigiannakis, M. Mitzenmacher, Y. Rabani, and D. Sangiorgi, eds., Leibniz International Proceedings in Informatics (LIPIcs) 55, Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 2016, pp. 11:1-11:14, . · Zbl 1404.65032
[14] J. Demmel, I. Dumitriu, and O. Holtz, {\it Fast linear algebra is stable}, Numer. Math., 108 (2007), pp. 59-91, . · Zbl 1133.65015
[15] D. Feldman, M. Schmidt, and C. Sohler, {\it Turning big data into tiny data: Constant-size coresets for \(k\)-means, PCA and projective clustering}, in Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, Philadelphia, ACM, New York, 2012, pp. 1434-1453. · Zbl 1421.68219
[16] D. Feldman, M. Volkov, and D. Rus, {\it Dimensionality reduction of massive sparse datasets using coresets}, in Advances in Neural Information Processing Systems 29 (NIPS 2016), Curran Associates, 2016, pp 2766-2774.
[17] A. Frieze, R. Kannan, and S. Vempala, {\it Fast Monte-Carlo algorithms for finding low-rank approximations}, J. ACM, 51 (2004), pp. 1025-1041, . · Zbl 1125.65005
[18] M. Gu, {\it Subspace iteration randomization and singular value problems}, SIAM J. Sci. Comput., 37 (2015), pp. A1139-A1173, .
[19] N. Halko, P. G. Martinsson, and J. A. Tropp, {\it Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions}, SIAM Rev., 53 (2011), pp. 217-288, . · Zbl 1269.65043
[20] N. J. Higham, {\it Matrix nearness problems and applications}, in Applications of Matrix Theory (Bradford, 1988), Oxford University Press, New York, 1989, pp. 1-27.
[21] P. Jain, C. Jin, S. M. Kakade, P. Netrapalli, and A. Sidford, {\it Streaming PCA: Matching matrix Bernstein and near-optimal finite sample guarantees for Oja’s algorithm}, in 29th Annual Conference on Learning Theory, Columbia University, New York, 2016, pp. 1147-1164.
[22] H. Li, G. C. Linderman, A. Szlam, K. P. Stanton, Y. Kluger, and M. Tygert, {\it Algorithm 971: An implementation of a randomized algorithm for principal component analysis}, ACM Trans. Math. Softw., 43 (2017), pp. 28:1-28:14, . · Zbl 1391.65085
[23] Y. Li, H. L. Nguyen, and D. P. Woodruff, {\it Turnstile streaming algorithms might as well be linear sketches}, in STOC’14–Proceedings of the 2014 ACM Symposium on Theory of Computing, ACM, New York, 2014, pp. 174-183. · Zbl 1315.68281
[24] E. Liberty, {\it Accelerated Dense Random Projections}, Ph.D. thesis, Yale University, New Haven, CT, 2009.
[25] M. W. Mahoney, {\it Randomized algorithms for matrices and data}, Found. Trends Machine Learn., 3 (2011), pp. 123-224. · Zbl 1232.68173
[26] P.-G. Martinsson, V. Rokhlin, and M. Tygert, {\it A randomized algorithm for the decomposition of matrices}, Appl. Comput. Harmon. Anal., 30 (2011), pp. 47-68, . · Zbl 1210.65095
[27] X. Meng and M. W. Mahoney, {\it Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression}, in STOC’13–Proceedings of the 2013 ACM Symposium on Theory of Computing, ACM, New York, 2013, pp. 91-100, . · Zbl 1293.68150
[28] J. Nelson and H. L. Nguyen, {\it OSNAP: faster numerical linear algebra algorithms via sparser subspace embeddings}, in 2013 IEEE 54th Annual Symposium on Foundations of Computer Science–FOCS 2013, IEEE Computer Soc., Los Alamitos, CA, 2013, pp. 117-126, .
[29] J. Nelson and H. L. Nguyen, {\it Lower bounds for oblivious subspace embeddings}, in Automata, Languages, and Programming. Part I, Lecture Notes in Comput. Sci. 8572, Springer, Heidelberg, 2014, pp. 883-894, . · Zbl 1398.68202
[30] C. H. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala, {\it Latent semantic indexing: a probabilistic analysis}, J. Comput. System Sci., 61 (2000), pp. 217-235, . Special issue on the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Seattle, WA, 1998). · Zbl 0963.68063
[31] J. A. Tropp, {\it Improved analysis of the subsampled randomized Hadamard transform}, Adv. Adapt. Data Anal., 3 (2011), pp. 115-126, . · Zbl 1232.15029
[32] J. A. Tropp, A. Yurtsever, M. Udell, and V. Cevher, {\it Randomized Single-View Algorithms for Low-Rank Matrix Approximation}, ACM Report 2017-01, Caltech, Pasadena, CA, 2017, . · Zbl 1379.65026
[33] J. Upadhyay, {\it Fast and Space-Optimal Low-Rank Factorization in the Streaming Model with Application in Differential privacy}, preprint, , 2016.
[34] D. P. Woodruff, {\it Sketching as a tool for numerical linear algebra}, Found. Trends Theoret. Comput. Sci., 10 (2014), pp. iv+157. · Zbl 1316.65046
[35] F. Woolfe, E. Liberty, V. Rokhlin, and M. Tygert, {\it A fast randomized algorithm for the approximation of matrices}, Appl. Comput. Harmon. Anal., 25 (2008), pp. 335-366. · Zbl 1155.65035
[36] A. Yurtsever, M. Udell, J. A. Tropp, and V. Cevher, {\it Sketchy decisions: Convex low-rank matrix optimization with optimal storage}, in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), Fort Lauderdale, FL, 2017, . · Zbl 1379.65026
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.