Estimating leverage scores via rank revealing methods and randomization. (English) Zbl 1472.62082

MSC:

62H12 Estimation in multivariate analysis
60B20 Random matrices (probabilistic aspects)
65F08 Preconditioners for iterative methods
68W20 Randomized algorithms
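For context on the quantity studied in the reviewed paper: the statistical leverage scores of a tall matrix A are the diagonal entries of the hat matrix A(AᵀA)⁻¹Aᵀ, which equal the squared row norms of any orthonormal basis for the column space of A, e.g. the Q factor of a thin QR decomposition (cf. references [42] and [43] below). A minimal sketch, assuming NumPy and a full-column-rank A; the function name `leverage_scores` is illustrative, not from the paper:

```python
import numpy as np

def leverage_scores(A):
    """Exact leverage scores of a tall matrix A (n x d, full column rank):
    the squared row norms of the orthogonal factor from a thin QR,
    i.e. the diagonal of the hat matrix A (A^T A)^{-1} A^T."""
    Q, _ = np.linalg.qr(A)           # thin QR: Q is n x d with orthonormal columns
    return np.sum(Q**2, axis=1)      # l_i = ||e_i^T Q||_2^2

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
scores = leverage_scores(A)
# Each score lies in [0, 1], and together they sum to rank(A) = 5.
```

This exact computation costs O(nd²); the randomized and rank-revealing methods surveyed in the references below (e.g. [36], [43]) aim to approximate these scores more cheaply.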

References:

[1] D. Achlioptas, Database-friendly random projections, in Proc. 20th ACM PODS, 2001, pp. 274-281.
[2] N. Ailon and B. Chazelle, Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform, in Proc. 38th ACM STOC, 2006, pp. 557-563. · Zbl 1301.68232
[3] N. Ailon and E. Liberty, Fast dimension reduction using Rademacher series on dual BCH codes, Discrete Comput. Geom., 42 (2009), pp. 615-630. · Zbl 1180.94083
[4] A. Alaoui and M. W. Mahoney, Fast randomized kernel ridge regression with statistical guarantees, in Advances in Neural Information Processing Systems, 2015, pp. 775-783.
[5] H. Avron and C. Boutsidis, Faster subset selection for matrices and applications, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 1464-1499, https://doi.org/10.1137/120867287. · Zbl 1425.65059
[6] H. Avron, K. L. Clarkson, and D. P. Woodruff, Faster kernel ridge regression using sketching and preconditioning, SIAM J. Matrix Anal. Appl., 38 (2017), pp. 1116-1138, https://doi.org/10.1137/16M1105396. · Zbl 1379.65008
[7] H. Avron, P. Maymounkov, and S. Toledo, Blendenpik: Supercharging LAPACK’s least-squares solver, SIAM J. Sci. Comput., 32 (2010), pp. 1217-1236, https://doi.org/10.1137/090767911. · Zbl 1213.65069
[8] C. Bekas, A. Curioni, and I. Fedulova, Low-cost data uncertainty quantification, Concurrency and Computation: Practice and Experience, 24 (2012), pp. 908-920.
[9] C. Bekas, E. Kokiopoulou, and Y. Saad, An estimator for the diagonal of a matrix, Appl. Numer. Math., 57 (2007), pp. 1214-1229. · Zbl 1123.65026
[10] M. W. Berry, S. A. Pulatova, and G. Stewart, Algorithm 844: Computing sparse reduced-rank approximations to sparse matrices, ACM Trans. Math. Software, 31 (2005), pp. 252-269. · Zbl 1070.65539
[11] J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah, Julia: A fresh approach to numerical computing, SIAM Rev., 59 (2017), pp. 65-98, https://doi.org/10.1137/141000671. · Zbl 1356.68030
[12] S. Bhojanapalli, P. Jain, and S. Sanghavi, Tighter low-rank approximation via sampling the leveraged element, in Proc. 2015 Annual ACM-SIAM SODA, 2015, pp. 902-920, https://doi.org/10.1137/1.9781611973730.62. · Zbl 1371.68320
[13] C. H. Bischof and G. Quintana-Ortí, Computing rank-revealing QR factorizations of dense matrices, ACM Trans. Math. Software, 24 (1998), pp. 226-253. · Zbl 0932.65033
[14] J. Bourgain, S. Dirksen, and J. Nelson, Toward a unified theory of sparse dimensionality reduction in Euclidean space, Geom. Funct. Anal., 25 (2015), pp. 1009-1088. · Zbl 1341.46007
[15] C. Boutsidis, M. W. Mahoney, and P. Drineas, An improved approximation algorithm for the column subset selection problem, in Proc. 2009 ACM-SIAM SODA, 2009, pp. 968-977, https://doi.org/10.1137/1.9781611973068.105. · Zbl 1420.68235
[16] P. Businger and G. H. Golub, Linear least squares solutions by Householder transformations, Numer. Math., 7 (1965), pp. 269-276. · Zbl 0142.11503
[17] E. Candes and J. Romberg, Sparsity and incoherence in compressive sampling, Inverse Problems, 23 (2007), pp. 969-985. · Zbl 1120.94005
[18] S. Chandrasekaran and I. C. F. Ipsen, On rank-revealing factorisations, SIAM J. Matrix Anal. Appl., 15 (1994), pp. 592-622, https://doi.org/10.1137/S0895479891223781. · Zbl 0796.65030
[19] M. Charikar, K. Chen, and M. Farach-Colton, Finding frequent items in data streams, in Proc. 29th ICALP, Springer, 2002, pp. 693-703. · Zbl 1057.68600
[20] S. Chatterjee and A. S. Hadi, Regression Analysis by Example, John Wiley & Sons, 2015. · Zbl 1263.62099
[21] Y. Chen, S. Bhojanapalli, S. Sanghavi, and R. Ward, Completing any low-rank matrix, provably, J. Mach. Learn. Res., 16 (2015), pp. 2999-3034. · Zbl 1351.62107
[22] H. Y. Cheung, T. C. Kwok, and L. C. Lau, Fast matrix rank algorithms and applications, J. ACM, 60 (2013), pp. 1-25. · Zbl 1281.90039
[23] H. Chin and P. Liang, An empirical evaluation of sketched SVD and its application to leverage score ordering, Proc. Mach. Learn. Res., 95 (2018), pp. 799-814.
[24] A. Civril, Column subset selection problem is UG-hard, J. Comput. System Sci., 80 (2014), pp. 849-859. · Zbl 1285.68052
[25] K. L. Clarkson and D. P. Woodruff, Low-rank approximation and regression in input sparsity time, J. ACM, 63 (2017), 54. · Zbl 1426.65057
[26] M. B. Cohen, Nearly tight oblivious subspace embeddings by trace inequalities, in Proc. 2016 ACM-SIAM SODA, 2016, pp. 278-287, https://doi.org/10.1137/1.9781611974331.ch21. · Zbl 1415.65106
[27] M. B. Cohen, Y. T. Lee, C. Musco, C. Musco, R. Peng, and A. Sidford, Uniform sampling for matrix approximation, in Proc. 2015 Conf. Innov. Theor. Comput. Sci., ACM Press, New York, 2015, pp. 181-190. · Zbl 1365.65108
[28] M. B. Cohen, C. Musco, and C. Musco, Input sparsity time low-rank approximation via ridge leverage score sampling, in Proc. 2017 ACM-SIAM SODA, 2017, pp. 1758-1777, https://epubs.siam.org/doi/abs/10.1137/1.9781611974782.115. · Zbl 1410.68399
[29] Y. Dahiya, D. Konomis, and D. P. Woodruff, An empirical evaluation of sketching for numerical linear algebra, in Proc. 24th ACM SIGKDD, 2018, pp. 1292-1300.
[30] A. Dasgupta, R. Kumar, and T. Sarlós, A sparse Johnson-Lindenstrauss transform, in Proc. 42nd ACM STOC, 2010, pp. 341-350. · Zbl 1293.68140
[31] K. R. Davidson and S. J. Szarek, Local operator theory, random matrices and Banach spaces, Handb. Geom. Banach Spaces, 1 (2001), pp. 317-366. · Zbl 1067.46008
[32] T. A. Davis, Algorithm 915, SuiteSparseQR: Multifrontal multithreaded rank-revealing sparse QR factorization, ACM Trans. Math. Software, 38 (2011), pp. 1-22. · Zbl 1365.65122
[33] J. W. Demmel, L. Grigori, M. Gu, and H. Xiang, Communication avoiding rank revealing QR factorization with column pivoting, SIAM J. Matrix Anal. Appl., 36 (2015), pp. 55-89, https://doi.org/10.1137/13092157X. · Zbl 1327.65078
[34] A. Deshpande and S. Vempala, Adaptive sampling and fast low-rank matrix approximation, in Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, Springer, 2006, pp. 292-303. · Zbl 1155.68575
[35] P. Drineas, I. C. F. Ipsen, E.-M. Kontopoulou, and M. Magdon-Ismail, Structural convergence results for approximation of dominant subspaces from block Krylov spaces, SIAM J. Matrix Anal. Appl., 39 (2018), pp. 567-586, https://doi.org/10.1137/16M1091745. · Zbl 1390.15027
[36] P. Drineas, M. Magdon-Ismail, M. W. Mahoney, and D. P. Woodruff, Fast approximation of matrix coherence and statistical leverage, J. Mach. Learn. Res., 13 (2012), pp. 3475-3506. · Zbl 1437.65030
[37] P. Drineas, M. Mahoney, S. Muthukrishnan, and T. Sarlós, Faster least squares approximation, Numer. Math., 117 (2010), pp. 219-249, https://doi.org/10.1007/s00211-010-0331-6. · Zbl 1218.65037
[38] D. Durfee, J. Peebles, R. Peng, and A. B. Rao, Determinant-preserving sparsification of SDDM matrices with applications to counting and sampling spanning trees, in Proc. 58th IEEE FOCS, IEEE, 2017, pp. 926-937.
[39] A. Gittens and M. W. Mahoney, Revisiting the Nyström method for improved large-scale machine learning, Proc. Mach. Learn. Res., 28 (2013), pp. 567-575. · Zbl 1367.68223
[40] G. Golub and C. Van Loan, Matrix Computations, Johns Hopkins Stud. Math. Sci., Johns Hopkins University Press, 2013.
[41] M. Gu and S. C. Eisenstat, Efficient algorithms for computing a strong rank-revealing QR factorization, SIAM J. Sci. Comput., 17 (1996), pp. 848-869, https://doi.org/10.1137/0917055. · Zbl 0858.65044
[42] D. C. Hoaglin and R. E. Welsch, The hat matrix in regression and ANOVA, Amer. Statist., 32 (1978), pp. 17-22. · Zbl 0375.62070
[43] J. T. Holodnak, I. C. F. Ipsen, and T. Wentworth, Conditioning of leverage scores and computation by QR decomposition, SIAM J. Matrix Anal. Appl., 36 (2015), pp. 1143-1163, https://doi.org/10.1137/140988541. · Zbl 1321.65070
[44] P. Indyk and R. Motwani, Approximate nearest neighbors: Towards removing the curse of dimensionality, in Proc. 30th ACM STOC, 1998, pp. 604-613. · Zbl 1029.68541
[45] I. C. F. Ipsen and T. Wentworth, The effect of coherence on sampling from matrices with orthonormal columns, and preconditioned least squares problems, SIAM J. Matrix Anal. Appl., 35 (2014), pp. 1490-1520, https://doi.org/10.1137/120870748. · Zbl 1359.65063
[46] G. Jindal, P. Kolev, R. Peng, and S. Sawlani, Density independent algorithms for sparsifying k-step random walks, in Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017), Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017. · Zbl 1467.05255
[47] W. B. Johnson and J. Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space, Contemp. Math., 26 (1984), pp. 189-206. · Zbl 0539.46017
[48] V. Kalantzis, C. Bekas, A. Curioni, and E. Gallopoulos, Accelerating data uncertainty quantification by solving linear systems with multiple right-hand sides, Numer. Algorithms, 62 (2013), pp. 637-653. · Zbl 1263.65009
[49] V. Kalantzis, A. C. I. Malossi, C. Bekas, A. Curioni, E. Gallopoulos, and Y. Saad, A scalable iterative dense linear system solver for multiple right-hand sides in data analytics, Parallel Comput., 74 (2018), pp. 136-153, https://doi.org/10.1016/j.parco.2017.12.005.
[50] D. M. Kane and J. Nelson, Sparser Johnson-Lindenstrauss transforms, J. ACM, 61 (2014), pp. 1-23. · Zbl 1295.68134
[51] R. Kannan and S. Vempala, Randomized algorithms in numerical linear algebra, Acta Numer., 26 (2017), p. 95. · Zbl 1378.65084
[52] M. Kapralov, Y. T. Lee, C. N. Musco, C. P. Musco, and A. Sidford, Single pass spectral sparsification in dynamic streams, SIAM J. Comput., 46 (2017), pp. 456-477, https://doi.org/10.1137/141002281. · Zbl 1359.68295
[53] M. Kapralov, V. Potluru, and D. Woodruff, How to fake multiply by a Gaussian matrix, in Proc. 33rd ICML, 2016, pp. 2101-2110.
[54] M. Li, G. L. Miller, and R. Peng, Iterative row sampling, in Proc. 54th IEEE FOCS, 2013, pp. 127-136.
[55] Y. Liu and S. Zhang, Fast quantum algorithms for least squares regression and statistic leverage scores, Theoret. Comput. Sci., 657 (2017), pp. 38-47. · Zbl 1356.68078
[56] M. Lopes, S. Wang, and M. Mahoney, Error estimation for randomized least-squares algorithms via the bootstrap, in Proc. ICML, 2018, pp. 3223-3232.
[57] M. Magdon-Ismail, Row Sampling for Matrix Algorithms via a Non-commutative Bernstein Bound, preprint, https://arxiv.org/abs/1008.0587, 2010.
[58] M. W. Mahoney, Randomized algorithms for matrices and data, Adv. Mach. Learn. Data Mining Astronomy, 1 (2012), pp. 647-672.
[59] P.-G. Martinsson and J. Tropp, Randomized Numerical Linear Algebra: Foundations & Algorithms, preprint, https://arxiv.org/abs/2002.01387, 2020.
[60] X. Meng and M. W. Mahoney, Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression, in Proc. 45th ACM STOC, 2013, pp. 91-100. · Zbl 1293.68150
[61] X. Meng, M. A. Saunders, and M. W. Mahoney, LSRN: A parallel iterative solver for strongly over- or underdetermined systems, SIAM J. Sci. Comput., 36 (2014), pp. C95-C118, https://doi.org/10.1137/120866580. · Zbl 1298.65053
[62] C. Musco, P. Netrapalli, A. Sidford, S. Ubaru, and D. P. Woodruff, Spectrum approximation beyond fast matrix multiplication: Algorithms and hardness, in 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018. · Zbl 1462.68082
[63] J. Nelson and H. L. Nguyên, OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings, in Proc. IEEE FOCS, 2013, pp. 117-126.
[64] J. Nelson and H. L. Nguyên, Sparsity lower bounds for dimensionality reducing maps, in Proc. 45th ACM STOC, 2013, pp. 101-110. · Zbl 1294.68086
[65] M. E. Newman, A measure of betweenness centrality based on random walks, Social Networks, 27 (2005), pp. 39-54.
[66] B. Ordozgoiti, S. G. Canaval, and A. Mozo, Probabilistic leverage scores for parallelized unsupervised feature selection, in Advances in Computational Intelligence, Proc. 14th International Work-Conference on Artificial Neural Networks, Part II, I. Rojas, G. Joya, and A. Catala, eds., Springer, Cham, 2017, pp. 722-733.
[67] D. Papailiopoulos, A. Kyrillidis, and C. Boutsidis, Provable deterministic leverage score sampling, in Proc. 20th ACM SIGKDD, 2014, pp. 997-1006.
[68] D. J. Perry and R. T. Whitaker, Augmented Leverage Score Sampling with Bounds, in Proc. ECML PKDD 2016, Part II, P. Frasconi, N. Landwehr, G. Manco, and J. Vreeken, eds., Springer, Cham, 2016, pp. 543-558.
[69] G. Quintana-Ortí, X. Sun, and C. H. Bischof, A BLAS-3 version of the QR factorization with column pivoting, SIAM J. Sci. Comput., 19 (1998), pp. 1486-1494, https://doi.org/10.1137/S1064827595296732. · Zbl 0912.65034
[70] V. Rokhlin and M. Tygert, A fast randomized algorithm for overdetermined linear least-squares regression, Proc. Natl. Acad. Sci. USA, 105 (2008), pp. 13212-13217. · Zbl 05851018
[71] T. Sarlós, Improved approximation algorithms for large matrices via random projections, in Proc. 47th IEEE FOCS, IEEE, 2006, pp. 143-152.
[72] B. Saunders, A. Storjohann, and G. Villard, Matrix rank certification, Electron. J. Linear Algebra, 11 (2004), pp. 16-23. · Zbl 1038.65038
[73] Y. Shitov, Column subset selection is NP-complete, Linear Algebra Appl., 610 (2021), pp. 52-58. · Zbl 1456.68056
[74] D. A. Spielman and N. Srivastava, Graph sparsification by effective resistances, in Proc. 40th ACM STOC, 2008, pp. 563-568. · Zbl 1231.68184
[75] G. W. Stewart, Four algorithms for the efficient computation of truncated pivoted QR approximations to a sparse matrix, Numer. Math., 83 (1999), pp. 313-323. · Zbl 0957.65031
[76] A. Talwalkar and A. Rostamizadeh, Matrix coherence and the Nyström method, in Proc. 26th UAI, AUAI Press, 2010, pp. 572-579.
[77] A. Torralba, R. Fergus, and W. T. Freeman, 80 million tiny images: A large data set for nonparametric object and scene recognition, IEEE Trans. Pattern Anal. Mach. Intell., 30 (2008), pp. 1958-1970.
[78] J. A. Tropp, Improved analysis of the subsampled randomized Hadamard transform, Adv. Adaptive Data Anal., 3 (2011), pp. 115-126. · Zbl 1232.15029
[79] S. Ubaru and Y. Saad, Fast methods for estimating the numerical rank of large matrices, in Proc. ICML, 2016, pp. 468-477.
[80] M. Udell and A. Townsend, Why are big data matrices approximately low rank?, SIAM J. Math. Data Sci., 1 (2019), pp. 144-160, https://doi.org/10.1137/18M1183480.
[81] P. F. Velleman and R. E. Welsch, Efficient computing of regression diagnostics, Amer. Statist., 35 (1981), pp. 234-242. · Zbl 0475.65099
[82] D. P. Woodruff, Sketching as a tool for numerical linear algebra, Found. Trends Theor. Comput. Sci., 10 (2014), pp. 1-157. · Zbl 1316.65046