##
**The singular value decomposition: anatomy of optimizing an algorithm for extreme scale.**
*(English)*
Zbl 1410.65114

In this publication, the authors survey the evolution of SVD algorithms for dense matrices, discussing the motivation and performance impacts of changes. Two main branches of dense SVD methods: bidiagonalization and Jacobi, are mentioned.

Bidiagonalization methods started with the implementation by Golub and Reinsch in Algol60, which was subsequently ported to Fortran in the EISPACK library, and was later more efficiently implemented in the LINPACK library, targeting contemporary vector machines. To address cache-based memory hierarchies, the SVD algorithm was reformulated to use Level 3 BLAS in the LAPACK library. To address new architectures, ScaLAPACK was introduced to take advantage of distributed computing, and MAGMA was developed for accelerators such as GPUs. Algorithmically, the divide and conquer and MRRR algorithms were developed to reduce the number of operations. Still, these methods remained memory bound, so two-stage algorithms were developed to reduce memory operations and increase the computational intensity, with efficient implementations in PLASMA, DPLASMA, and MAGMA.

Jacobi methods started with the two-sided method of Kogbetliantz and the one-sided method of Hestenes. They have likewise had many developments, including parallel and block versions and preconditioning to improve convergence.

In this paper, the authors investigate the impact of these changes by testing various historical and current implementations on a common, modern multicore machine and a distributed computing platform. They show that algorithmic and implementation improvements had increased the speed of the SVD by several orders of magnitude, while using up to 40 times less energy.

Bidiagonalization methods started with the implementation by Golub and Reinsch in Algol60, which was subsequently ported to Fortran in the EISPACK library, and was later more efficiently implemented in the LINPACK library, targeting contemporary vector machines. To address cache-based memory hierarchies, the SVD algorithm was reformulated to use Level 3 BLAS in the LAPACK library. To address new architectures, ScaLAPACK was introduced to take advantage of distributed computing, and MAGMA was developed for accelerators such as GPUs. Algorithmically, the divide and conquer and MRRR algorithms were developed to reduce the number of operations. Still, these methods remained memory bound, so two-stage algorithms were developed to reduce memory operations and increase the computational intensity, with efficient implementations in PLASMA, DPLASMA, and MAGMA.

Jacobi methods started with the two-sided method of Kogbetliantz and the one-sided method of Hestenes. They have likewise had many developments, including parallel and block versions and preconditioning to improve convergence.

In this paper, the authors investigate the impact of these changes by testing various historical and current implementations on a common, modern multicore machine and a distributed computing platform. They show that algorithmic and implementation improvements had increased the speed of the SVD by several orders of magnitude, while using up to 40 times less energy.

Reviewer: Wei-Ru Xu (Chengdu)

### MSC:

65F15 | Numerical computation of eigenvalues and eigenvectors of matrices |

65Y15 | Packaged methods for numerical algorithms |

15A18 | Eigenvalues, singular values, and eigenvectors |

15A23 | Factorization of matrices |

65Y05 | Parallel numerical computation |

### Keywords:

singular value decomposition; SVD; bidiagonal matrix; QR iteration; divide and conquer; bisection; MRRR; Jacobi method; Kogbetliantz method; Hestenes method### Software:

MKL; SLATE; OpenBLAS; CUDA; ESSL; SBR Toolbox; Algorithm 977; ATLAS; ScaLAPACK; BDSVDX; LINPACK; Matlab; DAGuE; EISPACK; XMR; BLAS; LAPACK
PDF
BibTeX
XML
Cite

\textit{J. Dongarra} et al., SIAM Rev. 60, No. 4, 808--865 (2018; Zbl 1410.65114)

Full Text:
DOI

### References:

[1] | E. Agullo, B. Hadri, H. Ltaief, and J. Dongarrra, Comparative study of one-sided factorizations with multiple software packages on multi-core hardware, in Proceedings of the Conference on High Performance Computing, Networking, Storage and Analysis (SC’09), ACM, 2009, art. 20, . |

[2] | E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, LAPACK Users’ Guide, 3rd ed., SIAM, Philadelphia, 1999, . · Zbl 0934.65030 |

[3] | H. Andrews and C. Patterson, Singular value decomposition (SVD) image coding, IEEE Trans. Commun., 24 (1976), pp. 425–432, . |

[4] | P. Arbenz and G. H. Golub, On the spectral decomposition of Hermitian matrices modified by low rank perturbations with applications, SIAM J. Matrix Anal. Appl., 9 (1988), pp. 40–58, . · Zbl 0646.65033 |

[5] | P. Arbenz and I. Slapničar, An analysis of parallel implementations of the block-Jacobi algorithm for computing the SVD, in Proceedings of the 17th International Conference on Information Technology Interfaces ITI, 1995, pp. 13–16, . |

[6] | G. Ballard, J. Demmel, and N. Knight, Avoiding communication in successive band reduction, ACM Trans. Parallel Comput., 1 (2015), p. 11, . |

[7] | J. L. Barlow, More accurate bidiagonal reduction for computing the singular value decomposition, SIAM J. Matrix Anal. Appl., 23 (2002), pp. 761–798, . · Zbl 1007.65026 |

[8] | M. Bečka, G. Okša, and M. Vajteršic, Dynamic ordering for a parallel block-Jacobi SVD algorithm, Parallel Comput., 28 (2002), pp. 243–262, . |

[9] | M. Bečka, G. Okša, and M. Vajteršic, New dynamic orderings for the parallel one–sided block-Jacobi SVD algorithm, Parallel Process. Lett., 25 (2015), art. 1550003, . |

[10] | M. Bečka, G. Okša, M. Vajteršic, and L. Grigori, On iterative QR pre-processing in the parallel block-Jacobi SVD algorithm, Parallel Comput., 36 (2010), pp. 297–307, . |

[11] | M. Bečka and M. Vajteršic, Block-Jacobi SVD algorithms for distributed memory systems I: Hypercubes and rings, Parallel Algorithms Appl., 13 (1999), pp. 265–287, . |

[12] | M. Bečka and M. Vajteršic, Block-Jacobi SVD algorithms for distributed memory systems II: Meshes, Parallel Algorithms Appl., 14 (1999), pp. 37–56, . |

[13] | C. Bischof, B. Lang, and X. Sun, Algorithm 807: The SBR Toolbox—software for successive band reduction, ACM Trans. Math. Software, 26 (2000), pp. 602–616, . · Zbl 1365.65104 |

[14] | C. Bischof and C. Van Loan, The WY representation for products of Householder matrices, SIAM J. Sci. Statist. Comput., 8 (1987), pp. 2–13, . · Zbl 0628.65033 |

[15] | C. H. Bischof, Computing the singular value decomposition on a distributed system of vector processors, Parallel Comput., 11 (1989), pp. 171–186, . · Zbl 0676.65029 |

[16] | L. S. Blackford, J. Choi, A. Cleary, E. D’Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet et al., ScaLAPACK Users’ Guide, SIAM, Philadelphia, 1997, . |

[17] | L. S. Blackford, A. Petitet, R. Pozo, K. Remington, R. C. Whaley, J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry et al., An updated set of basic linear algebra subprograms (BLAS), ACM Trans. Math. Software, 28 (2002), pp. 135–151, . |

[18] | G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, A. Haidar, T. Herault, J. Kurzak, J. Langou, P. Lemarinier, H. Ltaief, et al., Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA, in 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Ph.D. Forum (IPDPSW), IEEE, 2011, pp. 1432–1441, . |

[19] | G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier, and J. Dongarra, DAGuE: A generic distributed DAG engine for high performance computing, Parallel Comput., 38 (2012), pp. 37–51, . |

[20] | W. H. Boukaram, G. Turkiyyah, H. Ltaief, and D. E. Keyes, Batched QR and SVD algorithms on GPUs with applications in hierarchical matrix compression, Parallel Comput., 74 (2017), pp. 19–33, . |

[21] | R. P. Brent and F. T. Luk, The solution of singular-value and symmetric eigenvalue problems on multiprocessor arrays, SIAM J. Sci. Statist. Comput., 6 (1985), pp. 69–84, . · Zbl 0575.65027 |

[22] | R. P. Brent, F. T. Luk, and C. Van Loan, Computation of the singular value decomposition using mesh-connected processors, J. VLSI Comput. Syst., 1 (1985), pp. 242–270, . · Zbl 0597.65029 |

[23] | T. F. Chan, An improved algorithm for computing the singular value decomposition, ACM Trans. Math. Software, 8 (1982), pp. 72–83, . · Zbl 0477.65032 |

[24] | J. Choi, J. Dongarra, and D. W. Walker, The design of a parallel dense linear algebra software library: Reduction to Hessenberg, tridiagonal, and bidiagonal form, Numer. Algorithms, 10 (1995), pp. 379–399, . · Zbl 0839.65050 |

[25] | J. J. M. Cuppen, A divide and conquer method for the symmetric tridiagonal eigenproblem, Numer. Math., 36 (1980), pp. 177–195, . · Zbl 0431.65022 |

[26] | P. I. Davies and N. J. Higham, Numerically stable generation of correlation matrices and their factors, BIT, 40 (2000), pp. 640–651, . · Zbl 0969.65036 |

[27] | P. P. M. de Rijk, A one-sided Jacobi algorithm for computing the singular value decomposition on a vector computer, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 359–371, . · Zbl 0667.65035 |

[28] | S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, Indexing by latent semantic analysis, J. Amer. Soc. Inform. Sci., 41 (1990), pp. 391–407, . |

[29] | J. Demmel, L. Grigori, M. Hoemmen, and J. Langou, Communication-optimal parallel and sequential QR and LU factorizations, SIAM J. Sci. Comput., 34 (2012), pp. A206–A239, . · Zbl 1241.65028 |

[30] | J. Demmel, M. Gu, S. Eisenstat, I. Slapničar, K. Veselić, and Z. Drmač, Computing the singular value decomposition with high relative accuracy, Linear Algebra Appl., 299 (1999), pp. 21–80, . |

[31] | J. Demmel and W. Kahan, Accurate singular values of bidiagonal matrices, SIAM J. Sci. Statist. Comput., 11 (1990), pp. 873–912, . · Zbl 0705.65027 |

[32] | J. Demmel and K. Veselić, Jacobi’s method is more accurate than QR, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 1204–1245, . · Zbl 0759.65011 |

[33] | J. W. Demmel, Applied Numerical Linear Algebra, SIAM, Philadelphia, 1997, . · Zbl 0879.65017 |

[34] | J. W. Demmel, I. Dhillon, and H. Ren, On the correctness of some bisection-like parallel eigenvalue algorithms in floating point arithmetic, Electron. Trans. Numer. Anal., 3 (1995), pp. 116–149, . · Zbl 0860.65026 |

[35] | I. S. Dhillon, A New O(\(n^2\)) Algorithm for the Symmetric Tridiagonal Eigenvalue/Eigenvector Problem, Ph.D. thesis, EECS Department, University of California, Berkeley, 1997, . |

[36] | I. S. Dhillon and B. N. Parlett, Multiple representations to compute orthogonal eigenvectors of symmetric tridiagonal matrices, Linear Algebra Appl., 387 (2004), pp. 1–28, . · Zbl 1055.65048 |

[37] | I. S. Dhillon and B. N. Parlett, Orthogonal eigenvectors and relative gaps, SIAM J. Matrix Anal. Appl., 25 (2004), pp. 858–899, . · Zbl 1068.65046 |

[38] | I. S. Dhillon, B. N. Parlett, and C. Vömel, The design and implementation of the MRRR algorithm, ACM Trans. Math. Software, 32 (2006), pp. 533–560, . · Zbl 1230.65046 |

[39] | J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart, LINPACK Users’ Guide, SIAM, Philadelphia, 1979, . |

[40] | J. Dongarra, J. Du Croz, S. Hammarling, and I. S. Duff, A set of level \(3\) basic linear algebra subprograms, ACM Trans. Math. Software, 16 (1990), pp. 1–17, . · Zbl 0900.65115 |

[41] | J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson, An extended set of FORTRAN basic linear algebra subprograms, ACM Trans. Math. Software, 14 (1988), pp. 1–17, . · Zbl 0639.65016 |

[42] | J. Dongarra, D. C. Sorensen, and S. J. Hammarling, Block reduction of matrices to condensed forms for eigenvalue computations, J. Comput. Appl. Math., 27 (1989), pp. 215–227, . · Zbl 0679.65025 |

[43] | Z. Drmač, Algorithm 977: A QR-preconditioned QR SVD method for computing the SVD with high accuracy, ACM Trans. Math. Software, 44 (2017), p. 11, . · Zbl 06920073 |

[44] | Z. Drmač and K. Veselić, New fast and accurate Jacobi SVD algorithm, I, SIAM J. Matrix Anal. Appl., 29 (2008), pp. 1322–1342, . |

[45] | Z. Drmač and K. Veselić, New fast and accurate Jacobi SVD algorithm, II, SIAM J. Matrix Anal. Appl., 29 (2008), pp. 1343–1362, . |

[46] | P. Eberlein, On one-sided Jacobi methods for parallel computation, SIAM J. Algebraic Discrete Methods, 8 (1987), pp. 790–796, . · Zbl 0653.65026 |

[47] | Eigen, Eigen 3.3.3, 2017, . |

[48] | K. V. Fernando and B. N. Parlett, Accurate singular values and differential qd algorithms, Numer. Math., 67 (1994), pp. 191–229, . · Zbl 0814.65036 |

[49] | G. E. Forsythe and P. Henrici, The cyclic Jacobi method for computing the principal values of a complex matrix, Trans. Amer. Math. Soc., 94 (1960), pp. 1–23, . · Zbl 0092.32504 |

[50] | B. S. Garbow, J. M. Boyle, C. B. Moler, and J. Dongarra, Matrix eigensystem routines – EISPACK guide extension, Lecture Notes in Comput. Sci. 51, Springer, Berlin, 1977, . · Zbl 0368.65020 |

[51] | M. Gates, S. Tomov, and J. Dongarraa, Accelerating the SVD two stage bidiagonal reduction and divide and conquer using GPUs, Parallel Comput., 74 (2018), pp. 3–18, . |

[52] | G. Golub, Some modified matrix eigenvalue problems, SIAM Rev., 15 (1973), pp. 318–334, . · Zbl 0254.65027 |

[53] | G. Golub and W. Kahan, Calculating the singular values and pseudo-inverse of a matrix, J. Soc. Indust. Appl. Math. Ser. B Numer. Anal., 2 (1965), pp. 205–224, . · Zbl 0194.18201 |

[54] | G. Golub and C. Reinsch, Singular value decomposition and least squares solutions, Numer. Math., 14 (1970), pp. 403–420, . · Zbl 0181.17602 |

[55] | B. Großer and B. Lang, Efficient parallel reduction to bidiagonal form, Parallel Comput., 25 (1999), pp. 969–986, . · Zbl 1062.65503 |

[56] | M. Gu, J. Demmel, and I. Dhillon, Efficient Computation of the Singular Value Decomposition with Applications to Least Squares Problems, Tech. Report LBL-36201, Lawrence Berkeley Laboratory, 1994, . |

[57] | M. Gu and S. C. Eisenstat, A Divide and Conquer Algorithm for the Bidiagonal SVD, Tech. Report YALEU/DCS/TR-933, Department of Computer Science, Yale University, 1992, . · Zbl 0821.65019 |

[58] | M. Gu and S. C. Eisenstat, A stable and efficient algorithm for the rank-one modification of the symmetric eigenproblem, SIAM J. Matrix Anal. Appl., 15 (1994), pp. 1266–1276, . · Zbl 0807.65029 |

[59] | M. Gu and S. C. Eisenstat, A divide-and-conquer algorithm for the bidiagonal SVD, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 79–92, . · Zbl 0821.65019 |

[60] | A. Haidar, J. Kurzak, and P. Luszczek, An improved parallel singular value algorithm and its implementation for multicore hardware, in Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC’13), ACM, 2013, art. 90, . |

[61] | A. Haidar, H. Ltaief, and J. Dongarra, Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels, in Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC’11), ACM, 2011, art. 8, . |

[62] | A. Haidar, H. Ltaief, P. Luszczek, and J. Dongarra, A comprehensive study of task coalescing for selecting parallelism granularity in a two-stage bidiagonal reduction, in 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS), IEEE, 2012, pp. 25–35, . |

[63] | S. Hammarling, A note on modifications to the Givens plane rotation, IMA J. Appl. Math., 13 (1974), pp. 215–218, . · Zbl 0278.65037 |

[64] | V. Hari, Accelerating the SVD block-Jacobi method, Computing, 75 (2005), pp. 27–53, . · Zbl 1079.65046 |

[65] | V. Hari and J. Matejaš, Accuracy of two SVD algorithms for \(2× 2\) triangular matrices, Appl. Math. Comput., 210 (2009), pp. 232–257, . · Zbl 1166.65014 |

[66] | V. Hari and K. Veselić, On Jacobi methods for singular value decompositions, SIAM J. Sci. Statist. Comput., 8 (1987), pp. 741–754, . · Zbl 0627.65039 |

[67] | M. Heath, A. Laub, C. Paige, and R. Ward, Computing the singular value decomposition of a product of two matrices, SIAM J. Sci. Statist. Comput., 7 (1986), pp. 1147–1159, . · Zbl 0607.65013 |

[68] | M. R. Hestenes, Inversion of matrices by biorthogonalization and related results, J. Soc. Indust. Appl. Math., 6 (1958), pp. 51–90, . · Zbl 0085.33003 |

[69] | G. W. Howell, J. W. Demmel, C. T. Fulton, S. Hammarling, and K. Marmol, Cache efficient bidiagonalization using BLAS \(2.5\) operators, ACM Trans. Math. Software, 34 (2008), art. 14, . · Zbl 1190.65056 |

[70] | IBM Corporation, ESSL Guide and Reference, 2016, . |

[71] | Intel Corporation, User’s Guide for Intel Math Kernel Library for Linux OS, 2015, . |

[72] | I. C. F. Ipsen, Computing an eigenvector with inverse iteration, SIAM Rev., 39 (2006), pp. 254–291, . · Zbl 0874.65029 |

[73] | C. G. J. Jacobi, Über ein leichtes verfahren die in der theorie der säcularstörungen vorkommenden gleichungen numerisch aufzulösen, J. Reine Angew. Math., 30 (1846), pp. 51–94, . |

[74] | E. Jessup and D. Sorensen, A divide and conquer algorithm for computing the singular value decomposition, in Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing, 1989, SIAM, Philadelphia, pp. 61–66. |

[75] | W. Kahan, Accurate Eigenvalues of a Symmetric Tri-diagonal Matrix, Tech. Report, Stanford University, Stanford, CA, 1966, . |

[76] | E. Kogbetliantz, Solution of linear equations by diagonalization of coefficients matrix, Quart. Appl. Math., 13 (1955), pp. 123–132, . · Zbl 0066.10101 |

[77] | J. Kurzak, P. Wu, M. Gates, I. Yamazaki, P. Luszczek, G. Ragghianti, and J. Dongarra, Designing SLATE: Software for Linear Algebra Targeting Exascale, SLATE Working Note 3, Innovative Computing Laboratory, University of Tennessee, 2017, . |

[78] | B. Lang, Parallel reduction of banded matrices to bidiagonal form, Parallel Comput., 22 (1996), pp. 1–18, . · Zbl 0873.65044 |

[79] | C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh, Basic linear algebra subprograms for FORTRAN usage, ACM Trans. Math. Software, 5 (1979), pp. 308–323, . · Zbl 0412.65022 |

[80] | R.-C. Li, Solving Secular Equations Stably and Efficiently, Tech. Report UCB//CSD-94-851, Computer Science Division, University of California Berkeley, 1994, . Also: LAPACK Working Note 89. |

[81] | S. Li, M. Gu, L. Cheng, X. Chi, and M. Sun, An accelerated divide-and-conquer algorithm for the bidiagonal SVD problem, SIAM J. Matrix Anal. Appl., 35 (2014), pp. 1038–1057, . · Zbl 1305.65127 |

[82] | H. Ltaief, J. Kurzak, and J. Dongarra, Parallel two-sided matrix reduction to band bidiagonal form on multicore architectures, IEEE Trans. Parallel Distrib. Syst., 21 (2010), pp. 417–423, . |

[83] | H. Ltaief, P. Luszczek, and J. Dongarra, High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures, ACM Trans. Math. Software, 39 (2013), art. 16, . · Zbl 1295.65145 |

[84] | F. T. Luk, Computing the singular-value decomposition on the ILLIAC IV, ACM Trans. Math. Software, 6 (1980), pp. 524–539, . · Zbl 0471.65019 |

[85] | F. T. Luk and H. Park, On parallel Jacobi orderings, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 18–26, . · Zbl 0666.65031 |

[86] | O. Marques and P. B. Vasconcelos, Computing the bidiagonal SVD through an associated tridiagonal eigenproblem, in International Conference on Vector and Parallel Processing (VECPAR), Springer, 2016, pp. 64–74, . |

[87] | W. F. Mascarenhas, On the convergence of the Jacobi method for arbitrary orderings, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 1197–1209, . · Zbl 0839.65045 |

[88] | J. Matejaš and V. Hari, Accuracy of the Kogbetliantz method for scaled diagonally dominant triangular matrices, Appl. Math. Comput., 217 (2010), pp. 3726–3746, . · Zbl 1206.65138 |

[89] | J. Matejaš and V. Hari, On high relative accuracy of the Kogbetliantz method, Linear Algebra Appl., 464 (2015), pp. 100–129, . · Zbl 1302.65093 |

[90] | MathWorks, MATLAB, 2017, . |

[91] | J. D. McCalpin, A survey of memory bandwidth and machine balance in current high performance computers, IEEE Comput. Soc. Tech. Committee Comput. Architect. (TCCA) Newslett., 19 (1995), pp. 19–25, . |

[92] | B. Moore, Principal component analysis in linear systems: Controllability, observability, and model reduction, IEEE Trans. Automat. Control, 26 (1981), pp. 17–32, . · Zbl 0464.93022 |

[93] | MPI Forum, MPI: A Message-Passing Interface Standard, Version 3.1, June 2015, . |

[94] | NVIDIA Corporation, CUDA Toolkit 7.0, March 2015, . |

[95] | G. Okša and M. Vajteršic, Efficient pre-processing in the parallel block-Jacobi SVD algorithm, Parallel Comput., 32 (2006), pp. 166–176, . |

[96] | OpenBLAS, OpenBLAS User Manual, 2016, . |

[97] | B. N. Parlett, The new QD algorithms, Acta Numer., 4 (1995), pp. 459–491, . · Zbl 0835.65059 |

[98] | B. N. Parlett and I. S. Dhillon, Fernando’s solution to Wilkinson’s problem: An application of double factorization, Linear Algebra Appl., 267 (1997), pp. 247–279, . · Zbl 0886.65033 |

[99] | V. Rokhlin, A. Szlam, and M. Tygert, A randomized algorithm for principal component analysis, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1100–1124, . · Zbl 1198.65035 |

[100] | H. Rutishauser, Der quotienten-differenzen-algorithmus, Z. Angew. Math. Phys., 5 (1954), pp. 233–251, . · Zbl 0055.34702 |

[101] | H. Rutishauser, Solution of eigenvalue problems with the LR-transformation, Nat. Bur. Standards Appl. Math. Ser., 49 (1958), pp. 47–81. · Zbl 0123.11303 |

[102] | H. Rutishauser, The Jacobi method for real symmetric matrices, in Handbook for Automatic Computation: Volume II: Linear Algebra, Grundlehren Math. Wiss. 186, Springer-Verlag, New York, 1971, pp. 202–211, . |

[103] | A. H. Sameh, On Jacobi and Jacobi-like algorithms for a parallel computer, Math. Comp., 25 (1971), pp. 579–590, . · Zbl 0222.65046 |

[104] | R. Schreiber and C. Van Loan, A storage-efficient WY representation for products of Householder transformations, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 53–57, . · Zbl 0664.65025 |

[105] | B. T. Smith, J. M. Boyle, J. Dongarra, B. S. Garbow, Y. Ikebe, V. C. Klema, and C. B. Moler, Matrix Eigensystem Routines – EISPACK Guide, Second Edition, Lecture Notes in Comput. Sci. 6, Springer, Berlin, 1976, . · Zbl 0325.65016 |

[106] | G. W. Stewart, The efficient generation of random orthogonal matrices with an application to condition estimators, SIAM J. Numer. Anal., 17 (1980), pp. 403–409, . · Zbl 0443.65027 |

[107] | G. W. Stewart, On the early history of the singular value decomposition, SIAM Rev., 35 (1993), pp. 551–566, . · Zbl 0799.01016 |

[108] | G. W. Stewart, QR Sometimes Beats Jacobi, Tech. Report CS-TR-3434, University of Maryland, 1995, . |

[109] | S. Tomov, R. Nath, and J. Dongarra, Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing, Parallel Comput., 36 (2010), pp. 645–654, . · Zbl 1214.65020 |

[110] | S. Tomov, R. Nath, H. Ltaief, and J. Dongarra, Dense linear algebra solvers for multicore with GPU accelerators, in 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum (IPDPSW), IEEE, 2010, pp. 1–8, . |

[111] | M. A. Turk and A. P. Pentland, Face recognition using eigenfaces, in Proceedings of 1991 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, 1991, pp. 586–591, . |

[112] | C. Van Loan, The Block Jacobi Method for Computing the Singular Value Decomposition, Tech. Report TR 85-680, Cornell University, 1985, . · Zbl 0611.65020 |

[113] | F. G. Van Zee, R. A. Van de Geijn, and G. Quintana-Ortí, Restructuring the tridiagonal and bidiagonal QR algorithms for performance, ACM Trans. Math. Software, 40 (2014), p. 18, . · Zbl 1322.65051 |

[114] | F. G. Van Zee, R. A. Van De Geijn, G. Quintana-Ortí, and G. J. Elizondo, Families of algorithms for reducing a matrix to condensed form, ACM Trans. Math. Software, 39 (2012), art. 2, . · Zbl 1295.65052 |

[115] | R. C. Whaley and J. Dongarra, Automatically tuned linear algebra software, in Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, IEEE Computer Society, 1998, pp. 1–27, . |

[116] | J. H. Wilkinson, Note on the quadratic convergence of the cyclic Jacobi process, Numer. Math., 4 (1962), pp. 296–300, . · Zbl 0104.34501 |

[117] | J. H. Wilkinson and C. Reinsch, Handbook for Automatic Computation: Volume II: Linear Algebra, Grundlehren Math. Wiss. 186, Springer-Verlag, New York, 1971, . |

[118] | P. R. Willems and B. Lang, A framework for the \(MR^3\) algorithm: Theory and implementation, SIAM J. Sci. Comput., 35 (2013), pp. A740–A766, . |

[119] | P. R. Willems, B. Lang, and C. Vömel, Computing the bidiagonal SVD using multiple relatively robust representations, SIAM J. Matrix Anal. Appl., 28 (2006), pp. 907–926, . · Zbl 1129.65027 |

[120] | B. B. Zhou and R. P. Brent, A parallel ring ordering algorithm for efficient one-sided Jacobi SVD computations, J. Parallel Distrib. Comput., 42 (1997), pp. 1–10, . · Zbl 0880.68055 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.