×

zbMATH — the first resource for mathematics

Restructuring the tridiagonal and bidiagonal QR algorithms for performance. (English) Zbl 1322.65051

MSC:
65F15 Numerical computation of eigenvalues and eigenvectors of matrices
65Y20 Complexity and performance of numerical algorithms
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] Anderson, E., Bai, Z., Bischof, C., Blackford, L. S., Demmel, J., Dongarra, J. J., Croz, J. D., Hammarling, S., Greenbaum, A., McKenney, A., and Sorensen, D. 1999.LAPACK Users’ Guide3rd Ed. SIAM, Philadelphia, PA. · Zbl 0934.65030 · doi:10.1137/1.9780898719604
[2] Bientinesi, P., Igual, F. D., Kressner, D., Petschow, M., and Quintana-Ortí, E. S. 2011. Condensed forms for the symmetric eigenvalue problem on multi-threaded architectures.Concurr. Comput. Pract. Exper. 23, 7, 694–707.
[3] Bischof, C. and Van Loan, C. 1987. The WY representation for products of householder matrices.SIAM J. Sci. Stat. Comput. 8, 1, 2–13. · Zbl 0628.65033
[4] Bischof, C., Lang, B., and Sun, X. 1994. Parallel tridiagonalization through two-step band reduction. InProceedings of the Scalable High-Performance Computing Conference. IEEE Computer Society Press, 23–27.
[5] Blast. 2002. Basic linear algebra subprograms technical forum standard.Int. J. High Perf. Appl. Supercomput. 16, 1.
[6] Braman, K., Byers, R., and Mathias, R. 2001a. The multishiftQRalgorithm. Part I: Maintaining well focused shifts and level 3 performance.SIAM J. Matrix Anal. Appl. 23, 929–947. · Zbl 1017.65031 · doi:10.1137/S0895479801384573
[7] Braman, K., Byers, R., and Mathias, R. 2001b. The multishiftQRalgorithm. Part II: Aggressive early deflation.SIAM J. Matrix Anal. Appl. 23, 948–973. · Zbl 1017.65032 · doi:10.1137/S0895479801384585
[8] Cline, A. K. and Dhillon, I. S. 2006. Computation of the singular value decomposition. InHandbook of Linear Algebra, CRC Press, Boca Raton, FL, 45–1–45–13.
[9] Cuppen, J. J. M. 1980. A divide and conquer method for the symmetric tridiagonal eigenproblem.Numerische Mathematik 36, 177–195. · Zbl 0431.65022 · doi:10.1007/BF01396757
[10] Demmel, J. and Kahan, W. 1990. Accurate singular values of bidiagonal matrices.SIAM J. Sci. Statist. Comput. 11, 873–912. · Zbl 0705.65027 · doi:10.1137/0911052
[11] Demmel, J. and Veselić, K. 1992. Jacobi’s method is more accurate than QR.SIAM J. Matrix Anal. Appl. 13, 4, 1204–1245. · Zbl 0759.65011
[12] Demmel, J., Dhillon, I., and Ren, H. 1995. On the correctness of some bisection-like parallel eigenvalue algorithms in floating point arithmetic.Electron. Trans. Num. Anal. 3, 116–149. · Zbl 0860.65026
[13] Demmel, J., Marques, O., Parlett, B., and Vömel, C. 2008. Performance and accuracy of LAPACK’s symmetric tridiagonal eigensolvers.SIAM J. Sci. Comput. 30, 3, 1508–1526. · Zbl 1165.65014
[14] Dhillon, I. S. 1997. A newO(n<sup>2</sup>) algorithm for the symmetric tridiagonal eigenvalue/eigenvector problem. Ph.D. thesis, EECS Department, University of California, Berkeley.
[15] Dhillon, I. S. 1998. Current inverse iteration software can fail.BIT Numer. Math. 38, 685–704. · Zbl 0918.65025 · doi:10.1007/BF02510409
[16] Dhillon, I. S. and Parlett, B. N. 2003. Orthogonal eigenvectors and relative gaps.SIAM J. Matrix Anal. Appl. 25, 3, 858–899. · Zbl 1068.65046 · doi:10.1137/S0895479800370111
[17] Dhillon, I. S. and Parlett, B. N. 2004. Multiple representations to compute orthogonal eigenvectors of symmetric tridiagonal matrices.Linear Algor. Appl. 387, 1–28. · Zbl 1055.65048 · doi:10.1016/j.laa.2003.12.028
[18] Dhillon, I. S., Parlett, B. N., and Vomel, C. 2006. The design and implementation of the MRRR algorithm.ACM Trans. Math. Softw. 32, 533–560. · Zbl 1230.65046 · doi:10.1145/1186785.1186788
[19] Dongarra, J. J., Du Croz, J., Hammarling, S., and Hanson, R. J. 1988. An extended set of FORTRAN basic linear algebra subprograms.ACM Trans. Math. Softw. 14, 1, 1–17. · Zbl 0639.65016 · doi:10.1145/42288.42291
[20] Drmač, Z. 2009. A global convergence proof for cyclic Jacobi methods with block rotations.SIAM J. Matrix Anal. Appl. 31, 3, 1329–1350. · Zbl 1201.65052
[21] Drmač, Z. and Veselić, K. 2008a. New fast and accurate Jacobi SVD algorithm. I.SIAM J. Matrix Anal. Appl. 29, 4, 1322–1342. · Zbl 1221.65100
[22] Drmač, Z. and Veselić, K. 2008b. New fast and accurate Jacobi SVD algorithm. II.SIAM J. Matrix Anal. Appl. 29, 4, 1343–1362. · Zbl 1221.65101
[23] Fernando, K. V. and Parlett, B. N. 1994. Accurate singular values and differential qd algorithms.Numer. Math. 67, 191–229. · Zbl 0814.65036 · doi:10.1007/s002110050024
[24] Golub, G. H. and Loan, C. F. V. 1996.Matrix Computations3rd Ed. The Johns Hopkins University Press, Baltimore, MD. · Zbl 0865.65009
[25] Goto, K. and van de Geijn, R. A. 2008. Anatomy of a high-performance matrix multiplication.ACM Trans. Math. Soft. 34, 3. · Zbl 1190.65064 · doi:10.1145/1356052.1356053
[26] Gu, M. and Eisenstat, S. C. 1995a. A divide-and-conquer algorithm for the bidiagonal SVD.SIAM J. Matrix Anal. Appl. 16, 1, 79–92. · Zbl 0821.65019 · doi:10.1137/S0895479892242232
[27] Gu, M. and Eisenstat, S. C. 1995b. A divide-and-conquer algorithm for the symmetric tridiagonal eigenproblem.SIAM J. Matrix Anal. Appl. 16, 1, 172–191. · Zbl 0815.65050 · doi:10.1137/S0895479892241287
[28] Haidar, A., Ltaief, H., and Dongarra, J. 2011. Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels. LAPACK Working Note 254. http://www.netlib.org/lapack/lawnspdf/lawn254.pdf.
[29] Howell, G. W., Demmel, J. W., Fulton, C. T., Hammarling, S., and Marmol, K. 2008. Cache efficient bidiagonalization using BLAS 2.5 operators.ACM Trans. Math. Softw. 34, 3, 14:1–14:33. · Zbl 1190.65056
[30] Jacobi, C. 1846. Über ein leichtes verfahren, die in der theorie der säkularstörungen vorkommenden gleichungen numerisch aufzulösen.Crelle’s J. 30, 51–94.
[31] Jiang, E. and Zhang, Z. 1985. A new shift of the QL algorithm for irreducible symmetric tridiagonal matrices.Linear Algebra Appl. 65, 0, 261–272. · Zbl 0572.65023
[32] Joffrain, T., Low, T. M., Quintana-Ortí, E. S., van de Geijn, R., and Van Zee, F. 2006. Accumulating householder transformations.ACM Trans. Math. Softw. 32, 2, 169–179. · Zbl 1365.65106
[33] Kågström, B., Kressner, D., Quintana-Ortí, E. S., and Quintana-Ortí, G. 2008. Blocked algorithms for the reduction to Hessenberg-triangular form revisited.BIT Numer. Math. 48, 3, 563–584. · Zbl 1157.65348
[34] Lang, B. 1998. Using level 3 BLAS in rotation-based algorithms.SIAM J. Sci. Comput. 19, 2, 626–634. · Zbl 0912.65032 · doi:10.1137/S1064827595280211
[35] LAPACK. 2011. http://www.netlib.org/lapack/.
[36] Lawson, C. L., Hanson, R. J., Kincaid, D. R., and Krogh, F. T. 1979. Basic linear algebra subprograms for Fortran usage.ACM Trans. Math. Soft. 5, 3, 308–323. · Zbl 0412.65022 · doi:10.1145/355841.355847
[37] libflame. 2011. http://www.cs.utexas.edu/users/flame/libflame/.
[38] Luszczek, P., Ltaief, H., and Dongarra, J. 2011. Two-stage tridiagonal reduction for dense symmetric matrices using tile algorithms on multicore architectures. LAPACK Working Note 244. http://www.netlib.org/lapack/lawnspdf/lawn244.pdf.
[39] Nakatsukasa, Y., Aishima, K., and Yamazaki, I. 2012. dqds with aggressive early deflation.SIAM J. Matrix Anal. Appl. 33, 1, 22–51. · Zbl 1248.65042 · doi:10.1137/110821330
[40] OpenBLAS. 2011. https://github.com/xianyi/OpenBLAS/.
[41] Parlett, B. N. 1980.The Symmetric Eigenvalue Problem. Prentice-Hall. · Zbl 0431.65017
[42] Parlett, B. N. and Marques, O. A. 1999. An implementation of the dqds algorithm (positive case).Linear Algebra Appl. 309, 217–259. · Zbl 0952.65031 · doi:10.1016/S0024-3795(00)00010-0
[43] Quintana-Ortí, G. and Vidal, A. M. 1992. Parallel algorithms for QR factorization on shared memory multiprocessors. InProceedings of the European Workshop on Parallel Computing’92, Parallel Computing: From Theory to Sound Practice. IOS Press, 72–75.
[44] Rajamanickam, S. 2009. Efficient algorithms for sparse singular value decomposition. Ph.D. thesis, University of Florida.
[45] Rutter, J. D. 1994. A serial implementation of Cuppen’s divide and conquer algorithm for the symmetric eigenvalue problem. Tech. rep. CSD-94-799, Computer Science Division, University of California at Berkeley.
[46] Schreiber, R. and van Loan, C. 1989. A storage-efficient WY representation for products of Householder transformations.SIAM J. Sci. Statist. Comput. 10, 1, 53–57. · Zbl 0664.65025 · doi:10.1137/0910005
[47] Smith, B. T., Boyle, J. M., Dongarra, J. J., Garbow, B. S., Ikebe, Y., et al. 1976.Matrix Eigensystem Routines – EISPACK Guide2nd Ed. Lecture Notes in Computer Science, vol. 6, Springer. · Zbl 0325.65016 · doi:10.1007/3-540-07546-1
[48] Stewart, G. W. 2001.Matrix Algorithms Volume II: Eigensystems. SIAM, Philadelphia, PA. · Zbl 0984.65031
[49] Van Zee, F. G. 2011.Libflame: The Complete Reference. http://www.lulu.com/.
[50] Van Zee, F. G., van de Geijn, R., and Quintana-Ortí, G. 2011. Restructuring the QR algorithm for high-performance application of Givens rotations. Tech. rep. TR-11-36, FLAME Working Note 60. Department of Computer Science, The University of Texas at Austin.
[51] Van Zee, F. G., van de Geijn, R. A., Quintana-Ortí, G., and Elizondo, G. J. 2012. Families of algorithms for reducing a matrix to condensed form.ACM Trans. Math. Soft. 39, 1. · Zbl 1295.65052
[52] Watkins, D. S. 1982. Understanding theQRalgorithm.SIAM Rev. 24, 4, 427–440. · Zbl 0525.65018 · doi:10.1137/1024100
[53] Wilkinson, J. H. 1968. Global convergence of tridiagonalQRalgorithm with origin shifts.Linear Algebra Appl. 1, 409–420. · Zbl 0237.65029 · doi:10.1016/0024-3795(68)90017-7
[54] Willems, P. R. and Lang, B. 2011. A framework for the MR<sup>3</sup> algorithm: Theory and implementation. Tech. rep., Bergische Universität Wuppertal. http://www-ai.math.uni-wuppertal.de/SciComp/preprints/SC1102.pdf.
[55] Willems, P. R., Lang, B., and Vömel, C. 2006. Computing the bidiagonal SVD using multiple relatively robust representations.SIAM J. Matrix Anal. Appl. 28, 4, 907–926. · Zbl 1129.65027
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.