Algorithm 953: Parallel library software for the multishift QR algorithm with aggressive early deflation. (English) Zbl 1347.65070


65F15 Numerical computation of eigenvalues and eigenvectors of matrices
65Y05 Parallel numerical computation
65Y15 Packaged methods for numerical algorithms


Zbl 1216.65044
Full Text: DOI


[1] E. Agullo, J. W. Demmel, J. J. Dongarra, B. Hadri, J. Kurzak, J. Langou, H. Ltaief, P. Luszczek, and S. Tomov. 2009. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects. J. Phys.: Conf. Ser. 180, 1, 012037.
[2] M. Ahues and F. Tisseur. 1997. A New Deflation Criterion for the QR Algorithm. LAPACK Working Note 122.
[3] T. Auckenthaler, V. Blum, H.-J. Bungartz, T. Huckle, R. Johanni, L. Kraemer, B. Lang, H. Lederer, and P. R. Willems. 2011. Parallel solution of partial symmetric eigenvalue problems from electronic structure calculations. Parallel Comput. 37, 12, 783–794. · Zbl 06094859
[4] Z. Bai, D. Day, J. W. Demmel, and J. J. Dongarra. 1997a. A Test Matrix Collection for Non-Hermitian Eigenvalue Problems (Release 1.0). Technical Report CS-97-355. Department of Computer Science, University of Tennessee. Also available online from http://math.nist.gov/MatrixMarket.
[5] Z. Bai and J. W. Demmel. 1989. On a block implementation of Hessenberg multishift QR iteration. Intl. J. High Speed Comput. 1, 97–112. · Zbl 0726.65035
[6] Z. Bai and J. W. Demmel. 1993. On swapping diagonal blocks in real Schur form. Linear Algebra Appl. 186, 73–95. · Zbl 0783.65030
[7] Z. Bai, J. W. Demmel, J. J. Dongarra, A. Petitet, H. Robison, and K. Stanley. 1997b. The spectral decomposition of nonsymmetric matrices on distributed memory parallel computers. SIAM J. Sci. Comput. 18, 5, 1446–1461. · Zbl 0890.65034
[8] G. Ballard, J. W. Demmel, and I. Dumitriu. 2010. Minimizing Communication for Eigenproblems and the Singular Value Decomposition. Technical Report UCB/EECS-2010-136. EECS Department, University of California, Berkeley. Also as LAPACK Working Note 237.
[9] G. Ballard, J. W. Demmel, O. Holtz, and O. Schwartz. 2011. Minimizing communication in numerical linear algebra. SIAM J. Matrix Anal. Appl. 32, 3, 866–901. · Zbl 1246.68128
[10] P. Bientinesi, E. S. Quintana-Ortí, and R. A. van de Geijn. 2005. Representing linear algebra algorithms in code: The FLAME application program interfaces. ACM Trans. Math. Software 31, 1, 27–59. · Zbl 1073.65037
[11] L. S. Blackford, J. Choi, A. Cleary, E. D’Azevedo, J. W. Demmel, I. Dhillon, J. J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. 1997. ScaLAPACK User’s Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA. · Zbl 0886.65022
[12] K. Braman, R. Byers, and R. Mathias. 2002a. The multishift QR algorithm. Part I: Maintaining well-focused shifts and level 3 performance. SIAM J. Matrix Anal. Appl. 23, 4, 929–947. · Zbl 1017.65031
[13] K. Braman, R. Byers, and R. Mathias. 2002b. The multishift QR algorithm. Part II: Aggressive early deflation. SIAM J. Matrix Anal. Appl. 23, 4, 948–973. · Zbl 1017.65032
[14] R. Byers. 2007. LAPACK 3.1 xHSEQR: Tuning and Implementation Notes on the Small Bulge Multi-Shift QR Algorithm with Aggressive Early Deflation. LAPACK Working Note 187.
[15] EPFL. 2013. Bellatrix. URL: http://hpc.epfl.ch/clusters/bellatrix/.
[16] M. R. Fahey. 2003. Algorithm 826: A parallel eigenvalue routine for complex Hessenberg matrices. ACM Trans. Math. Software 29, 3, 326–336. · Zbl 1070.65530
[17] J. G. F. Francis. 1962. The QR transformation: A unitary analogue to the LR transformation—Part 2. Comput. J. 4, 4, 332–345.
[18] G. H. Golub and C. F. Van Loan. 1996. Matrix Computations (3rd ed.). Johns Hopkins University Press, Baltimore, MD. · Zbl 0865.65009
[19] A. Grama, A. Gupta, G. Karypis, and V. Kumar. 2003. Introduction to Parallel Computing (2nd ed.). Addison-Wesley, Boston, MA. · Zbl 0861.68040
[20] R. Granat, B. Kågström, and D. Kressner. 2009. Parallel eigenvalue reordering in real Schur forms. Concurrency and Computat.: Pract. Exper. 21, 9, 1225–1250. · Zbl 05645613
[21] R. Granat, B. Kågström, and D. Kressner. 2010. A novel parallel QR algorithm for hybrid distributed memory HPC systems. SIAM J. Sci. Comput. 32, 4, 2345–2378. · Zbl 1216.65044
[22] R. Granat, B. Kågström, D. Kressner, and M. Shao. 2014a. Algorithm 953: Parallel Library Software for the Multishift QR Algorithm with Aggressive Early Deflation—Electronic Appendix: Derivation of the Performance Model.
[23] R. Granat, B. Kågström, D. Kressner, and M. Shao. 2014b. PDHSEQR User’s Guide. Technical Report UMINF-14.24. Department of Computing Science, Umeå University. Available from http://www8.cs.umu.se/research/uminf/.
[24] G. Henry and Robert A. van de Geijn. 1996. Parallelizing the QR algorithm for the unsymmetric algebraic eigenvalue problem: Myths and reality. SIAM J. Sci. Comput. 17, 4, 870–883. · Zbl 0856.65027
[25] G. Henry, D. S. Watkins, and J. J. Dongarra. 2002. A parallel implementation of the nonsymmetric QR algorithm for distributed memory architectures. SIAM J. Sci. Comput. 24, 1, 284–311. · Zbl 1013.65032
[26] M. Hoemmen. 2010. Communication-Avoiding Krylov Subspace Methods. Ph.D. Dissertation. University of California, Berkeley.
[27] HPC2N. 2013a. Abisko. URL: http://www.hpc2n.umu.se/resources/abisko/.
[28] HPC2N. 2013b. Akka. URL: http://www.hpc2n.umu.se/resources/akka/.
[29] D. Irony, S. Toledo, and A. Tiskin. 2004. Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distr. Comput. 64, 9, 1017–1026. · Zbl 1114.68081
[30] B. Kågström, D. Kressner, E. S. Quintana-Ortí, and G. Quintana-Ortí. 2008. Blocked algorithms for the reduction to Hessenberg-triangular form revisited. BIT 48, 3, 563–584. · Zbl 1157.65348
[31] B. Kågström, D. Kressner, and M. Shao. 2012. On aggressive early deflation in parallel variants of the QR algorithm. In Applied Parallel and Scientific Computing (PARA 2010) (Lecture Notes in Computer Science), Kristján Jónasson (Ed.), Vol. 7133. Springer-Verlag, Berlin, Germany, 1–10. · Zbl 06071341
[32] L. Karlsson and B. Kågström. 2011. Parallel two-stage reduction to Hessenberg form using dynamic scheduling on shared-memory architectures. Parallel Comput. 37, 12, 771–782. · Zbl 1248.65043
[33] R. S. Martin, G. Peters, and J. H. Wilkinson. 1970. Handbook series linear algebra: The QR algorithm for real Hessenberg matrices. Numer. Math. 14, 3, 219–231. · Zbl 0194.46901
[34] T. Schreiber, P. Otto, and F. Hofmann. 1994. A new efficient parallelization strategy for the QR algorithm. Parallel Comput. 20, 1, 63–75. · Zbl 0792.65024
[35] M. Shao. 2011. PDLAQR1: An improved version of the ScaLAPACK routine PDLAHQR. Technical Report UMINF-11.22. Department of Computing Science, Umeå University. Available from http://www8.cs.umu.se/research/uminf/.
[36] B. T. Smith, J. M. Boyle, J. J. Dongarra, B. S. Garbow, Y. Ikebe, V. C. Klema, and C. B. Moler. 1976. Matrix Eigensystem Routines—EISPACK Guide (2nd ed.). Lecture Notes in Computer Science, Vol. 6. Springer-Verlag, New York. · Zbl 0325.65016
[37] S. Tomov, R. Nath, and J. J. Dongarra. 2010. Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing. Parallel Comput. 36, 12, 645–654. · Zbl 1214.65020
[38] R. A. van de Geijn. 1988. Storage schemes for parallel eigenvalue algorithms. In Numerical Linear Algebra Digital Signal Processing and Parallel Algorithms, Gene H. Golub and Paul Van Dooren (Eds.). Springer-Verlag, 639–648.
[39] R. A. van de Geijn. 1993. Deferred shifting schemes for parallel QR methods. SIAM J. Matrix Anal. Appl. 14, 180–194. · Zbl 0769.65016
[40] R. A. van de Geijn and D. G. Hudson. 1989. An efficient parallel implementation of the nonsymmetric QR algorithm. In 4th Conference on Hypercube Concurrent Computers and Applications. 697–700.
[41] D. S. Watkins. 1994. Shifting strategies for the parallel QR algorithm. SIAM J. Sci. Comput. 15, 4, 953–958. · Zbl 0808.65028
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.