zbMATH — the first resource for mathematics

Parallel reduction of four matrices to condensed form for a generalized matrix eigenvalue algorithm. (English) Zbl 1456.65189
Summary: The VZ algorithm proposed by C. F. Van Loan [SIAM J. Numer. Anal. 12, 819–834 (1975; Zbl 0321.65023)] attempts to solve the generalized type of matrix eigenvalue problem \(ACx = \lambda BDx\), where \(A, B \in \mathbb R^{n \times m}\), \(C, D \in \mathbb R^{m \times n}\), and \(m \geq n\), without forming products and inverses. Especially, this algorithm is suitable for solving the generalized singular value problem. Van Loan’s approach first reduces the matrices \(A, B, C\), and \(D\) to a condensed form by the finite step initial reduction. The reduction finds orthogonal matrices \(Q, U, V\), and \(Z\), such that QAZ is upper Hessenberg, and \(QBV, \ Z^TCU\), and \(V^TDU\) are upper triangular. In this initial reduction, \(A\) is reduced to upper Hessenberg form, while simultaneously preserving triangularity of other three matrices. This is done by Givens rotations, annihilating one by one element of \(A\), and by generating three more rotations applied to other matrices per each annihilation. Such an algorithm is quite inefficient. In our work, we propose a blocked algorithm for the initial reduction, based on aggregated Givens rotations and matrix-matrix multiplications, which are applied in the outer loop updates. This algorithm has another level of blocking, exploited in the inner loop. Further, we also consider a variant of the algorithm in a hybrid CPU-GPU framework, where compute-intensive outer loop updates are performed on GPU, and can be overlapped with the reduction in the next step performed on CPU. On the other hand, application of a sequence of rotations in the inner loop is parallelized on CPU, with balanced operation count per thread. Since a large number of aggregated rotations are produced in every outer loop step, they are simultaneously accumulated before outer loop updates. These adjustments speed up original initial reduction considerably which is confirmed by numerical experiments, and the efficiency of the whole VZ algorithm is increased.
65F15 Numerical computation of eigenvalues and eigenvectors of matrices
15A21 Canonical forms, reductions, classification
15A18 Eigenvalues, singular values, and eigenvectors
15A23 Factorization of matrices
65Y05 Parallel numerical computation
65Y20 Complexity and performance of numerical algorithms
BLAS; mctoolbox; CUBLAS
Full Text: DOI
[1] Alter, O.; Brown, PO; Botstein, D., Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms, Proc. Natl. Acad. Sci. USA, 100, 3351-3356 (2003)
[2] Antoulas, AC; Sorensen, DC, Approximation of large-scale dynamical systems: an overview, Int. J. Appl. Math. Comput. Sci., 11, 1093-1121 (2001) · Zbl 1024.93008
[3] Bai, Z.; Demmel, JW, Computing the generalized singular value decomposition, SIAM J. Sci. Comput., 14, 1464-1486 (1993) · Zbl 0789.65024
[4] Bai, Z.; Zha, H., A new preprocessing algorithm for the computation of the generalized singular value decomposition, SIAM J. Sci. Comput., 14, 1007-1012 (1993) · Zbl 0785.65046
[5] Benner, P., Computational methods for linear-quadratic optimization, Supplemento ai Rendiconti del Circolo Matematico di Palermo, Serrie II, 58, 21-56 (1999) · Zbl 0930.65075
[6] Benner, P.; Byers, R.; Mehrmann, V.; Xu, H., Numerical computation of deflating subspaces of skew-Hamiltonian/Hamiltonian pencils, SIAM J. Matrix Anal. Appl., 24, 165-190 (2002) · Zbl 1035.49022
[7] Benner, P., Byers, R., Mehrmann, V., Xu, H.: Robust numerical methods for robust control. Technical Report 06-2004, Institut für Mathematik, TU Berlin (2004) · Zbl 1124.93022
[8] Bhuyan, K.; Singh, SB; Bhuyan, PK, Application of generalized singular value decomposition to ionospheric tomography, Annal. Geophys., 22, 3437-3444 (2004)
[9] Bischof, C.; Van Loan, CF, The WY representation for products of Householder matrices, SIAM J. Sci. Stat. Comput., 8, 2-13 (1987) · Zbl 0628.65033
[10] Bojanczyk, A., Golub, G.H., Van Dooren, P.: The periodic Schur decomposition; algorithm and applications. In: Proceedings of SPIE Conference, vol. 1770, pp. 31-42 (1992)
[11] Bosner, N., Efficient algorithm for simultaneous reduction to the m-Hessenberg-triangular-triangular form, BIT, 55, 677-703 (2015) · Zbl 1326.65036
[12] Bosner, N.; Karlsson, L., Parallel and heterogeneous m-Hessenberg-triangular-triangular reduction, SIAM J. Sci. Comput., 39, C29-C47 (2017) · Zbl 1355.65047
[13] Demmel, JW; Veselić, K., Jacobi’s method is more accurate than QR, SIAM J. Matrix Anal. Appl., 13, 1204-1245 (1992) · Zbl 0759.65011
[14] Falk, S., Langemeyer, P. Schuff, H.K. (ed.): Das Jacobische Rotationsverfahren Fur Reel Symmetrische Matrizenpaare I, II. Friedr. Vieweg & Sohn, Braunschweig (1960) · Zbl 0100.33104
[15] Golub, G.; Reinsch, C., Singular value decomposition and least squares solution, Numer. Math., 14, 403-420 (1970) · Zbl 0181.17602
[16] Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd Edn. The Johns Hopkins University Press, Baltimore and London (1996)
[17] Hari, V.: On Cyclic Jacobi Methods for the Positive Definite Generalized Eigenvalue Problem, publisher=PhD Thesis, FernUniversität-Gesamthochschule, Hagen (1984)
[18] Higham, N.J.: Accuracy and Stability of Numerical Algorithms, 2nd edn. SIAM, Philadelphia (2002) · Zbl 1011.65010
[19] Higham, NJ; Konstantinov, M.; Mehmann, V.; Petkov, P., The sensitivity of computational control problems, IEEE Control Syst. Mag., 24, 28-43 (2004)
[20] Kågström, B.; Kressner, D.; Quintana-Ortí, ES; Quintana-Ortí, G., Blocked algorithms for the reduction to Hessenberg-triangular form revisited, BIT, 48, 563-584 (2008) · Zbl 1157.65348
[21] Kogbetliantz, E.: Diagonalization of General Complex Matrices as a New Method for Solution of Linear Equations. In: Proc. of Intern. Congr. Math, vol. 2, 356-357. Amsterdam (1954)
[22] Kressner, D., Numerical Methods for General and Structured Eigenvalue Problems Lecture Notes in Computational Science and Engineering, vol. 46 (2005), Heidelberg: Springer, Heidelberg · Zbl 1079.65041
[23] Kuo, SR; Yeih, W.; Wu, YC, Applications of the generalized singular-value decomposition method on the eigenproblem using the incomplete boundary element formulation, J. Sound Vibr., 235, 813-845 (2000) · Zbl 1237.65134
[24] Lang, B., Using level 3 BLAS in rotation-based algorithms, SIAM J. Sci. Comput., 19, 626-634 (1998) · Zbl 0912.65032
[25] Moler, CB; Stewart, GW, An algorithm for generalized matrix eigenvalue problems, SIAM J. Numer. Anal., 10, 241-256 (1973) · Zbl 0253.65019
[26] Moore, BC, Principal component analysis in linear systems: controllability, observabilitiy, and model reduction, IEEE Trans. Automat. Control., 26, 17-32 (1981) · Zbl 0464.93022
[27] Netlib: BLAS (Basic Linear Algebra Subprograms). http://www.netlib.org/blas (2017)
[28] Novaković, V.; Singer, S.; Singer, S., Blocking and parallelization of the Hari-Zimmermann variant of the Falk-Langemeyer algorithm for the generalized SVD, Parallel Comput., 49, 136-152 (2015)
[29] NVIDIA: CUBLAS Library DU-06702-001_v10.0, User Guide. http://docs.nvidia.com/cuda/pdf/CUBLAS_Library.pdf (2018)
[30] Paige, CC, Computing the generalized singular value decomposition, SIAM J. Sci. Stat. Comput., 7, 1126-1146 (1986) · Zbl 0621.65030
[31] Schreiber, R.; Van Loan, CF, A storage-efficient WY representation for products of Householder transformations, SIAM J. Sci. Stat. Comput., 10, 53-57 (1989) · Zbl 0664.65025
[32] Stykel, T., Gramian-based model reduction for descriptor systems, Math. Control Signals Syst., 16, 297-319 (2004) · Zbl 1067.93011
[33] Tombs, MS; Postlethwaite, I., Truncated balanced realization of a stable non-minimal state-space system, Internat. J. Control, 46, 1319-1330 (1987) · Zbl 0642.93015
[34] Van Loan, CF, A general matrix eigenvalue algorithm, SIAM J. Numer. Anal., 12, 819-834 (1975) · Zbl 0321.65023
[35] Watkins, DS, Product eigenvalue problems, SIAM Rev., 47, 3-40 (2005) · Zbl 1075.65055
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.