×

Reducing the effect of global communication in \(\text{GMRES} (m)\) and CG on parallel distributed memory computers. (English) Zbl 0842.65019

The paper studies the reduction of the communication overhead introduced by inner products when the iterative methods like CG and GMRES are parallelized. Two ways of improvement are suggested. The first way consists in assembling the results of a number of inner products collectively after grouping some orthogonalization steps. The second way uses some rearranging of the computation steps to achieve a possibility of overlapping of communication with computation. The effect of both ways of improvement is assessed by deriving computer time estimations and by running test problems.

MSC:

65F10 Iterative numerical methods for linear systems
65Y05 Parallel numerical computation
65Y20 Complexity and performance of numerical algorithms

Software:

BiCGstab
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Bai, Z.; Hu, D.; Reichel, L., A Newton basis GMRES implementation, (Technical Report 91-03 (1991), University of Kentucky) · Zbl 0818.65022
[2] Barnard, S. T.; Simon, H. D., A fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems, (Technical Report RNR-92-033 (1992), NASA Ames Research Center: NASA Ames Research Center Moffet Field, CA)
[3] Bisseling, R. H., Parallel iterative solution of sparse linear systems on a transputer network, (Fincham, A. E.; Ford, B., Parallel Computation, Proceedings of the IMA Conference on Parallel Computation. Parallel Computation, Proceedings of the IMA Conference on Parallel Computation, Oxford, UK, September 18-20, 1991 (1993), Oxford University Press: Oxford University Press Oxford), 253-271 · Zbl 1253.65209
[4] Chronopoulos, A. T., Towards efficient parallel implementation of the CG method applied to a class of block tridiagonal linear systems, (Proceedings Supercomputer ’91 (1991), IEEE Computer Society Press: IEEE Computer Society Press Los Alamitos, CA), 578-587
[5] Chronopoulos, A. T.; Gear, C. W., s-Step iterative methods for symmetric linear systems, J. Comput. Appl. Math., 25, 153-168 (1989) · Zbl 0669.65021
[6] Chronopoulos, A. T.; Kim, S. K., s-Step Orthomin and GMRES implemented on parallel computers, (Technical Report 90/43R (1990), UMSI: UMSI Minneapolis, MN) · Zbl 0735.65016
[7] Crone, L. G.C.; van der Vorst, H. A., Communication aspects of the Conjugate Gradient method on distributed memory machines, Supercomputer, X, 6, 4-9 (1993)
[8] D’Azevedo, E. F.; Romine, C. H., Reducing communication costs in the conjugate gradient algorithm on distributed memory multiprocessors, (Technical Report ORNL/TM-12192 (1992), Oak Ridge National Lab) · Zbl 0960.68550
[9] De Keyser, J.; Roose, D., Distributed mapping of SPMD programs with a generalized Kernighan-Lin heuristic, (Gentzsch, W.; Harms, U., High-Performance Computing and Networking. High-Performance Computing and Networking, Lecture Notes in Computer Science, 797 (1994), Springer-Verlag: Springer-Verlag Berlin), 227-232
[10] Demmel, J. W.; Heath, M. T.; van der Vorst, H. A., Parallel numerical linear algebra, Acta Numer., 2 (1993) · Zbl 0793.65011
[11] de Sturler, E., A parallel restructured version of GMRES \((m)\), (Technical Report 91-85 (1991), Delft University of Technology)
[12] de Sturler, E., A parallel variant of GMRES \((m)\), (Vichnevetsky, R.; Miller, J. H.H., Proceedings 13th IMACS World Congress on Computation and Applied Mathematics (1991), IMACS, Criterion Press: IMACS, Criterion Press Dublin), 682-683
[13] E. de Sturler, Incomplete Block LU preconditioners on slightly overlapping subdomains for a massively parallel computer, Appl. Numer. Math. (to appear).; E. de Sturler, Incomplete Block LU preconditioners on slightly overlapping subdomains for a massively parallel computer, Appl. Numer. Math. (to appear). · Zbl 0854.65027
[14] de Sturler, E., Iterative methods on distributed memory computers, (Ph.D. Thesis (1994), Delft University of Technology: Delft University of Technology Delft, Netherlands) · Zbl 0873.65017
[15] Dongarra, J. J.; Duff, I. S.; Sorensen, D. C.; van der Vorst, H. A., Solving Linear Systems on Vector and Shared Memory Computers (1991), SIAM: SIAM Philadelphia, PA
[16] Farhat, C.; Roux, F. X., Implicit parallel processing in structural mechanics, (Technical Report CU-CSSC-93-26 (1993), Center for Aerospace Structures, College of Engineering, University of Colorado: Center for Aerospace Structures, College of Engineering, University of Colorado Boulder, CO) · Zbl 0805.73062
[17] Hestenes, M. R.; Stiefel, E., Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Stand., 49, 409-436 (1954) · Zbl 0048.09901
[18] Johan, Z., Data parallel finite element techniques for large-scale computational fluid dynamics, (Ph.D. Thesis (1992), Stanford University)
[19] Johan, Z.; Mathur, K. K.; Lennart Johnson, S.; Hughes, T. J.R., Mesh decomposition and communication procedures for finite element applications on the connection machine CM-5 system, (Gentzsch, W.; Harms, U., High-Performance Computing and Networking. High-Performance Computing and Networking, Lecture Notes in Computer Science, 797 (1994), Springer-Verlag: Springer-Verlag Berlin), 233-240
[20] Meurant, G., Numerical experiments for the preconditioned conjugate gradient method on the CRAY X-MP/2, (Technical Report LBL-18023 (1984), University of California: University of California Berkeley, CA)
[21] Nataf, F.; Rogier, F.; de Sturler, E., Optimal interface conditions for domain decomposition methods, (Technical Report CMAP-301 (1994), Centre de Mathématiques Appliqués), CNRS URA-756, Ecole Polytechnique · Zbl 0861.76070
[22] Pothen, A.; Simon, H. D.; Liou, K.-P., Partitioning sparse matrices with eigenvectors of graphs, SIAM J. Matrix Anal. Appl., 11, 430-452 (1990) · Zbl 0711.65034
[23] Saad, Y.; Schultz, M. H., GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Statist. Comput., 7, 856-869 (1986) · Zbl 0599.65018
[24] Simon, H. D., Partitioning of unstructured problems for parallel processing, (Technical Report RNR-91-008 (1991), NASA Ames Research Center: NASA Ames Research Center Moffet Field, CA)
[25] Sleijpen, G. L.G.; van der Vorst, H. A.; Fokkema, D. R., BiCGstab \((l)\) and other hybrid Bi-CG methods, Numer. Algorithms, 7, 75-109 (1994) · Zbl 0810.65027
[26] Sun, H.; Tang, W. P., Overdetermined Schwarz alternating method, (Technical Report CS-93-53 (1993), Department of Computer Science, University of Waterloo: Department of Computer Science, University of Waterloo Ont) · Zbl 0817.65120
[27] Tan, K. H.; Borsboom, M. J.A., Problem-dependent optimization of flexible couplings in domain decomposition methods, with an application to advection-dominated problems (1993), Mathematical Institute, University of Utrecht: Mathematical Institute, University of Utrecht Netherlands, Preprint 830 · Zbl 0817.65076
[28] Tang, W. P., Generalized Schwarz splittings, SIAM J. Sci. Statist. Comput., 13, 573-595 (1992) · Zbl 0745.65027
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.