×

Towards an accurate performance modeling of parallel sparse factorization. (English) Zbl 1122.65029

Summary: We present a simulation-based performance model to analyze a parallel sparse LU factorization algorithm on modern cache-based, high-end parallel architectures. We consider supernodal right-looking parallel factorization on a bi-dimensional grid of processors, that uses static pivoting. Our model characterizes the algorithmic behavior by taking into account the underlying processor speed, memory system performance, as well as the interconnect speed. The model is validated using the implementation in the SuperLU\(\_\)DIST linear system solver, the sparse matrices from real application, and an IBM POWER3 parallel machine. Our modeling methodology can be adapted to study performance of other types of sparse factorizations such as Cholesky or QR, and on different parallel machines.

MSC:

65F05 Direct numerical methods for linear systems and matrix inversion
65F50 Computational methods for sparse matrices
65Y05 Parallel numerical computation
65Y20 Complexity and performance of numerical algorithms
68W10 Parallel algorithms in computer science
PDFBibTeX XMLCite
Full Text: DOI Link

References:

[1] Agarwal R.C., Gustavson F.G. and Zubair M. (1994). Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms. IBM J. Res. Develop. 38(5): 563–576 · doi:10.1147/rd.385.0563
[2] Andersson, S., Bell, R., Hague, J., Holthoff, H., Mayes, P., Nakano, J., Shieh, D., Tuccillo, J.: RS/6000 Scientific and Technical Computing: POWER3 Introduction and Tuning Guide. International Business Machines (1998) http://www.redbooks.ibm.com.
[3] Ashcraft C. (1994). The fan-both family of column-based distributed Cholesky factorization algorithms. In: George, A., Gilbert, J.R. and Liu, J.W.H. (eds) Graph Theory and Sparse Matrix Computation, pp 159–191. Springer, Berlin · Zbl 0803.68044
[4] Browne S., Dongarra J., Garner N., Ho G. and Mucci P. (2000). A portable programming interface for performance evaluation on modern processors. Int. J. High Perfor. Comput. Appl. 14(3): 189–204 · doi:10.1177/109434200001400303
[5] Davis, T.: University of Florida Sparse Matrix Collection. NA Digest, vol. 92, no. 42, October 16, 1994, NA Digest, vol. 96, no. 28, July 23, 1996, and NA Digest, vol. 97, no. 23, June 7, 1997 http://www.cise.ufl.edu/research/sparse/matrices
[6] Grigori, L., Li, X.S.: Performance analysis of parallel right-looking sparse LU factorization on two-dimensional grid of processors. In: Proceedings of PARA’04 Workshop on State-of-the-art in Scientific Computing, LNCS 3732, pp. 768–777 (2006)
[7] Gupta A., Karypis G. and Kumar V. (1997). Highly Scalable Parallel Algorithms for Sparse Matrix Factorization. IEEE Trans. Parallel Distrib. Syst. 8(5): 502–520 · Zbl 0896.65026 · doi:10.1109/71.598277
[8] Kålgström B., Ling P. and Van Loan C. (1998). GEMM-Based Level 3 BLAS: model implementations and performance evaluation benchmark. ACM Trans. Math. Softw. 24(3): 268–302 · Zbl 0930.65047 · doi:10.1145/292395.292412
[9] Kålgström B., Ling P. and Van Loan C. (1998). GEMM-based level 3 BLAS: portability and optimization issues. ACM Trans. Math. Softw. 24(3): 303–316 · Zbl 0930.65048 · doi:10.1145/292395.292426
[10] Li X.S. and Demmel J.W. (2003). SuperLU_DIST: a scalable distributed-memory sparse direct solver for unsymmetric linear systems. ACM Trans. Math. Softw. 29(2): 110–140 · Zbl 1068.90591 · doi:10.1145/779359.779361
[11] Schreiber R. (1994). Scalability of sparse direct solvers. In: George, A., Gilbert, J.R. and Liu, J.W.H. (eds) Graph Theory and Sparse Matrix Computation, pp 191–211. Springer, Berlin · Zbl 0796.65025
[12] Skinner, D.: IBM SP Parallel Scaling Overview. http://www.nersc.gov/news/reports/technical/seaborg_scaling
[13] Vuduc, R., Kamil, S., Hsu, J., Nishtala, R., Demmel, J.W., Yellick, K.A.: Automatic tuning and analysis of sparse triangular solve. In: ICS 2002: Workshop on Performance via High-Level Languages and Libraries (2002)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.