Grigori, Laura; Li, Xiaoye S. Towards an accurate performance modeling of parallel sparse factorization. (English) Zbl 1122.65029 Appl. Algebra Eng. Commun. Comput. 18, No. 3, 241-261 (2007). Summary: We present a simulation-based performance model to analyze a parallel sparse LU factorization algorithm on modern cache-based, high-end parallel architectures. We consider supernodal right-looking parallel factorization on a bi-dimensional grid of processors, that uses static pivoting. Our model characterizes the algorithmic behavior by taking into account the underlying processor speed, memory system performance, as well as the interconnect speed. The model is validated using the implementation in the SuperLU\(\_\)DIST linear system solver, the sparse matrices from real application, and an IBM POWER3 parallel machine. Our modeling methodology can be adapted to study performance of other types of sparse factorizations such as Cholesky or QR, and on different parallel machines. Cited in 1 Document MSC: 65F05 Direct numerical methods for linear systems and matrix inversion 65F50 Computational methods for sparse matrices 65Y05 Parallel numerical computation 65Y20 Complexity and performance of numerical algorithms 68W10 Parallel algorithms in computer science Keywords:parallel sparse factorization; performance modeling; distributed parallel machine; Cholesky factorization; QR factorization; numerical examples; parallel computation; LU factorization Software:ATLAS; SuperLU-DIST; SparseMatrix; SuperLU; PAPI; GEMM PDFBibTeX XMLCite \textit{L. Grigori} and \textit{X. S. Li}, Appl. Algebra Eng. Commun. Comput. 18, No. 3, 241--261 (2007; Zbl 1122.65029) Full Text: DOI Link References: [1] Agarwal R.C., Gustavson F.G. and Zubair M. (1994). Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms. IBM J. Res. Develop. 38(5): 563–576 · doi:10.1147/rd.385.0563 [2] Andersson, S., Bell, R., Hague, J., Holthoff, H., Mayes, P., Nakano, J., Shieh, D., Tuccillo, J.: RS/6000 Scientific and Technical Computing: POWER3 Introduction and Tuning Guide. International Business Machines (1998) http://www.redbooks.ibm.com. [3] Ashcraft C. (1994). The fan-both family of column-based distributed Cholesky factorization algorithms. In: George, A., Gilbert, J.R. and Liu, J.W.H. (eds) Graph Theory and Sparse Matrix Computation, pp 159–191. Springer, Berlin · Zbl 0803.68044 [4] Browne S., Dongarra J., Garner N., Ho G. and Mucci P. (2000). A portable programming interface for performance evaluation on modern processors. Int. J. High Perfor. Comput. Appl. 14(3): 189–204 · doi:10.1177/109434200001400303 [5] Davis, T.: University of Florida Sparse Matrix Collection. NA Digest, vol. 92, no. 42, October 16, 1994, NA Digest, vol. 96, no. 28, July 23, 1996, and NA Digest, vol. 97, no. 23, June 7, 1997 http://www.cise.ufl.edu/research/sparse/matrices [6] Grigori, L., Li, X.S.: Performance analysis of parallel right-looking sparse LU factorization on two-dimensional grid of processors. In: Proceedings of PARA’04 Workshop on State-of-the-art in Scientific Computing, LNCS 3732, pp. 768–777 (2006) [7] Gupta A., Karypis G. and Kumar V. (1997). Highly Scalable Parallel Algorithms for Sparse Matrix Factorization. IEEE Trans. Parallel Distrib. Syst. 8(5): 502–520 · Zbl 0896.65026 · doi:10.1109/71.598277 [8] Kålgström B., Ling P. and Van Loan C. (1998). GEMM-Based Level 3 BLAS: model implementations and performance evaluation benchmark. ACM Trans. Math. Softw. 24(3): 268–302 · Zbl 0930.65047 · doi:10.1145/292395.292412 [9] Kålgström B., Ling P. and Van Loan C. (1998). GEMM-based level 3 BLAS: portability and optimization issues. ACM Trans. Math. Softw. 24(3): 303–316 · Zbl 0930.65048 · doi:10.1145/292395.292426 [10] Li X.S. and Demmel J.W. (2003). SuperLU_DIST: a scalable distributed-memory sparse direct solver for unsymmetric linear systems. ACM Trans. Math. Softw. 29(2): 110–140 · Zbl 1068.90591 · doi:10.1145/779359.779361 [11] Schreiber R. (1994). Scalability of sparse direct solvers. In: George, A., Gilbert, J.R. and Liu, J.W.H. (eds) Graph Theory and Sparse Matrix Computation, pp 191–211. Springer, Berlin · Zbl 0796.65025 [12] Skinner, D.: IBM SP Parallel Scaling Overview. http://www.nersc.gov/news/reports/technical/seaborg_scaling [13] Vuduc, R., Kamil, S., Hsu, J., Nishtala, R., Demmel, J.W., Yellick, K.A.: Automatic tuning and analysis of sparse triangular solve. In: ICS 2002: Workshop on Performance via High-Level Languages and Libraries (2002) This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.