Experiments with sparse Cholesky using a sequential task-flow implementation. (English) Zbl 1405.65036

Summary: We describe the development of a prototype code for the solution of large sparse symmetric positive definite systems that is efficient on parallel architectures. We implement a DAG-based Cholesky factorization that offers good performance and scalability on multicore architectures. Our approach uses a runtime system to execute the DAG. The runtime system plays the role of a software layer between the application and the architecture and handles the management of task dependencies as well as the task scheduling. In this model, the application is expressed using a high-level API, independent of the hardware details, thus enabling portability across different architectures. Although widely used in dense linear algebra, this approach is nevertheless challenging for sparse algorithms because of the irregularity and variable granularity of the DAGs arising in these systems. We assess the ability of two different Sequential Task Flow implementations to address this challenge: one implemented with the OpenMP standard, and the other with the modern runtime system StarPU. We compare these implementations to the state-of-the-art solver HSL_MA87 and demonstrate comparable performance on a multicore architecture.


65F05 Direct numerical methods for linear systems and matrix inversion
65F50 Computational methods for sparse matrices
65Y05 Parallel numerical computation
Full Text: DOI


[1] E. Agullo, A. Buttari, A. Guermouche and F. Lopez, Implementing multifrontal sparse solvers for multicore architectures with sequential task flow runtime systems, ACM Trans. Math. Softw., 43 (2016), 13: 1-13: 22, · Zbl 1369.65062
[2] E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak, J. Langou, H. Ltaief, P. Luszczek and S. Tomov, Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, 180 (2009), 012037, http://stacks.iop.org/1742-6596/180/i=1/a=012037.
[3] P. R. Amestoy, T. A. Davis and I. S. Duff, An approximate minimum degree ordering algorithm, SIAM J. Matrix Anal. Appl., 17 (1996), 886-905, · Zbl 0861.65021
[4] P. R. Amestoy, T. A. Davis and I. S. Duff, Algorithm 837: Amd, an approximate minimum degree ordering algorithm, ACM Trans. Math. Softw., 30 (2004), 381-388, · Zbl 1070.65534
[5] C. Augonnet, S. Thibault, R. Namyst and P. -A. Wacrenier, Starpu: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, 23 (2011), 187-198,
[6] G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, A. Haidar, T. Hérault, J. Kurzak, J. Langou, P. Lemarinier, H. Ltaief, P. Luszczek, A. Yarkhan and J. J. Dongarra, Distibuted dense numerical linear algebra algorithms on massively parallel architectures: DPLASMA, in Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW’11), PDSEC 2011, Anchorage, United States, (2011), 1432-1441, https://hal.inria.fr/hal-00809680.
[7] G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, T. Hérault and J. J. Dongarra, Parsec: Exploiting heterogeneity to enhance scalability, Computing in Science and Engineering, 15 (2013), 36-45,
[8] A. Buttari, Fine-grained multithreading for the multifrontal QR factorization of sparse matrices, SIAM Journal on Scientific Computing, 35 (2013), C323-C345, · Zbl 1362.65031
[9] M. Cosnard and M. Loi, Automatic task graph generation techniques, in System Sciences, 1995. Proceedings of the Twenty-Eighth Hawaii International Conference on, 2 (1995), 113-122.
[10] T. A. Davis; Y. Hu, The university of florida sparse matrix collection, ACM Trans. Math. Softw., 38, 1:1, (2011) · Zbl 1365.65123
[11] G. A. Geist; E. Ng, Task scheduling for parallel sparse Cholesky factorization, Int. J. Parallel Program., 18, 291, (1990) · Zbl 0702.68031
[12] A. George; J. W. H. Liu, An automatic nested dissection algorithm for irregular finite element problems, SINUM, 15, 1053, (1978) · Zbl 0408.65064
[13] P. Hénon; P. Ramet; J. Roman, Pastix: A high-performance parallel direct solver for sparse symmetric definite systems, Parallel Computing, 28, 301, (2002) · Zbl 0984.68208
[14] J. D. Hogg; J. K. Reid; J. A. Scott, Design of a multicore sparse Cholesky factorization using dags, SIAM Journal on Scientific Computing, 32, 3627, (2010) · Zbl 1221.65088
[15] J. D. Hogg and J. A. Scott, A modern analyse phase for sparse tree-based direct methods, Technical Report RAL-TR-2010-031, STFC Rutherford Appleton Lab., 2010, https://epubs.stfc.ac.uk/work/54246.
[16] F. D. Igual; E. Chan; E. S. Quintana-Ortí; G. Quintana-Ortí; R. A. van de Geijn; F. G. V. Zee, The flame approach: from dense linear algebra algorithms to high-performance multi-accelerator implementations, J. Parallel Distrib. Comput., 72, 1134, (2012)
[17] G. Karypis and V. Kumar, A fast and high quality multilevel scheme for partitioning irregular graphs, SIAM J. Sci. Comput., 20 (1998), 359-392, · Zbl 0915.68129
[18] J. W. H. Liu, Modification of the minimum-degree algorithm by multiple elimination, ACM Trans. Math. Softw., 11 (1985), 141-153, · Zbl 0568.65015
[19] F. Lopez, Task-based Multifrontal QR Solver for Heterogeneous Architectures, Thèse de doctorat, Université Paul Sabatier, Toulouse, France, 2015.
[20] W. F. Tinney; J. W. Walker, Direct solutions of sparse network equations by optimally ordered triangular factorization, Proceedings of the IEEE, 55, 1801, (1967)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.