Implementing multifrontal sparse solvers for multicore architectures with sequential task flow runtime systems. (English) Zbl 1369.65062


65F50 Computational methods for sparse matrices
65Y05 Parallel numerical computation
65Y10 Numerical algorithms for specific classes of architectures
Full Text: DOI Link


[1] Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, and Florent Lopez. 2013. Multifrontal QR factorization for multicore architectures over runtime systems. In Proceedings of the 19th international Conference on Parallel Processing (Euro-Par 2013). Springer, Berlin, 521–532. http://dx.doi.org/10.1007/978-3-642-40047-6_53 · Zbl 06226417
[2] Emmanuel Agullo, Jim Demmel, Jack Dongarra, Bilel Hadri, Jakub Kurzak, Julien Langou, Hatem Ltaief, Piotr Luszczek, and Stanimire Tomov. 2009. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects. Journal of Physics: Conference Series 180, 1 (2009), 012037. http://stacks.iop.org/1742-6596/180/i=1/a=012037
[3] Emmanuel Agullo, Jack Dongarra, Rajib Nath, and Stanimire Tomov. 2011. Fully empirical autotuned QR factorization for multicore architectures. CoRR abs/1102.5328 (2011). · Zbl 1323.65140
[4] Randy Allen and Ken Kennedy. 2002. Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. Morgan Kaufmann.
[5] Patrick R. Amestoy, Iain S. Duff, and Chiara Puglisi. 1996. Multifrontal QR factorization in a multiprocessor environment. Numerical Linear Algebra with Applications 3(4) (1996), 275–300. · Zbl 0907.65040
[6] The OpenMP architecture review board. 2013. OpenMP 4.0 Complete specifications. (2013).
[7] Nimar S. Arora, Robert D. Blumofe, and C. Greg Plaxton. 2001. Thread scheduling for multiprogrammed multiprocessors. Theory of Computing Systems 34, 2 (2001), 115–144. DOI:http://dx.doi.org/10.1007/s00224-001-0004-z · Zbl 0978.68020
[8] Cedric Augonnet, Samuel Thibault, Raymond Namyst, and Pierre-André Wacrenier. 2011. StarPU: A unified platform for task scheduling on heterogeneous multicore architectures. Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par 2009 23 (Feb. 2011), 187–198. DOI:http://dx.doi.org/10.1002/cpe.1631 · Zbl 05902866
[9] Rosa M. Badia, José R. Herrero, Jesús Labarta, Josep M. Pérez, Enrique S. Quintana-Ortí, and Gregorio Quintana-Ortí. 2009. Parallelizing dense and banded linear algebra libraries using SMPSs. Concurrency and Computation: Practice and Experience 21, 18 (2009), 2438–2456. · Zbl 05645671
[10] George Bosilca, Aurelien Bouteiller, Anthony Danalis, Thomas Hérault, Pierre Lemarinier, and Jack Dongarra. 2012. DAGuE: A generic distributed DAG engine for high performance computing. Parallel Computing 38, 1–2 (2012), 37–51. DOI:http://dx.doi.org/10.1016/j.parco.2011.10.003 · Zbl 06094445
[11] George Bosilca, Aurelien Bouteiller, Anthony Danalis, Thomas Herault, Piotr Luszczek, and Jack Dongarra. 2013. Dense linear algebra on distributed heterogeneous hardware with a symbolic DAG approach. Scalable Computing and Communications: Theory and Practice (2013), 699–733.
[12] Henricus Bouwmeester, Mathias Jacquelin, Julien Langou, and Yves Robert. 2011. Tiled QR factorization algorithms. In Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC’11). ACM, 7:1–7:11. DOI:http://dx.doi.org/10.1145/2063384.2063393
[13] Zoran Budimlić, Michael Burke, Vincent Cavé, Kathleen Knobe, Geoff Lowney, Ryan Newton, Jens Palsberg, David Peixotto, Vivek Sarkar, Frank Schlimbach, and others. 2010. Concurrent collections. Scientific Programming 18, 3 (2010), 203–217.
[14] Alfredo Buttari. 2013. Fine-grained multithreading for the multifrontal QR factorization of sparse matrices. SIAM Journal on Scientific Computing 35, 4 (2013), C323–C345. http://epubs.siam.org/doi/abs/10.1137/110846427 · Zbl 1362.65031
[15] Alfredo Buttari, Julien Langou, Jakub Kurzak, and Jack Dongarra. 2009. A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Computing 35 (Jan. 2009), 38–53. DOI:http://dx.doi.org/10.1016/j.parco.2008.10.002 · Zbl 1197.65240
[16] Michel Cosnard and Emmanuel Jeannot. 2001. Automatic parallelization techniques based on compact DAG extraction and symbolic scheduling. Parallel Processing Letters 11, 01 (2001), 151–168. DOI:http://dx.doi.org/10.1142/S012962640100049X
[17] Timothy A. Davis. 2011. Algorithm 915, SuiteSparseQR: Multifrontal multithreaded rank-revealing sparse QR factorization. ACM Transactions on Mathematical Software 38, 1 (Dec. 2011), 8:1–8:22. DOI:http://dx.doi.org/10.1145/2049662.2049670 · Zbl 1365.65122
[18] Timothy A. Davis. 2014. SuiteSparse 4.4.0. (October 2014). Software package.
[19] Timothy A. Davis and Yifan Hu. 2011. The University of Florida sparse matrix collection. ACM Transactions on Mathematical Software 38, 1 (Dec. 2011), 1:1–1:25. http://doi.acm.org/10.1145/2049662.2049663 · Zbl 1365.65123
[20] James Demmel, Laura Grigori, Mark Hoemmen, and Julien Langou. 2012. Communication-optimal parallel and sequential QR and LU factorizations. SIAM Journal of Scientific Computing 34, 1 (Feb. 2012), 206–239. http://dx.doi.org/10.1137/080731992 · Zbl 1241.65028
[21] Edsger W. Dijkstra. 1965. Een algorithme ter voorkoming van de dodelijke omarming. (1965). http://www.cs.utexas.edu/users/EWD/ewd01xx/EWD108.PDF (circulated privately).
[22] Edsger W. Dijkstra. 1982. The mathematics behind the banker’s algorithm. In Selected Writings on Computing: A Personal Perspective. Springer, New York, 308–312. DOI:http://dx.doi.org/10.1007/978-1- 4612-5695-3_54
[23] Jack Dongarra, Mathieu Faverge, Thomas Hérault, Mathias Jacquelin, Julien Langou, and Yves Robert. 2013. Hierarchical QR factorization algorithms for multi-core clusters. Parallel Computing 39, 4–5 (2013), 212–232. http://hal.inria.fr/hal-00809770.
[24] Iain S. Duff and John K. Reid. 1983. The multifrontal solution of indefinite sparse symmetric linear systems. ACM Transactions On Mathematical Software 9 (1983), 302–325. · Zbl 0515.65022
[25] Lionel Eyraud-Dubois, Loris Marchal, Oliver Sinnen, and Frédéric Vivien. 2015. Parallel scheduling of task trees with limited memory. ACM Transactions on Parallel Computing 2, 2 (June 2015), 13:1–13:37. DOI:http://dx.doi.org/10.1145/2779052
[26] Al Geist and Esmond G. Ng. 1989. Task scheduling for parallel sparse Cholesky factorization. International Journal of Parallel Programming 18 (1989), 291–314. · Zbl 0702.68031
[27] Abdou Guermouche, Jean-Yves L’Excellent, and Gilles Utard. 2003. Impact of reordering on the memory of a multifrontal solver. Parallel Computing 29, 9 (2003), 1191–1218.
[28] Everton Hermann, Bruno Raffin, François Faure, Thierry Gautier, and Jérémie Allard. 2010. Multi-GPU and multi-CPU parallelization for interactive physics simulations. In 16th International Euro-Par Conference on Parallel Processing: Part II. 235–246. · Zbl 05803887
[29] Jonathan Hogg, John K. Reid, and Jennifer A. Scott. 2010. Design of a multicore sparse Cholesky factorization using DAGs. SIAM Journal of Scientific Computing 32, 6 (2010), 3627–3649. · Zbl 1221.65088
[30] Kyungjoo Kim and Victor Eijkhout. 2014. A parallel sparse direct solver via hierarchical DAG scheduling. ACM Transactions on Mathematical Software 41, 1 (Oct. 2014), 3:1–3:27. · Zbl 1369.65046
[31] Jakub Kurzak and Jack Dongarra. 2009. Fully dynamic scheduler for numerical computing on multicore processors. LAPACK Working Note lawn220 (2009).
[32] Xavier Lacoste, Mathieu Faverge, Pierre Ramet, Samuel Thibault, and George Bosilca. 2014. Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes. Research Report RR-8446, INRIA.
[33] Joseph W. H. Liu. 1986. On the storage requirement in the out-of-core multifrontal method for sparse factorization. ACM Transactions on Mathematical Software 12 (1986), 127–148. · Zbl 0605.65015
[34] François Pellegrini and Jean Roman. 1996. Scotch: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs. In Proceedings of HPCN’96, Lecture Notes in Computer Science, Vol. 1067. Springer, 493–498.
[35] Gregorio Quintana-Ortí, Enrique S. Quintana-Ortí, Robert A. Van De Geijn, Field G. Van Zee, and Ernie Chan. 2009. Programming matrix algorithms-by-blocks for thread-level parallelism. ACM Transactions on Mathematical Software 36, 3 (2009). · Zbl 1364.65105
[36] Franois-Henry Rouet. 2012. Memory and Performance Issues in Parallel Multifrontal Factorizations and Triangular Solutions with Sparse Right-Hand Sides. Thèse de doctorat. Institut National Polytechnique de Toulouse, Toulouse, France. http://tel.archives-ouvertes.fr/tel-00785748
[37] Robert Schreiber. 1982. A new implementation of sparse Gaussian elimination. ACM Transactions on Mathematical Software 8 (1982), 256–276. · Zbl 0491.65013
[38] Haluk Topcuouglu, Salim Hariri, and Min-You Wu. 2002. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems 13, 3 (Mar. 2002), 260–274. DOI:http://dx.doi.org/10.1109/71.993206 · Zbl 05106769
[39] Sencer Yeralan, Timothy A. Davis, and Sanjay Ranka. 2015. Sparse QR factorization on the GPU. (Jan. 2015). Submitted to ACM Transactions on Mathematical Software. · Zbl 1484.65075
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.