×

A dissection solver with kernel detection for symmetric finite element matrices on shared memory computers. (English) Zbl 1352.65457

Summary: A direct solver for symmetric sparse matrices from finite element problems is presented. The solver is supposed to work as a local solver of domain decomposition methods for hybrid parallelization on cluster systems of multi-core CPUs, and then it is required to run on shared memory computers and to have an ability of kernel detection. Symmetric pivoting with a given threshold factorizes a matrix with a decomposition introduced by a nested bisection and selects suspicious null pivots from the threshold. The Schur complement constructed from the suspicious null pivots is examined by a factorization with \(1\times 1\) and \(2\times 2\) pivoting and by a robust kernel detection algorithm based on measurement of residuals with orthogonal projections onto supposed image spaces. A static data structure from the nested bisection and a block sub-structure for Schur complements at all bisection levels can use level 3 BLAS routines efficiently. Asynchronous task execution for each block can reduce idle time of processors drastically, and as a result, the solver has high parallel efficiency. Competitive performance of the developed solver to Intel Pardiso on shared memory computers is shown by numerical experiments.

MSC:

65N22 Numerical solution of discretized equations for boundary value problems involving PDEs
65N30 Finite element, Rayleigh-Ritz and Galerkin methods for boundary value problems involving PDEs
65N55 Multigrid methods; domain decomposition for boundary value problems involving PDEs
65Y10 Numerical algorithms for specific classes of architectures
PDFBibTeX XMLCite
Full Text: DOI Link

References:

[1] Kurzak, Implementing linear algebra routines on multi-core processors with pipelining and a look ahead, LAPACK Working Note 178 (2006)
[2] Buttari, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing 35 pp 38– (2009)
[3] Donfack S Grigori L Gropp WD Kale V Hybrid static/dynamic scheduling for already optimized dense matrix factorization Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International Shanghai, China 496 507
[4] Farhat, A method of finite element tearing and interconnecting and its parallel solution algorithm, International Journal for Numerical Methods in Engineering 32 pp 1205– (1991) · Zbl 0758.65075
[5] Farhat, Implicit parallel processing in structural mechanics, Computational Mechanics Advances 2 pp 1– (1994) · Zbl 0805.73062
[6] Mandel, Balancing domain decomposition, Communications in Numerical Methods in Engineering 9 pp 233– (1993) · Zbl 0796.65126
[7] OpenMP Architecture Review Board OpenMP Application Program Interface, ver.3.1 July 2011 http://www.openmp.org/mp-documents/OpenMP3.1.pdf
[8] Chapman, Using OpenMP (2008)
[9] Lewis, Multithreaded Programming with Pthreads (1998)
[10] Demmel, A supernodal approach to sparse partial pivoting, SIAM Journal on Matrix Analysis and Applications 20 pp 720– (1999) · Zbl 0931.65022
[11] Demmel, An asynchronous parallel supernodal algorithm for sparse Gaussian elimination, SIAM Journal on Matrix Analysis and Applications 20 pp 915– (1999) · Zbl 0939.65036
[12] Schenk, Efficient sparse LU factorization with left-right looking strategy on shared memory multiprocessors, BIT 40 pp 158– (1999) · Zbl 0957.65016
[13] Schenk, Solving unsymmetric sparse systems of liner equations with PARDISO, Future Generation Computer Systems 20 pp 475– (2004)
[14] Schenk, Two-level dynamic scheduling in PARDISO: improved scalability on shared memory multiprocessing systems, Parallel Computing 28 pp 187– (2002) · Zbl 0982.68195
[15] Li, SuperLU_DIST : a scalable distributed-memory sparse direct solver for unsymmetric linear systems, ACM Transactions on Mathematical Software 29 pp 110– (2003) · Zbl 1068.90591
[16] Heath, A Cartesian parallel nested dissection algorithm, SIAM Journal on Matrix Analysis and Applications 16 pp 235– (1995) · Zbl 0815.65055
[17] Heath, Performance of a fully parallel sparse solver, International Journal of Supercomputer Applications and High Performance Computing 11 pp 49– (1997)
[18] Raghavan P User’s guide, DSCPACK: domain-separator codes for the parallel solution of sparse linear systems Technical Report CSE-02-004 2002
[19] Amestoy, Multifrontal parallel distributed symmetric and unsymmetirc solvers, Computer Methods in Applied Mechanics and Engineering 184 pp 501– (2000) · Zbl 0956.65017
[20] Amestoy, A fully asynchronous multifrontal solver using distributed dynamic scheduling, SIAM Journal on Matrix Analysis and Applications 23 pp 15– (2001) · Zbl 0992.65018
[21] Amestoy, Hybrid scheduling for the parallel solution of linear systems, Parallel Computing 32 pp 136– (2006)
[22] Davis, Direct Methods for Sparse Linear Systems (2006) · Zbl 1119.65021
[23] Bunch, Some stable methods for calculating inertia and solving symmetric linear systems, Mathematics of Computation 31 pp 163– (1977) · Zbl 0355.65023
[24] Schenk, On fast factorization pivoting methods for sparse symmetric indefinite systems, Electronic Transactions on Numerical Analysis 23 pp 158– (2006) · Zbl 1112.65022
[25] Guèye, A new parallel sparse direct solver: presentation and numerical experiments in large-scale structural mechanics parallel computing, International Journal for Numerical Methods in Engineering 88 pp 370– (2011) · Zbl 1242.74122
[26] Golub, Matrix Computations, 3. ed. (1996)
[27] Knuth, The Art of Computer Programming: Seminumerical Algorithms (1981) · Zbl 0477.65002
[28] Anderson, LAPACK User’s Guide, 3. ed. (1999) · Zbl 0934.65030
[29] George, Numerical experiments using dissection methods to solve n by n grid problems, SIAM Journal on Numerical Analysis 14 pp 161– (1977) · Zbl 0356.65023
[30] Web site of Intel Math Kernel Library http://software.intel.com/en-us/intel-mkl
[31] Pellegrini, Hybridizing nested dissection and halo approximate minimum degree for efficient sparse matrix ordering, Concurrency: Practice and Experience 12 pp 69– (2000) · Zbl 02181347
[32] Karypis, A fast and high quality multilevel scheme for partitioning irregular graphs, SIAM Journal on Scientific Computing 20 pp 359– (1998) · Zbl 0915.68129
[33] George, Algorithms for matrix partitioning and the numerical solution of finite element systems, SIAM Journal on Numerical Analysis 15 pp 297– (1978) · Zbl 0389.65015
[34] Davis, A combined unifrontal/multifrontal method for unsymmetric sparse matrices, ACM Transactions on Mathematical Software 25 pp 1– (1999) · Zbl 0962.65027
[35] Grigori, CALU: a communication optimal LU factorization algorithm, SIAM Journal on Matrix Analysis and Applications 32 pp 1317– (2011) · Zbl 1242.65089
[36] Suzuki, Finite element matrices in congruent subdomains and their effective use for large-scale computations, International Journal for Numerical Methods in Engineering 62 pp 1807– (2005) · Zbl 1121.65365
[37] Web site of the University of Florida Sparse Matrix Collection http://www.cise.ufl.edu/research/sparse/matrices/index.html
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.