×

Multilevel balancing domain decomposition at extreme scales. (English) Zbl 1334.65217

Summary: In this paper we present a fully distributed, communicator-aware, recursive, and interlevel-overlapped message-passing implementation of the multilevel balancing domain decomposition by constraints (MLBDDC) preconditioner. The implementation highly relies on subcommunicators in order to achieve the desired effect of coarse-grain overlapping of computation and communication, and communication and communication among levels in the hierarchy (namely, interlevel overlapping). Essentially, the main communicator is split into as many nonoverlapping subsets of message-passing interface (MPI) tasks (i.e., MPI subcommunicators) as levels in the hierarchy. Provided that specialized resources (cores and memory) are devoted to each level, a careful rescheduling and mapping of all the computations and communications in the algorithm lets a high degree of overlapping be exploited among levels. All subroutines and associated data structures are expressed recursively, and therefore MLBDDC preconditioners with an arbitrary number of levels can be built while re-using significant and recurrent parts of the codes. This approach leads to excellent weak scalability results as soon as level-1 tasks can fully overlap coarser-levels duties. We provide a model to indicate how to choose the number of levels and coarsening ratios between consecutive levels and determine qualitatively the scalability limits for a given choice. We have carried out a comprehensive weak scalability analysis of the proposed implementation for the three-dimensional Laplacian and linear elasticity problems on structured and unstructured meshes. Excellent weak scalability results have been obtained up to 458,752 IBM BG/Q cores and 1.8 million MPI being, being the first time that exact domain decomposition preconditioners (only based on sparse direct solvers) reach these scales.

MSC:

65Y05 Parallel numerical computation
65Y20 Complexity and performance of numerical algorithms
65N55 Multigrid methods; domain decomposition for boundary value problems involving PDEs
65F08 Preconditioners for iterative methods
65N30 Finite element, Rayleigh-Ritz and Galerkin methods for boundary value problems involving PDEs
PDFBibTeX XMLCite
Full Text: DOI

References:

[2] P. R. Amestoy, I. S. Duff, and J.-Y. L’Excellent, {\it Multifrontal parallel distributed symmetric and unsymmetric solvers}, Comput. Methods Appl. Mech. Engrg., 184 (2000), pp. 501-520. · Zbl 0956.65017
[3] S. Badia, A. F. Martín, and J. Principe, {\it Enhanced balancing Neumann-Neumann preconditioning in computational fluid and solid mechanics}, Internat. J. Numer. Methods Engrg., 96 (2013), pp. 203-230. · Zbl 1352.74322
[4] S. Badia, A. F. Martín, and J. Principe, {\it Implementation and scalability analysis of balancing domain decomposition methods}, Arch. Comput. Methods Engrg., 20 (2013), pp. 239-262. · Zbl 1354.65261
[5] S. Badia, A. F. Martín, and J. Principe, {\it A highly scalable parallel implementation of balancing domain decomposition by constraints}, SIAM J. Sci. Comput., (2014), pp. C190-C218. · Zbl 1296.65177
[6] S. Badia, A. F. Martín, and J. Principe, {\it On the scalability of inexact balancing domain decomposition by constraints with overlapped coarse/fine corrections}, Parallel Comput., 50 (2015), pp. 1-24.
[7] A. H. Baker, T. Gamblin, M. Schulz, and U. M. Yang, {\it Challenges of scaling algebraic multigrid across modern multicore architectures}, in Proceedings of the International Parallel Distributed Processing Symposium, IEEE, 2011, pp. 275-286.
[8] S. Balay, J. Brown, K. Buschelman, W. D. Gropp, D. Kaushik, M. G. Knepley, L. Curfman McInnes, B. F. Smith, and H. Zhang, {\it PETSc}, ŭlhttp://www.mcs.anl.gov/petsc (2012).
[9] P. Bastian, G. Wittum, and W. Hackbusch, {\it Additive and multiplicative multi-grid — A comparison}, Computing, 60 (1998), pp. 345-364. · Zbl 0908.65107
[10] J. H. Bramble, J. E. Pasciak, and J. Xu, {\it Parallel multilevel preconditioners}, Math. Comp., 55 (1990), pp. 1-22. · Zbl 0703.65076
[11] S. C. Brenner and R. Scott, {\it The Mathematical Theory of Finite Element Methods}, 3rd ed., Springer, New York, 2010. · Zbl 0804.65101
[12] D. Brömmel and P. Gibbon, {\it High-Q Club The highest scaling Codes on JUQUEEN}, Innov. Supercomput. Deutschland, 11 (2013), pp. 106-107.
[13] E. Chow, R. D. Falgout, J. J. Hu, R. S. Tuminaro, and U. M. Yang, {\it A survey of parallelization techniques for multigrid solvers}, Parallel Process. Sci. Comput., 20 (2006), pp. 179-201.
[14] T. A. Davis, {\it Direct Methods for Sparse Linear Systems}, Vol. 2, SIAM, Philadelphia, 2006. · Zbl 1119.65021
[15] C. R. Dohrmann, {\it A preconditioner for substructuring based on constrained energy minimization}, SIAM J. Sci. Comput., 25 (2003), pp. 246-258. · Zbl 1038.65039
[16] C. R. Dohrmann, {\it An approximate BDDC preconditioner}, Numer. Linear Algebra Appl., 14 (2007), pp. 149-168. · Zbl 1199.65088
[17] C. Farhat, K. Pierson, and M. Lesoinne, {\it The second generation FETI methods and their application to the parallel solution of large-scale linear and geometrically non-linear structural analysis problems}, Comput. Methods Appl. Mech. Engrg., 184 (2000), pp. 333-374. · Zbl 0981.74064
[18] V. Hapla, D. Horak, and M. Merta, {\it Use of direct solvers in TFETI massively parallel implementation}, in Applied Parallel and Scientific Computing, Lecture Notes in Comput. Sci. 7782, Springer, Berlin, 2013, pp. 192-205.
[19] R. A. Haring, M. Ohmacht, T. W. Fox, M. K. Gschwind, D. L. Satterfield, K. Sugavanam, P. W. Coteus, P. Heidelberger, M. A. Blumrich, R. W. Wisniewski, A. Gara, G. L.-T. Chiu, P. A. Boyle, N. H. Chist, and Changhoan Kim, {\it The IBM blue Gene/Q compute chip}, IEEE Micro., 32 (2012), pp. 48-60.
[20] J. Hogg, J. Reid, and J. Scott, {\it Design of a multicore sparse Cholesky factorization using DAGs}, SIAM J. Sci. Comput., 32 (2010), pp. 3627-3649. · Zbl 1221.65088
[21] G. Karypis, {\it A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices. Version {\it 5.1.0}}, Tech. report, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, 2013; http://glaros.dtc.umn.edu/gkhome/fetch/sw/metis/manual.pdf.
[22] A. Klawonn and O. Rheinbach, {\it Highly scalable parallel domain decomposition methods with an application to biomechanics}, ZAMM Z. Angew. Math. Mech., 90 (2010), pp. 5-32. · Zbl 1355.65169
[23] A. Klawonn, O. Widlund, and M. Dryja, {\it Dual-primal FETI methods for three-dimensional elliptic problems with heterogeneous coefficients}, SIAM J. Numer. Anal., 40 (2002), pp. 159-179. · Zbl 1032.65031
[24] A. Klawonn and O. B. Widlund, {\it A domain decomposition method with lagrange multipliers and inexact solvers for linear elasticity}, SIAM J. Sci. Comput., 22 (2000), pp. 1199-1219. · Zbl 0980.65145
[25] P. T. Lin, J. N. Shadid, M. Sala, R. S. Tuminaro, G. L. Hennigan, and R. J. Hoekstra, {\it Performance of a parallel algebraic multilevel preconditioner for stabilized finite element semiconductor device modeling}, J. Comput. Phys., 228 (2009), pp. 6250-6267. · Zbl 1175.82077
[26] J. Mandel, {\it Balancing domain decomposition}, Commun. Numer. Methods Engrg., 9 (1993), pp. 233-241. · Zbl 0796.65126
[27] J. Mandel and C. R. Dohrmann, {\it Convergence of a balancing domain decomposition by constraints and energy minimization}, Numer. Linear Algebra Appl., 10 (2003), pp. 639-659. · Zbl 1071.65558
[28] J. Mandel, C. R. Dohrmann, and R. Tezaur, {\it An algebraic theory for primal and dual substructuring methods by constraints}, Appl. Numer. Math., 54 (2005), pp. 167-193. · Zbl 1076.65100
[29] J. Mandel, B. Sousedík, and C. Dohrmann, {\it Multispace and multilevel BDDC}, Computing, 83 (2008), pp. 55-85. · Zbl 1163.65091
[30] Y. Notay and A. Napov, {\it A massively parallel solver for discrete Poisson-like problems}, J. Comput. Phys., 281 (2015), pp. 237-250. · Zbl 1352.65454
[31] O. Rheinbach, {\it Parallel iterative substructuring in structural mechanics}, Arch. Comput. Methods Eng., 16 (2009), pp. 425-463. · Zbl 1179.74157
[32] Y. Saad, {\it Iterative Methods for Sparse Linear Systems}, 2nd ed., SIAM, Philadelphia, 2003. · Zbl 1031.65046
[33] O. Schenk and K. Gärtner, {\it On fast factorization pivoting methods for sparse symmetric indefinite systems}, Electron. Trans. Numer. Anal., 23 (2006), pp. 158-179. · Zbl 1112.65022
[34] B. Sousedík, J. Šístek, and J. Mandel, {\it Adaptive-multilevel BDDC and its parallel implementation}, Computing, 95 (2013), pp. 1087-1119. · Zbl 1307.65175
[35] K. Stüben, {\it A review of algebraic multigrid}, J. Comput. Appl. Math., 128 (2001), pp. 281-309. · Zbl 0979.65111
[36] A. Toselli and O. Widlund, {\it Domain Decomposition Methods–Algorithms and Theory}, Springer-Verlag, Berlin, 2005. · Zbl 1069.65138
[37] X. Tu, {\it Three-level BDDC in three dimensions}, SIAM J. Sci. Comput., 29 (2007), pp. 1759-1780. · Zbl 1163.65094
[38] P. S. Vassilevski and U. M. Yang, {\it Reducing communication in algebraic multigrid using additive variants}, Numer. Linear Algebra Appl., 21 (2014), pp. 275-296. · Zbl 1340.65304
[39] J. Šístek, M. Čertíková, P. Burda, and J. Novotný, {\it Face-based selection of corners in {\it 3}D substructuring}, Math. Comput. Simul., 82 (2012), pp. 1799-1811.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.