×

MPI+X: task-based parallelisation and dynamic load balance of finite element assembly. (English) Zbl 07474475

Summary: The main computing phases of numerical methods for solving partial differential equations are the algebraic system assembly and the iterative solver. This work focuses on the first task, in the context of a hybrid MPI+X paradigm. The matrix assembly consists of a loop over the elements, faces, edges or nodes of the MPI partitions to compute element matrices and vectors and then of their assemblies. In a MPI+X hybrid parallelism context, X has consisted traditionally of loop parallelism using OpenMP, with different techniques to avoid the race condition, but presenting efficiency or implementation drawbacks. We propose an alternative, based on task parallelism using some extensions to the OpenMP programming model. In addition, dynamic load balance will be applied, especially efficient in the presence of hybrid meshes. This paper presents the proposed methodology, its implementation and its validation through the solution of large computational mechanics problems up to 16k cores.

MSC:

65-XX Numerical analysis
74-XX Mechanics of deformable solids

Software:

Alya; DRAMA; METIS; OmpSs; DLB
PDFBibTeX XMLCite
Full Text: DOI arXiv Link

References:

[1] Aubry, R.; Houzeaux, G.; Vázquez, M.; Cela, J. M., Some Useful Strategies for Unstructured Edge-Based Solvers on Shared Memory Machines, International Journal for Numerical Methods in Engineering, 85, 5, 537-561 (2011) · Zbl 1217.76066 · doi:10.1002/nme.2973
[2] Barcelona Supercomputing Center. 2018. Mercurium. https://pm.bsc.es/mcxx.
[3] Barcelona Supercomputing Center. 2018. OmpSs Specification. https://pm.bsc.es/ompss-docs/spec.
[4] Basermann, A.; Clinckemaillie, J.; Coupez, T.; Fingberg, J.; Digonnet, H.; Ducloux, R.; Gratien, J.-M; Hartmann, U.; Lonsdale, G.; Maerten, B.; Roose, D.; Walshaw, C., Dynamic Load-Balancing of Finite Element Applications with the DRAMA Library, Applied Mathematical Modelling, 25, 2, 83-98 (2000) · Zbl 1076.65534 · doi:10.1016/S0307-904X(00)00043-3
[5] Belytschko, T.; Liu, W. K.; Moran, B., Nonlinear Finite Elements for Continua and Structures (2014), Chichester: J. Wiley & Sons
[6] Bull, M.Oct, 2013. “UEABS: The Unified European Application Benchmark Suite.” http://www.prace-ri.eu/IMG/pdf/d7.4_3ip.pdf.
[7] Calmet, H.; Gambaruto, A.; Bates, A.; Vázquez, M.; Houzeaux, G.; Doorly, D., Large-scale CFD Simulations of the Transitional and Turbulent Regime for the Large Human Airways During Rapid Inhalation, Computers in Biology and Medicine, 69, 166-180 (2016) · doi:10.1016/j.compbiomed.2015.12.003
[8] Casoni, E.; Jérusalem, A.; Samaniego, C.; Eguzkitza, B.; Lafortune, P.; Tjahjanto, D.; Sáez, X.; Houzeaux, G.; Vázquez, M., Alya: Computational Solid Mechanics for Supercomputers, Archives of Computational Methods in Engineering, 22, 4, 557-576 (2015) · Zbl 1348.74007 · doi:10.1007/s11831-014-9126-8
[9] Cecka, C.; Lew, A. J.; Darve, E., Assembly of Finite Element Methods on Graphics Processors, International Journal for Numerical Methods in Engineering, 85, 5, 640-669 (2011) · Zbl 1217.80146 · doi:10.1002/nme.2989
[10] Duran, A.; Ayguadé, E.; Badia, R. M.; Labarta, J.; Martinell, L.; Martorell, X.; Planas, J., OmpSs: A Proposal for Programming Heterogeneous Multi-Core Architectures, Parallel Processing Letters, 21, 2, 173-193 (2011)
[11] Farhat, C.; Crivelli, L., A General Approach to Nonlinear FE Computations on Shared-Memory Multiprocessors, Computer Methods in Applied Mechanics and Engineering, 72, 2, 153-171 (1989) · Zbl 0677.68031 · doi:10.1016/0045-7825(89)90157-6
[12] Garcia, M., Corbalan, J., and Labarta, J.. 2009. “LeWI: A Runtime Balancing Algorithm for Nested Parallelism.” Proceedings of the International Conference on Parallel Processing (ICPP09). IEEE Computer Society, Vienna (Austria), 22-25 September. 2019.
[13] Houzeaux, G.; Aubry, R.; Vázquez, M., Extension of Fractional Step Techniques for Incompressible Flows: The Preconditioned Orthomin(1) for the Pressure Schur Complement, Computers & Fluids, 44, 297-313 (2011) · Zbl 1271.76208 · doi:10.1016/j.compfluid.2011.01.017
[14] Houzeaux, G.; de la Cruz, R.; Owen, H.; Vázquez, M., Parallel Uniform Mesh Multiplication Applied to a Navier-Stokes Solver, Computers & Fluids, 80, 142-151 (2013) · Zbl 1284.76250 · doi:10.1016/j.compfluid.2012.04.017
[15] Houzeaux, G.; Garcia-Gasulla, M.; Cajas, J. C.; Artigues, A.; Olivares, E.; Labarta, J.; Vázquez, M., Dynamic Load Balance Applied to Particle Transport in Fluids, International Journal of Computational Fluid Dynamics, 30, 408-418 (2016) · Zbl 1497.76057 · doi:10.1080/10618562.2016.1227070
[16] Houzeaux, G.; Principe, J., A Variational Subgrid Scale Model for Transient Incompressible Flows, International Journal of Computational Fluid Dynamics, 22, 3, 135-152 (2008) · Zbl 1184.76802 · doi:10.1080/10618560701816387
[17] Houzeaux, G.; Vázquez, M.; Aubry, R.; Cela, J. M., A Massively Parallel Fractional Step Solver for Incompressible Flows, Journal of Computational Physics, 228, 17, 6316-6332 (2009) · Zbl 1261.76030 · doi:10.1016/j.jcp.2009.05.019
[18] Karypis, G., and Kumar, V.. 1995. MeTis: Unstructured Graph Partitioning and Sparse Matrix Ordering System, Version 2.0. http://glaros.dtc.umn.edu/gkhome/metis/metis/overview.
[19] Koros˘ec, P.; s˘ilc, J.; Robic˘, B., Solving the Mesh-Partitioning Problem with an Ant-Colony Algorithm, Parallel Computing, 30, 5, 785-801 (2004) · doi:10.1016/j.parco.2003.12.016
[20] Kubale, M., and Dyskretna, Optymalizacja. 2004. Graph Colorings. Contemporary Mathematics (American Mathematical Society) Vol. 352. Providence, US: American Mathematical Society. https://books.google.es/books?id=fokbCAAAQBAJ. · Zbl 1064.05061
[21] Llort, G., Servat, H., González, J., Giménez, J., and Labarta, J.. Nov 2013. “On the Usefulness of Object Tracking Techniques in Performance Analysis.” 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[22] Löhner, R., Renumbering Strategies for Unstructured-Grid Solvers Operating on Shared-Memory, Cache-Based Parallel Machines, Computer Methods in Applied Mechanics and Engineering, 163, 95-109 (1998) · Zbl 0960.76075 · doi:10.1016/S0045-7825(98)00005-X
[23] Löhner, R., Applied Computational Fluid Dynamics Techniques: An Introduction Based on Finite Element Methods (2008), Hoboken, NJ: John Wiley & Sons · Zbl 1151.76002
[24] Löhner, R.; Mut, F.; Cebral, J.; Aubry, R.; Houzeaux, G., Deflated Preconditioned Conjugate Gradient Solvers for the Pressure-Poisson Equation: Extensions and Improvements, International Journal for Numerical Methods in Engineering, 87, 2-14 (2011) · Zbl 1242.76128 · doi:10.1002/nme.2932
[25] Misra, J.; Gries, D., A Constructive Proof of Vizing’s Theorem, Information Processing Letters, 41, 3, 131-133 (1992) · Zbl 0795.68157 · doi:10.1016/0020-0190(92)90041-S
[26] Moore, S.; Ralph, J., User-Defined Events for Hardware Performance Monitoring, Procedia Computer Science, 4, 2096-2104 (2011) · doi:10.1016/j.procs.2011.04.229
[27] Pearce, R., Gokhale, M., and Amato, N. M.. 2013. “Scaling Techniques for Massive Scale-Free Graphs in Distributed (External) Memory.” Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, IPDPS ’13, Washington, DC, USA, 825-836. IEEE Computer Society.
[28] Pillet, V., Labarta, J., Cortes, T., and Girona, S.. 1995. “Paraver: A Tool to Visualize and Analyze Parallel Code.” Proceedings of WoTUG-18: Transputer and Occam Developments, Vol. 44, 17-31. Amsterdam: IOS Press.
[29] Saad, Y., Iterative Methods for Sparse Linear Systems (2003), Philadelphia, PA: SIAM · Zbl 1002.65042
[30] Soto, O.; Löhner, R.; Camelli, F., A Linelet Preconditioner for Incompressible Flow Solvers, International Journal of Numerical Methods for Heat & Fluid Flow, 13, 1, 133-147 (2003) · Zbl 1059.76037 · doi:10.1108/09615530310456796
[31] Straatsma, T. P.; Antypas, K. B.; Williams, T. J., Exascale Scientific Applications: Scalability and Performance Portability (2017), New York, USA: Chapman and Hall/CRC · Zbl 1386.00064
[32] Terpstra, D., Jagode, H., You, H., and Dongarra, J.. 2010. “Collecting Performance Data with PAPI-C.” Tools for High Performance Computing 2009, 3rd Parallel Tools Workshop, Dresden, Germany, 157-173. Springer, Berlin. PAPI: http://icl.cs.utk.edu/papi/index.html - Last accessed December 2018.
[33] Thébault, L., Petit, E., Tchiboukdjian, M., Dinh, Q., and Jalby, W.. Sept. 10-13, 2013. “Divide and Conquer Parallelization of Finite Element Method Assembly.” International Conference on Parallel Computing - ParCo2013, Vol. 25 of Advances in Parallel Computing, Muncih (Germany), 753-762.
[34] Vázquez, M.; Houzeaux, G.; Koric, S.; Artigues, A.; Aguado-Sierra, J.; Arís, R.; Mira, D., Alya: Multiphysics Engineering Simulation Towards Exascale, Journal of Computational Science, 14, 15-27 (2016) · doi:10.1016/j.jocs.2015.12.007
[35] Vidal, R., Casas, M., Moretó, M., Chasapis, D., Ferrer, R., Martorell, X., Ayguadé, E., Labarta, J., and Valero, M.. 2015. “Evaluating the Impact of OpenMP 4.0 Extensions on Relevant Parallel Workloads.” In OpenMP: Heterogenous Execution and Data Movements. IWOMP 2015. Lecture Notes in Computer Science, edited by Terboven C., de Supinski B., Reble P., Chapman B., Müller M, Vol 9342. Springer, Cham.
[36] Walshaw, C.; Cross, M., Mesh Partitioning: A Multilevel Balancing and Refinement Algorithm, SIAM Journal on Scientific Computing, 22, 1, 63-80 (2000) · Zbl 0968.05074 · doi:10.1137/S1064827598337373
[37] Wang, M., Ren, X., Li, C., and Li, Z.. 2016. “DMRPar: A Dynamic Mesh Repartitioning Scheme for Dam Break Simulations in OpenFOAM.” 2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), Dec, 210-215.
[38] Zhou, M.; Sahni, O.; Shephard, M. S.; Carothers, C. D.; Jansen, K. E., Adjacency-based Data Reordering Algorithm for Acceleration of Finite Element Computations, Scientific Programming, 18, 2, 107-123 (2010) · doi:10.1155/2010/273921
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.