High-order finite-element seismic wave propagation modeling with MPI on a large GPU cluster. (English) Zbl 1194.86019

Summary: We implement a high-order finite-element application that simulates seismic wave propagation, resulting for instance from earthquakes at the scale of a continent or from active seismic acquisition experiments in the oil industry, on a large cluster of NVIDIA Tesla graphics cards using the CUDA programming environment and non-blocking message passing based on MPI. Unlike many finite-element implementations, ours runs successfully in single precision, which maximizes the performance of current-generation GPUs. We discuss the implementation and optimization of the code and compare it to an existing, highly optimized implementation in C and MPI on a classical cluster of CPU nodes. We use mesh coloring to handle summation operations over degrees of freedom on an unstructured mesh efficiently, and non-blocking MPI messages to overlap both the communications across the network and the data transfers to and from the device via PCIe with calculations on the GPU. We perform a number of numerical tests to validate the single-precision CUDA and MPI implementation and assess its accuracy. We then analyze performance measurements: depending on how the problem is mapped to the reference CPU cluster, we obtain a speedup of 20x or 12x.
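
The mesh-coloring idea mentioned in the summary can be illustrated with a minimal sketch. The summary does not say which coloring algorithm the authors use, so the greedy scheme below is an assumption. The function names `color_mesh` and `assemble_by_color` and the small 2x2 quadrilateral mesh are also hypothetical, chosen only for illustration. The point is that elements sharing a global node receive different colors, so all elements of one color can add their contributions to shared degrees of freedom concurrently without write conflicts.

```python
from collections import defaultdict


def color_mesh(elements):
    """Greedy coloring (assumed scheme): elements sharing a node get different colors."""
    # Map each global node to the elements that touch it.
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)

    colors = [-1] * len(elements)
    for e, nodes in enumerate(elements):
        # Colors already taken by neighbors, i.e. elements sharing at least one node.
        used = {colors[nb] for n in nodes for nb in node_to_elems[n] if colors[nb] >= 0}
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors


def assemble_by_color(elements, local_values, colors, num_nodes):
    """Sum element contributions into global nodes, one color at a time."""
    total = [0.0] * num_nodes
    for c in range(max(colors) + 1):
        # Within one color no two elements touch the same node, so on a GPU
        # the elements of this color could be processed by concurrent threads
        # without atomic operations. Here they are processed serially.
        for e, nodes in enumerate(elements):
            if colors[e] != c:
                continue
            for n, v in zip(nodes, local_values[e]):
                total[n] += v
    return total


# Hypothetical 2x2 patch of quadrilateral elements on a 3x3 grid of nodes.
elements = [(0, 1, 4, 3), (1, 2, 5, 4), (3, 4, 7, 6), (4, 5, 8, 7)]
local_values = [[1.0] * 4 for _ in elements]  # each element adds 1.0 per node

colors = color_mesh(elements)
total = assemble_by_color(elements, local_values, colors, num_nodes=9)
```

In this small patch all four elements share the central node, so each one ends up with its own color. On a realistic unstructured mesh the number of colors stays small relative to the number of elements, which is what makes the approach attractive for parallel summation.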


86A17 Global dynamics, earthquake problems (MSC2010)
74S05 Finite element methods applied to problems in solid mechanics

