Nodal discontinuous Galerkin methods on graphics processors. (English) Zbl 1175.65111

Summary: Discontinuous Galerkin (DG) methods for the numerical solution of partial differential equations have enjoyed considerable success because they are both flexible and robust: They allow arbitrary unstructured geometries and easy control of accuracy without compromising simulation stability. Lately, another property of DG has been growing in importance: The majority of a DG operator is applied in an element-local way, with weak penalty-based element-to-element coupling.
The resulting locality in memory access is one of the factors that enables DG to run on off-the-shelf, massively parallel graphics processors (GPUs). In addition, DG’s high-order nature lets it require fewer data points per represented wavelength and hence fewer memory accesses, in exchange for higher arithmetic intensity. Both of these factors work significantly in favor of a GPU implementation of DG.
Using a single US$400 Nvidia GTX 280 GPU, we accelerate a solver for Maxwell’s equations on a general 3D unstructured grid by a factor of around 50 relative to a serial computation on a current-generation CPU. In many cases, our algorithms exhibit full use of the device’s available memory bandwidth. Example computations achieve and surpass 200 gigaflops/s of net application-level floating point work.
In this article, we describe and derive the techniques used to reach this level of performance. In addition, we present comprehensive data on the accuracy and runtime behavior of the method.


65M60 Finite element, Rayleigh-Ritz and Galerkin methods for initial value and initial-boundary value problems involving PDEs
35L65 Hyperbolic conservation laws
65Y20 Complexity and performance of numerical algorithms
65Y05 Parallel numerical computation
35Q61 Maxwell equations


Full Text: DOI arXiv


[1] Timothy Barth, Timothy Knight, A Streaming Language Implementation of the Discontinuous Galerkin Method, Technical Report 20050184165, NASA Ames Research Center, 2005.
[2] Buck, I.; Foley, T.; Horn, D.; Sugerman, J.; Fatahalian, K.; Houston, M.; Hanrahan, P., Brook for GPUs: stream computing on graphics hardware, (), 777-786
[3] M.H. Carpenter, C.A. Kennedy, Fourth-order 2N-storage Runge-Kutta Schemes, Technical report, NASA Langley Research Center, 1994.
[4] Cockburn, B.; Hou, S.; Shu, C.W., The runge – kutta local projection discontinuous Galerkin finite element method for conservation laws. IV: the multidimensional case, Math. comput., 54, 545-581, (1990) · Zbl 0695.65066
[5] International Electrotechnical Commission, Letter Symbols to be used in Electrical Technology - Part 2: Telecommunications and Electronics, Technical report, International Electrotechnical Commission, Geneva, Switzerland, November 2000.
[6] W.J. Dally, P. Hanrahan, M. Erez, T.J. Knight, F. Labonté, J.H. Ahn, N. Jayasena, U.J. Kapasi, A. Das, J. Gummaraju, Merrimac: Supercomputing with streams, in: Proceedings of the ACM/IEEE SC2003 Conference (SC’03), vol. 1, 2003.
[7] D. Göddeke, R. Strzodka, S. Turek, Accelerating double precision FEM simulations with GPUs, in: Proceedings of ASIM, 2005.
[8] Khronos OpenCL Working Group, The OpenCL 1.0 Specification. Khronos Group, December 2008.
[9] Gumerov, Nail A.; Duraiswami, Ramani, Fast multipole methods on graphics processors, J. comput. phys., 227, September, 8290-8313, (2008) · Zbl 1147.65012
[10] Hesthaven, J.S.; Warburton, T., Nodal high-order methods on unstructured grids: I. time-domain solution of maxwell’s equations, J. comput. phys., 181, September, 186-221, (2002) · Zbl 1014.78016
[11] Hesthaven, J.S.; Warburton, T., Nodal discontinuous Galerkin methods: algorithms, analysis, and applications, ISBN: 0387720650, (2007), Springer · Zbl 1078.78014
[12] Hesthaven, J.S.; Gottlieb, S.; Gottlieb, D., Spectral methods for time-dependent problems, ISBN: 0521792118, (2007), Cambridge University Press · Zbl 1111.65093
[13] Jackson, J.D., Classical electrodynamics, ISBN: 047130932X, (1998), Wiley · Zbl 0114.42903
[14] Karypis, G.; Kumar, V., A fast and high quality multilevel scheme for partitioning irregular graphs, SIAM J. sci. comput., 20, 359-392, (1999) · Zbl 0915.68129
[15] S.E. Krakiwsky, L.E. Turner, M.M. Okoniewski, Acceleration of finite-difference time-domain (FDTD) using graphics processor units (GPU), in: 2004 IEEE MTT-S International Microwave Symposium Digest, vol. 2, pp. 1033-1036, 2004. ISBN 0149-645X. doi:10.1109/MWSYM.2004.1339160.
[16] Li, W.; Wei, X.; Kaufman, A., Implementing lattice Boltzmann computation on graphics hardware, Visual comput., 19, 444-456, (2003)
[17] Lindholm, E.; Nickolls, J.; Oberman, S.; Montrym, J., Nvidia tesla: a unified graphics and computing architecture, Micro. IEEE, ISSN: 0272-1732, 28, 39-55, (2008)
[18] Nvidia Corporation, NVIDIA CUDA 2.0 Compute Unified Device Architecture Programming Guide, Nvidia Corporation, Santa Clara, USA, June 2008.
[19] W.H. Reed, T.R. Hill, Triangular Mesh Methods for the Neutron Transport Equation, Technical report, Los Alamos Scientific Laboratory, Los Alamos, 1973.
[20] Si, H.; Gaertner, K., Meshing piecewise linear complexes by constrained Delaunay tetrahedralizations, (), 147-163
[21] J. Stratton, S. Stone, W. Hwu, MCUDA: An Efficient Implementation of CUDA Kernels on Multi-cores. Technical report, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA, March 2008.
[22] Various authors, Comparison of Nvidia graphics processing units — Wikipedia, The Free Encyclopedia. <http://en.wikipedia.org/w/index.php?title=Comparison_of_Nvidia_graphics_processing_units&oldid=248858931>, 2008 (accessed 9.11.08).
[23] Warburton, T., An explicit construction of interpolation nodes on the simplex, J. eng. math., 56, 247-262, (2006) · Zbl 1110.65014
[24] Warburton, T.; Hagstrom, T., Taming the CFL number for discontinuous Galerkin methods on structured meshes, SIAM J. numer. anal., 46, 3151-3180, (2008) · Zbl 1181.35010
[25] Whaley, R.C.; Petitet, A.; Dongarra, J.J., Automated empirical optimizations of software and the ATLAS project, Parallel comput., 27, 3-35, (2001) · Zbl 0971.68033
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.