Optimizing 2D and 3D structured Euler CFD solvers on graphical processing units. (English) Zbl 1365.76106

Summary: This paper presents a methodology, expressed as general-purpose guidelines, for developing finite-difference or finite-volume CFD codes on Graphical Processing Units (GPUs). These guidelines are applied to the GPU implementation of a 2D Euler equations solver on a structured grid and to its three-dimensional extension on multiple GPUs. Several numerical schemes are used. All of them are first-order in time and use a Roe flux-differencing scheme in space, either in its native formulation or combined with a second-order MUSCL scheme. The 2D problem leads to a discussion of various API, algorithmic and computational optimizations on NVIDIA GPUs with compute capability 1.3, whereas the 3D problem completes the 2D study by introducing Fermi GPUs and defining a communication system that makes efficient use of several GPUs on a node.
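
For readers unfamiliar with the scheme family named in the summary, the following is a minimal 1D CPU sketch, not the paper's GPU implementation. It combines a forward-Euler (first-order) time step with a Roe flux, optionally preceded by a MUSCL reconstruction. Everything specific here is an assumption made for illustration: the function names, the ideal-gas value gamma = 1.4, the minmod limiter, reconstruction on conserved variables, transmissive boundaries, and the absence of an entropy fix.

```python
import math

GAMMA = 1.4  # assumed ratio of specific heats (ideal gas)

def primitive(U):
    """Conserved (rho, rho*u, E) -> primitive (rho, u, p)."""
    rho, mom, E = U
    u = mom / rho
    return rho, u, (GAMMA - 1.0) * (E - 0.5 * rho * u * u)

def physical_flux(U):
    """Exact 1D Euler flux F(U)."""
    rho, u, p = primitive(U)
    return (rho * u, rho * u * u + p, u * (U[2] + p))

def roe_flux(UL, UR):
    """Roe approximate Riemann flux between states UL and UR (no entropy fix)."""
    rhoL, uL, pL = primitive(UL)
    rhoR, uR, pR = primitive(UR)
    HL, HR = (UL[2] + pL) / rhoL, (UR[2] + pR) / rhoR
    wL, wR = math.sqrt(rhoL), math.sqrt(rhoR)
    u = (wL * uL + wR * uR) / (wL + wR)        # Roe-averaged velocity
    H = (wL * HL + wR * HR) / (wL + wR)        # Roe-averaged total enthalpy
    a = math.sqrt((GAMMA - 1.0) * (H - 0.5 * u * u))
    d1, d2, d3 = (UR[k] - UL[k] for k in range(3))
    # Wave strengths: projection of the jump onto the Roe eigenvectors.
    alpha2 = (GAMMA - 1.0) / (a * a) * (d1 * (H - u * u) + u * d2 - d3)
    alpha1 = (d1 * (u + a) - d2 - a * alpha2) / (2.0 * a)
    alpha3 = d1 - alpha1 - alpha2
    alphas = (alpha1, alpha2, alpha3)
    lam = (u - a, u, u + a)
    K = ((1.0, u - a, H - u * a), (1.0, u, 0.5 * u * u), (1.0, u + a, H + u * a))
    FL, FR = physical_flux(UL), physical_flux(UR)
    return tuple(
        0.5 * (FL[k] + FR[k])
        - 0.5 * sum(alphas[m] * abs(lam[m]) * K[m][k] for m in range(3))
        for k in range(3)
    )

def minmod(a, b):
    """Slope limiter: zero at extrema, otherwise the smaller-magnitude slope."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b

def muscl_face_states(Ull, Ul, Ur, Urr):
    """Limited linear reconstruction at the face between cells Ul and Ur."""
    left = tuple(Ul[k] + 0.5 * minmod(Ul[k] - Ull[k], Ur[k] - Ul[k]) for k in range(3))
    right = tuple(Ur[k] - 0.5 * minmod(Ur[k] - Ul[k], Urr[k] - Ur[k]) for k in range(3))
    return left, right

def step(U, dt, dx, second_order=False):
    """One forward-Euler finite-volume update of the list of cell states U."""
    n = len(U)
    G = [U[0], U[0]] + list(U) + [U[-1], U[-1]]  # two ghost cells per side
    fluxes = []
    for f in range(n + 1):                       # face f lies between G[f+1] and G[f+2]
        UL, UR = G[f + 1], G[f + 2]
        if second_order:
            UL, UR = muscl_face_states(G[f], G[f + 1], G[f + 2], G[f + 3])
        fluxes.append(roe_flux(UL, UR))
    return [
        tuple(U[i][k] - dt / dx * (fluxes[i + 1][k] - fluxes[i][k]) for k in range(3))
        for i in range(n)
    ]
```

Calling `step(cells, dt, dx)` gives the plain ("native") Roe update, and `step(cells, dt, dx, second_order=True)` adds the MUSCL reconstruction. The per-face flux evaluations are independent of one another, which is the property that makes schemes of this kind candidates for a one-thread-per-cell or one-thread-per-face GPU mapping.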


76Mxx Basic methods in fluid mechanics
65Y10 Numerical algorithms for specific classes of architectures
65Y05 Parallel numerical computation



