GPU-accelerated 3-D finite volume particle method. (English) Zbl 1410.65337

Summary: In [“Development of a finite volume particle method for 3-D fluid flow simulations”, Comput. Methods Appl. Mech. Eng. 298, 80–107 (2016; doi:10.1016/j.cma.2015.09.013); “Exact finite volume particle method with spherical-support kernels”, ibid. 317, 102–127 (2017; doi:10.1016/j.cma.2016.12.015)], the second author et al. introduced SPHEROS, a 3-D particle-based solver built on the finite volume particle method (FVPM) with a spherical top-hat kernel. In the present research, the authors present algorithms and optimization procedures that significantly accelerate computations by exploiting the computational power of graphics processing units (GPUs). The new accelerated solver, GPU-SPHEROS, is developed in CUDA and runs entirely on the GPU. All parallel algorithms and data structures are designed specifically for the GPU many-core architecture. A roofline model is used to assess the performance of the kernels and to select appropriate optimization strategies. In particular, the neighbor search algorithm, which accounts for almost a third of the overall computation time, features an efficient space-filling curve (SFC) as well as an optimized octree construction procedure. The memory-bound interaction vector computation, which accounts for almost two thirds of the overall computation time, features fixed-size memory pre-allocation and an efficient data ordering to reduce memory transactions and the cost of dynamic memory operations, i.e., allocation and deallocation. As a case study, numerical simulation results for water jet deviation by rotating buckets in a Pelton turbine are presented and compared with available experimental data. For that case, a speedup of almost a factor of six is achieved on a single NVIDIA\(^{\circledR}\) Tesla\(^{\text{TM}}\) P100-SXM2-16 GB GPU with GP100 Pascal architecture compared with a dual-CPU node equipped with two Broadwell Intel\(^{\circledR}\) Xeon\(^{\circledR}\) E5-2690 v4 CPUs with 28 physical cores in total.
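
Illustration (not taken from the paper): the summary does not say which space-filling curve GPU-SPHEROS uses or how wide its keys are. As a rough sketch of how SFC ordering can support neighbor search and coalesced memory access, the host-side C++ below assumes a Morton (Z-order) curve with 10 bits per axis and sorts particle indices by the key of the grid cell containing each particle. The names Vec3, expandBits10, mortonKey, cellIndex and sfcOrder are hypothetical and are not identifiers from the solver.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical particle position type (assumption, not from the paper).
struct Vec3 { float x, y, z; };

// Spread the low 10 bits of v so that two zero bits follow each original bit.
inline std::uint32_t expandBits10(std::uint32_t v) {
    v &= 0x000003FFu;
    v = (v | (v << 16)) & 0xFF0000FFu;
    v = (v | (v << 8))  & 0x0300F00Fu;
    v = (v | (v << 4))  & 0x030C30C3u;
    v = (v | (v << 2))  & 0x09249249u;
    return v;
}

// Interleave three 10-bit cell indices into one 30-bit Morton (Z-order) key.
inline std::uint32_t mortonKey(std::uint32_t ix, std::uint32_t iy, std::uint32_t iz) {
    return (expandBits10(iz) << 2) | (expandBits10(iy) << 1) | expandBits10(ix);
}

// Map a coordinate to an integer cell index, clamped to the range [0, 1023].
inline std::uint32_t cellIndex(float coord, float lo, float cellSize) {
    const float c = std::floor((coord - lo) / cellSize);
    return static_cast<std::uint32_t>(std::min(std::max(c, 0.0f), 1023.0f));
}

// Return particle indices ordered along the Morton curve. Particles that share
// a cell keep their original relative order because the sort is stable.
std::vector<std::size_t> sfcOrder(const std::vector<Vec3>& pos, const Vec3& lo, float cellSize) {
    assert(cellSize > 0.0f);
    std::vector<std::uint32_t> keys(pos.size());
    for (std::size_t i = 0; i < pos.size(); ++i) {
        keys[i] = mortonKey(cellIndex(pos[i].x, lo.x, cellSize),
                            cellIndex(pos[i].y, lo.y, cellSize),
                            cellIndex(pos[i].z, lo.z, cellSize));
    }
    std::vector<std::size_t> order(pos.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
    return order;
}
```

Reordering the particle arrays by the returned permutation places spatially close particles near each other in memory, which is the general mechanism by which an SFC can reduce memory transactions. On a GPU, the key computation and the sort would normally run in parallel rather than in the serial loop assumed here.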


65M08 Finite volume methods for initial value and initial-boundary value problems involving PDEs
65M75 Probabilistic methods, particle methods, etc. for initial value and initial-boundary value problems involving PDEs
65Y10 Numerical algorithms for specific classes of architectures
76M12 Finite volume methods applied to problems in fluid mechanics
76M28 Particle methods and lattice-gas methods


[1] Jahanbakhsh, E.; Vessaz, C.; Maertens, A.; Avellan, F., Development of a finite volume particle method for 3-D fluid flow simulations, Comput Methods Appl Mech Eng, 298, 80-107, (2016) · Zbl 1423.76299
[2] Jahanbakhsh, E.; Maertens, A.; Quinlan, N. J.; Vessaz, C.; Avellan, F., Exact finite volume particle method with spherical-support kernels, Comput Methods Appl Mech Eng, 317, 102-127, (2017)
[3] Gingold, R. A.; Monaghan, J. J., Smoothed particle hydrodynamics-theory and application to non-spherical stars, Mon Not R Astron Soc, 181, 375-389, (1977) · Zbl 0421.76032
[4] LeVeque, R. J., Finite volume methods for hyperbolic problems, 31, (2002), Cambridge University Press
[5] Quinlan, N. J.; Lobovsky, L.; Nestor, R. M., Development of the meshless finite volume particle method with exact and efficient calculation of interparticle area, Comput Phys Commun, 185, 1554-1563, (2014) · Zbl 1348.76103
[6] Jahanbakhsh, E., Simulation of silt erosion using particle-based methods, (2014), École Polytechnique Fédérale de Lausanne (EPFL), Doctoral Thesis N 6284
[7] Vessaz, C., Finite particle flow simulation of free jet deviation by rotating Pelton buckets, (2015), École Polytechnique Fédérale de Lausanne (EPFL), Doctoral Thesis N 6470
[8] Vessaz, C.; Jahanbakhsh, E.; Avellan, F., Flow simulation of jet deviation by rotating Pelton buckets using finite volume particle method, J Fluids Eng, 137, 7, (2015)
[9] Vessaz, C.; Andolfatto, L.; Avellan, F.; Tournier, C., Toward design optimization of a Pelton turbine runner, Struct Multidiscip Optim, 55, 1, 37-51, (2017)
[10] Volkov, V., Understanding latency hiding on GPUs, (2016), University of California at Berkeley, PhD Thesis
[11] Cheng, J.; Grossman, M.; McKercher, T., Professional CUDA^{®} C programming, (2014), John Wiley & Sons Inc
[12] Lee, D.; Dinov, I.; Dong, B.; Gutman, B.; Yanovsky, I.; Toga, A. W., CUDA optimization strategies for compute- and memory-bound neuroimaging algorithms, Comput Methods Programs Biomed, 106, 75-187, (2012)
[13] Amorim, R.; Liebmann, M.; Haase, G.; dos Santos, R. W., Comparing CUDA and OpenGL implementations for a Jacobi iteration, HPCS’09, 22-32, (2009), doi:10.1109/HPCSIM.2009.5192847
[14] Potluri, S.; Rossetti, D.; Becker, D.; Poole, D.; Venkata, M. G.; Hernandez, O.; Shamis, P.; Lopez, M. G.; Baker, M.; Poole, W., Exploring OpenSHMEM model to program GPU-based extreme-scale systems, OpenSHMEM, 18-35, (2015), LNCS 9397
[15] Hérault, A.; Bilotta, G.; Dalrymple, R. A., SPH on GPU with CUDA, J Hydraul Res, 48, sup1, 74-79, (2010)
[16] Valdez-Balderas, D.; Dominguez, J. M.; Rogers, B. D., Towards accelerating smoothed particle hydrodynamics simulations for free-surface flows on multi-GPU clusters, J Parallel Distrib Comput, 73, 11, 1483-1493, (2013)
[17] Hori, C.; Gotoh, H.; Ikari, H.; Khayyer, A., GPU-acceleration for moving particle semi-implicit method, Comput Fluids, 51, 174-183, (2011) · Zbl 1271.76264
[18] Crespo, A. J. C.; Domínguez, J. M.; Rogers, B. D.; Gómez-Gesteira, M.; Longshaw, S.; Canelas, R.; Vacondio, R.; Barreiro, A.; García-Feal, O., DualSPHysics: open-source parallel CFD solver based on smoothed particle hydrodynamics (SPH), Comput Phys Commun, 187, 204-216, (2015), ISSN 0010-4655 · Zbl 1348.76005
[19] Cercos-Pita, J. L., AQUAgpusph, a new free 3D SPH solver accelerated with OpenCL, Comput Phys Commun, 192, 295-312, (2015) · Zbl 1380.65467
[20] Shadloo, M. S.; Oger, G.; Le Touzé, D., Smoothed particle hydrodynamics method for fluid flows, towards industrial applications: motivations, current state, and challenges, Comput Fluids, 136, 11-34, (2016) · Zbl 1390.76764
[21] Batchelor, G. K., Introduction to fluid dynamics, (1974), Cambridge University Press, U.K · Zbl 0152.44402
[22] Monaghan, J., Simulating free surface flows with SPH, J Comput Phys, 110, 2, 399-406, (1994), http://dx.doi.org/10.1006/jcph.1994.1034 · Zbl 0794.76073
[23] Junk, M., Do finite volume methods need a mesh?, (Griebel, M.; Schweitzer, M., Lecture notes in computational science and engineering 26, (2003), Springer Berlin Heidelberg), 223-238, http://dx.doi.org/10.1007/978-3-642-56103-0_15 · Zbl 1015.65040
[24] Fatehi, R.; Manzari, M. T., A consistent and fast weakly compressible smoothed particle hydrodynamics with a new wall boundary condition, Int J Numer Methods Fluids, 68, 7, 905-921, (2012) · Zbl 1237.76136
[25] Ofenbeck, G.; Steinmann, R.; Caparros, V.; Spampinato, D. G.; Püschel, M., Applying the roofline model, (IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), (2014)), 76-85
[26] Lo, Y. J.; Williams, S.; Straalen, B. V.; Ligocki, T. J.; Cordery, M. J.; Wright, N. J.; Hall, M. W.; Oliker, L., Roofline model toolkit: a practical tool for architectural and program analysis, (International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, (2014))
[27] Winkler, D.; Meister, M.; Rezavand, M.; Rauch, W., Gpusphase—A shared memory caching implementation for 2D SPH using CUDA, Comput Phys Commun, 213, 165-180, (2017) · Zbl 1376.76054
[28] Malheiros, M. G.; Walter, M., Spatial sorting: an efficient strategy for approximate nearest neighbor searching, Comput Graph, 57, 112-126, (2016)
[29] Bédorf, J.; Gaburov, E.; Zwart, S. P., A sparse octree gravitational N-body code that runs entirely on the GPU processor, J Comput Phys, 231, 7, 2825-2839, (2012) · Zbl 1321.70003
[30] CUDA C Programming Guide, PG-02829-001_v9.1, NVIDIA | March 2018.
[31] https://thrust.github.io/doc/group__gathering.html (last access on June 29th, 2017).
[32] NVIDIA^{®} Tesla^{™} P100, The Most Advanced Datacenter Accelerator Ever Built Featuring Pascal GP100, the World’s Fastest GPU. Whitepaper, NVIDIA^{®} (2016), WP-08019-001_v01.1 | 1.
[33] Perrig, A., Hydrodynamics of the free surface flow in Pelton turbine buckets, (2007), École Polytechnique Fédérale de Lausanne (EPFL), Doctoral Thesis N° 3715
[34] Perrig, A.; Avellan, F.; Kueny, J. L.; Farhat, M.; Parkinson, E., Flow in a Pelton turbine bucket: numerical and experimental investigations, Trans ASME, J Fluids Eng, 128, 2, 350-358, (2006)
[35] Židonis, A.; Aggidis, G. A., State of the art in numerical modelling of Pelton turbines, Renew Sustain Energy Rev, 45, 135-144, (2015)
[36] Kvicinsky, S.; Kueny, J. L.; Avellan, F.; Parkinson, E., Experimental and numerical analysis of free surface flows in a rotating bucket, (Proceedings of 21^{st} IAHR Symposium on Hydraulic Machinery and Systems, Lausanne, Switzerland, (2002)), 359-364
[37] Gupta, V.; Prasad, V.; Khare, R., Numerical simulation of six jet Pelton turbine model, Energy, 104, 24-32, (2016)
[38] Jeon, H.; Hoon Park, J.; Shin, Y.; Choi, M., Friction loss and energy recovery of a Pelton turbine for different spear positions, Renew Energy, 123, 273-280, (2018)
[39] Marongiu, J. C.; Leboeuf, F.; Caro, J.; Parkinson, E., Free surface flows simulations in Pelton turbines using an hybrid SPH-ALE method, J Hydraul Res, 48, 40-49, (2010)