A new approach to the lattice Boltzmann method for graphics processing units. (English) Zbl 1225.76237

Summary: Emerging many-core processors, such as CUDA-capable nVidia GPUs, are promising platforms for regular parallel algorithms such as the lattice Boltzmann method (LBM). Because the global memory of graphics devices has high latency and LBM is data intensive, the memory access pattern is critical to achieving good performance. Whenever possible, global memory loads and stores should be coalesced and aligned, but the propagation phase of LBM can lead to frequent misaligned memory accesses. Most previous CUDA implementations of 3D LBM addressed this problem by using low-latency on-chip shared memory. Our CUDA implementation of LBM instead follows carefully chosen data transfer schemes in global memory. For the 3D lid-driven cavity test case, we obtained up to 86% of the maximum global memory throughput on nVidia’s GT200. We show that, as a consequence, highly efficient implementations of LBM on GPUs are possible, even for complex models.
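
To illustrate why propagation tends to produce misaligned accesses, the following minimal Python sketch computes the linear memory offset associated with each propagation direction. It is not taken from the paper. It assumes a D3Q19 velocity set, one array per direction indexed as x + nx * (y + ny * z), and an aligned segment of 16 four-byte words (a half-warp on GT200-class hardware). The names `d3q19_velocities`, `linear_offset`, `misaligned_directions` and `SEGMENT_WORDS` are hypothetical.

```python
from itertools import product

# Assumption: one aligned memory segment holds 16 four-byte words,
# i.e. one word per thread of a half-warp.
SEGMENT_WORDS = 16


def d3q19_velocities():
    """Return the 19 lattice velocities with at most two non-zero components."""
    return [c for c in product((-1, 0, 1), repeat=3)
            if sum(abs(k) for k in c) <= 2]


def linear_offset(c, nx, ny):
    """Offset, in array elements, between a node and its neighbour in
    direction c, for the assumed layout index = x + nx * (y + ny * z)."""
    cx, cy, cz = c
    return cx + nx * (cy + ny * cz)


def misaligned_directions(nx, ny):
    """Directions whose propagation offset is not a whole number of segments."""
    return [c for c in d3q19_velocities()
            if linear_offset(c, nx, ny) % SEGMENT_WORDS != 0]


if __name__ == "__main__":
    for c in misaligned_directions(64, 64):
        print(c, linear_offset(c, 64, 64))
```

Under these assumptions, with nx = ny = 64, ten of the nineteen directions come out misaligned, and they are exactly those with a non-zero x component. Shifts along y or z alone move data by a multiple of nx and so stay aligned whenever nx is a multiple of the segment size.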


76M28 Particle methods and lattice-gas methods
68U10 Computing methodologies for image processing
68W10 Parallel algorithms in computer science



