GPU-accelerated boundary element method for Helmholtz’ equation in three dimensions. (English) Zbl 1183.76829

Summary: Recently, the application of graphics processing units (GPUs) to scientific computation has attracted a great deal of attention, because GPUs are becoming faster and more programmable. In particular, NVIDIA’s GPUs that support the compute unified device architecture (CUDA) enable highly multithreaded parallel computing for non-graphics applications. This paper proposes a novel way to accelerate the boundary element method (BEM) for the three-dimensional Helmholtz equation using CUDA. Adopting techniques for data caching and double-single precision floating-point arithmetic, we implemented a GPU-accelerated BEM program for GeForce 8-series GPUs. The program ran 6–23 times faster than a conventional BEM program optimized for an Intel quad-core CPU on a series of boundary value problems with 8000–128000 unknowns, and it sustained a performance of 167 Gflop/s for the largest problem (1 058 000 unknowns). The accuracy of our BEM program was almost the same as that of the regular BEM program using double precision floating-point arithmetic. In addition, our BEM was applicable to realistic problems. In conclusion, the present GPU-accelerated BEM solves large-scale boundary value problems for the Helmholtz equation rapidly and precisely.
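The double-single arithmetic mentioned in the summary can be illustrated with a minimal sketch. It is not the authors' CUDA code: it follows the general error-free-transformation idea attributed to Knuth and Dekker in the reference list, emulates single precision in Python by rounding through `struct`, and all function names (`f32`, `two_sum`, `ds_add`, and so on) are hypothetical. A value is stored as an unevaluated sum of two single-precision numbers, `hi + lo`, which gives roughly twice the significand bits of one float.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth-style error-free addition in single precision.

    Returns (s, e) with s = fl(a + b) and s + e equal to a + b exactly.
    """
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e


def ds_from_float(x: float) -> tuple[float, float]:
    """Split a double into a (hi, lo) pair of single-precision values."""
    hi = f32(x)
    lo = f32(x - hi)
    return hi, lo


def ds_add(x: tuple[float, float], y: tuple[float, float]) -> tuple[float, float]:
    """Add two double-single numbers (illustrative, not the paper's routine)."""
    s, e = two_sum(x[0], y[0])       # exact sum of the high parts
    e = f32(e + f32(x[1] + y[1]))    # fold in the low parts
    hi = f32(s + e)                  # renormalise so that |lo| is small
    lo = f32(e - f32(hi - s))
    return hi, lo


def ds_to_float(x: tuple[float, float]) -> float:
    """Evaluate hi + lo in double precision for inspection."""
    return x[0] + x[1]


def accumulate(n: int, step: float) -> tuple[float, float]:
    """Add `step` to 1.0 n times in plain single and in double-single."""
    plain = f32(1.0)
    acc = ds_from_float(1.0)
    ds_step = ds_from_float(step)
    for _ in range(n):
        plain = f32(plain + f32(step))
        acc = ds_add(acc, ds_step)
    return plain, ds_to_float(acc)
```

Calling `accumulate(1000, 1e-8)` shows the motivation: in plain single precision each increment is below half a unit in the last place of 1.0 and is lost, whereas the double-single accumulator retains the increments in its low part.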


76M15 Boundary element methods applied to problems in fluid mechanics
76Q05 Hydro- and aero-acoustics




[1] Bonnet, Boundary Integral Equation Methods for Solids and Fluids (1995)
[2] Nishimura, Fast multipole accelerated boundary integral equation methods, Applied Mechanics Review 55 pp 299– (2002)
[3] Owens JD, Luebke D, Govindaraju N, Harris M, Krüger J, Lefohn AE, Purcell TJ. A survey of general-purpose computation on graphics hardware. Eurographics 2005, State of the Art Reports, Dublin, Ireland, 2005; 21–51.
[4] CUDA Programming Guide Version 1.1. http://www.nvidia.com/object/cuda_develop.html [22 November 2008].
[5] Göddeke D. GPGPU–Basic Math Tutorial. http://www.mathematik.uni-dortmund.de/~goddeke/gpgpu/tutorial.html [22 November 2008].
[6] The Cg Tutorial: The Definitive Guide to Programmable Real-Time Graphics. http://developer.nvidia.com/object/cg_tutorial_home.html [22 November 2008].
[7] Takahashi T. GPGPU for BEM. Proceedings of IABEM, 2006, Graz, Austria, 2006; 101–104.
[8] Takahashi, Accelerating boundary integral equation method using a special-purpose computer, International Journal for Numerical Methods in Engineering 66 pp 529– (2006) · Zbl 1114.65149
[9] Takahashi, An acceleration of the boundary integral equation method for three-dimensional elastostatics by a special-purpose computer, Transactions of the Japan Society of Mechanical Engineers A71–712 pp 1620– (2005)
[10] Takahashi, A hardware acceleration of the method of moments for the electromagnetic scattering problems in three dimensions, IPSJ Transaction on Advanced Computing System ACS14 pp 172– (2006)
[11] Takahashi, A hardware acceleration of the time domain boundary integral equation method for the wave equation in two dimensions, Engineering Analysis with Boundary Elements 31 pp 95– (2007)
[12] Susukita, Hardware accelerator for molecular dynamics: MDGRAPE-2, Computer Physics Communications 155 pp 115– (2003)
[13] Volkov V, Demmel J. LU, QR and Cholesky factorizations using vector capabilities of GPUs. LAPACK Working Note 202, 2008. http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-49.html [1 February 2009].
[14] Abramowitz M, Stegun IA (eds). Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables (eighth Dover printing). Dover: New York, 1972.
[15] Ohno Y, Nishibori E, Narumi T, Koishi T, Tahirov T, Ago H, Miyano H, Himeno R, Ebisuzaki T, Sakata M, Taiji M. A 281 Tflops calculation for x-ray protein structure analysis with the special-purpose computer MDGRAPE-3. Proceedings of SC07: The International Conference for High Performance Computing, Networking, Storage and Analysis, Reno, NV, U.S.A., 2007.
[16] Hamada T, Iitaka T. The Chamomile Scheme: an optimized algorithm for N-body simulations on programmable graphics processing units. http://arxiv.org/abs/astro-ph/0703100 [22 November 2008].
[17] Nyland, GPU Gems 3 (2007)
[18] Iitaka, Introduction to scientific computation with GPU (1), Journal of the Japan Society for Computational Engineering and Science 12 pp 1698– (2007)
[19] Iitaka, Introduction to scientific computation with GPU (2), Journal of the Japan Society for Computational Engineering and Science 13 pp 36– (2008)
[20] Bailey DH. DSFUN90 (Fortran-90 double-single package). http://crd.lbl.gov/~dhbailey/mpdist/mpdist.html [22 November 2008].
[21] Knuth, The Art of Computer Programming 2 (1994)
[22] Dekker, A floating-point technique for extending the available precision, Numerische Mathematik 18 pp 224– (1971) · Zbl 0226.65034
[23] Bowman JJ, Senior TBA, Uslenghi PLD (eds). Electromagnetic and Acoustic Scattering by Simple Shapes (Revised Printing). Hemisphere: New York, 1987. ISBN: 0891168850.
[24] Saad, GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM Journal on Scientific and Statistical Computing 7 pp 856– (1986) · Zbl 0599.65018
[25] ADVENTURE project’s home page. http://adventure.sys.t.u-tokyo.ac.jp/software/samples.html [17 June 2009].
[26] Burton, The application of integral equation methods to the numerical solution of some exterior boundary-value problems, Proceedings of the Royal Society of London, Series A 323 pp 201– (1971) · Zbl 0235.65080
[27] Gumerov, Fast multipole methods on graphics processors, Journal of Computational Physics 227 pp 8290– (2008) · Zbl 1147.65012