×

Extreme scale FMM-accelerated boundary integral equation solver for wave scattering. (English) Zbl 1461.65254

Summary: We present a boundary integral equation solver for wave scattering suited for many-core processors, which are expected to be the building blocks of energy-austere exascale systems, and on which algorithmic and architecture-oriented optimizations are essential for achieving worthy performance. We use the GMRES iterative method and the fast multipole method (FMM) to implement the matrix-vector product (MatVec) kernel. We implement highly optimized kernels for both shared- and distributed-memory architectures and provide various performance models for tuning the task-based tree traversal implementation of FMM. We develop optimal architecture-specific and algorithm-aware partitioning, load balancing, and communication reducing mechanisms to scale up to 6,144 compute nodes of a Cray XC40 with 196,608 hardware cores. Task-based traversal achieves roughly 77% and 60% of the peak single precision floating point performance of a 56-core Skylake Scalable processor and a 72-core Knights Landing processor, respectively. These represent approximately sevenfold speedups compared to the baseline scalar code. We report weak scalability in accordance with the best possible \(O(\log P)\) communication complexity and the theoretical scaling complexity of FMM. These result in up to 85% efficiency in strong scaling while computing in excess of 2 billion degrees of freedom (DoFs) on the full-scale Cray XC40. To demonstrate the applicability of the proposed implementation of FMM, we use it in the solution of the acoustic scattering problem involving soft targets. This calls for the discretization of an integral equation with an oscillatory kernel and (weak) \(1/R\) singularity. Applications of FMM to the problems involving hard targets, which call for implementation of more complicated singularity extraction/cancellation schemes to account for \(1/R^2\) singularity, or to Burton-Miller surface integral formulation, which results in a well-conditioned matrix system, are not considered in this work, but the efficiency and scalability results demonstrated here are expected to be observed for these cases as well. The numerical results are converged to within 1.0e-4 relative 2-norm residual accuracy of the analytical solution for the sphere, and a multiple flower-shaped scatterer displays comparable algebraic convergence in a more general setting.

MSC:

65N38 Boundary element methods for boundary value problems involving PDEs
78M16 Multipole methods applied to problems in optics and electromagnetic theory
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] M. Abduljabbar, M. Al Farhan, R. Yokota, and D. Keyes, Performance evaluation of computation and communication kernels of the fast multipole method on Intel manycore architecture, in Euro-Par 2017: Parallel Processing, Lecture Notes in Comput. Sci. 10417, Springer, Cham, 2017, pp. 553-564.
[2] M. Abduljabbar, G. S. Markomanolis, H. Ibeid, R. Yokota, and D. Keyes, Communication reducing algorithms for distributed hierarchical N-body problems with boundary distributions, in High Performance Computing, Lecture Notes in Comput. Sci. 10266, Springer, Cham, 2017, pp. 79-96.
[3] M. Abduljabbar and R. Yokota, N-body methods, in High Performance Parallelism Pearls, Morgan Kaufmann, Elsevier, Burlington MA, 2015, Chapter 10, pp. 175-183.
[4] S. Ambikasaran and E. Darve, The Inverse Fast Multipole Method, preprint, https://arxiv.org/abs/1407.1572, 2014.
[5] S. Balay, S. Abhyankar, M. F. Adams, J. Brown, P. Brune, K. Buschelman, L. Dalcin, A. Dener, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, D. A. May, L. C. McInnes, R. T. Mills, T. Munson, K. Rupp, P. Sanan, B. F. Smith, S. Zampini, H. Zhang, and H. Zhang, PETSc Web page, https://www.mcs.anl.gov/petsc, 2018.
[6] S. Balay, S. Abhyankar, M. F. Adams, J. Brown, P. Brune, K. Buschelman, L. Dalcin, A. Dener, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, D. A. May, L. C. McInnes, R. T. Mills, T. Munson, K. Rupp, P. Sanan, B. F. Smith, S. Zampini, H. Zhang, and H. Zhang, PETSc Users’ Manual, Tech. report ANL-95/11, Revision 3.10, Argonne National Laboratory, 2018, https://www.mcs.anl.gov/petsc.
[7] S. Balay, W. D. Gropp, L. C. McInnes, and B. F. Smith, Efficient management of parallelism in object oriented numerical software libraries, in Modern Software Tools in Scientific Computing, E. Arge, A. M. Bruaset, and H. P. Langtangen, eds., Birkhäuser, Basel, 1997, pp. 163-202. · Zbl 0882.65154
[8] C. F. Bohren and D. R. Huffman, Absorption and scattering by an arbitrary particle, in Absorption and Scattering of Light by Small Particles, Wiley-VCH Verlag GmbH, New York, 2007, pp. 57-81.
[9] O. P. Bruno and L. A. Kunyansky, A fast, high-order algorithm for the solution of surface scattering problems: Basic implementation, tests, and applications, J. Comput. Phys., 169 (2001), pp. 80-110. · Zbl 1052.76052
[10] A. J. Burton and G. F. Miller, The application of integral equation methods to the numerical solution of some exterior boundary-value problems, Proc. Roy. Soc. London Ser. A, 323 (1971), pp. 201-210, https://doi.org/10.1098/rspa.1971.0097. · Zbl 0235.65080
[11] B. Chandrasekhar and S. M. Rao, Elimination of internal resonance problem associated with acoustic scattering by three-dimensional rigid body, J. Acoust. Soc. Amer., 115 (2004), pp. 2731-2737, https://doi.org/10.1121/1.1703537.
[12] K. Chen, S. Yang, J. Song, and R. Roberts, High order Nyström method for acoustic scattering, AIP Conf. Proc., 1650 (2015), pp. 1744-1749.
[13] Z.-S. Chen, W. Kreuzer, H. Waubke, and A. Svobodnik, A Burton-Miller formulation of the boundary element method for baffle problems in acoustics and the BEM/FEM coupling, Eng. Anal. Bound. Elem., 35 (2011), pp. 279-288. · Zbl 1259.76023
[14] H. Cheng, W. Y. Crutchfield, Z. Gimbutas, L. F. Greengard, J. F. Ethridge, J. Huang, V. Rokhlin, N. Yarvin, and J. Zhao, A wideband fast multipole method for the Helmholtz equation in three dimensions, J. Comput. Phys., 216 (2006), pp. 300-325. · Zbl 1093.65117
[15] J. Choi, A. Chandramowlishwaran, K. Madduri, and R. Vuduc, A CPU-GPU hybrid implementation and model-driven scheduling of the fast multipole method, in Proceedings of the 7th Workshop on General-Purpose Processing Using GPUs (GPGPU-7) (Salt Lake City, UT, 2014), ACM, New York, 2014, http://www.ece.neu.edu/groups/nucar/GPGPU/GPGPU7/.
[16] R. Coifman, V. Rokhlin, and S. Wandzura, The fast multipole method for the wave equation: A pedestrian prescription, IEEE Antennas and Propagation Magazine, 35 (1993), pp. 7-12.
[17] P. Coulier, H. Pouransari, and E. Darve, The inverse fast multipole method: Using a fast approximate direct solver as a preconditioner for dense linear systems, SIAM J. Sci. Comput., 39 (2017), pp. A761-A796, https://doi.org/10.1137/15M1034477. · Zbl 1365.65068
[18] E. Darve and P. Havé, Efficient fast multipole method for low-frequency scattering, J. Comput. Phys., 197 (2004), pp. 341-363, https://doi.org/10.1016/j.jcp.2003.12.002. · Zbl 1073.65133
[19] M. G. Duffy, Quadrature over a pyramid or cube of integrands with a singularity at a vertex, SIAM J. Numer. Anal., 19 (1982), pp. 1260-1262, https://doi.org/10.1137/0719090. · Zbl 0493.65011
[20] B. Engquist and L. Ying, Fast directional algorithms for the Helmholtz kernel, J. Comput. Appl. Math., 234 (2010), pp. 1851-1859. · Zbl 1379.65103
[21] M. A. A. Farhan, D. K. Kaushik, and D. E. Keyes, Unstructured computational aerodynamics on many integrated core architecture, Parallel Comput., 59 (2016), pp. 97-118.
[22] M. A. A. Farhan and D. E. Keyes, Optimizations of unstructured aerodynamics computations for many-core architectures, IEEE Trans. Parallel Distributed Syst., 29 (2018), pp. 2317-2332, https://doi.org/10.1109/TPDS.2018.2826533.
[23] A. Gholami, D. Malhotra, H. Sundar, and G. Biros, FFT, FMM, or multigrid? A comparative study of state-of-the-art Poisson solvers for uniform and nonuniform grids in the unit cube, SIAM J. Sci. Comput., 38 (2016), pp. C280-C306, https://doi.org/10.1137/15M1010798. · Zbl 1369.65138
[24] L. Greengard and M. Berger, FMM3D Software, 2012, https://cims.nyu.edu/cmcl/fmm3dlib/fmm3dlib.html (accessed 2018-10-08).
[25] L. Greengard and V. Rokhlin, A fast algorithm for particle simulations, J. Comput. Phys., 73 (1987), pp. 325-348. · Zbl 0629.65005
[26] N. A. Gumerov and R. Duraiswami, A broadband fast multipole accelerated boundary element method for the three dimensional Helmholtz equation, J. Acoust. Soc. Amer., 125 (2009), pp. 191-205.
[27] S. Hao, P. G. Martinsson, and P. Young, An efficient and highly accurate solver for multi-body acoustic scattering problems involving rotationally symmetric scatterers, Comput. Math. Appl., 69 (2015), pp. 304-318. · Zbl 1360.76253
[28] R. F. Harrington, Time-Harmonic Electromagnetic Fields, McGraw-Hill, New York, 1961.
[29] M. A. Heroux, R. A. Bartlett, V. E. Howle, R. J. Hoekstra, J. J. Hu, T. G. Kolda, R. B. Lehoucq, K. R. Long, R. P. Pawlowski, E. T. Phipps, A. G. Salinger, H. K. Thornquist, R. S. Tuminaro, J. M. Willenbring, A. Williams, and K. S. Stanley, An overview of the Trilinos project, ACM Trans. Math. Softw., 31 (2005), pp. 397-423, https://doi.org/10.1145/1089014.1089021. · Zbl 1136.65354
[30] H. Ibeid, R. Yokota, J. Pestana, and D. Keyes, Fast multipole preconditioners for sparse matrices arising from elliptic equations, Comput. Vis. Sci., 18 (2018), pp. 213-229. · Zbl 1398.65039
[31] S. Jarvenpaa, M. Taskinen, and P. Yla-Oijala, Singularity subtraction technique for high-order polynomial vector basis functions on planar triangles, IEEE Trans. Antennas and Propagation, 54 (2006), pp. 42-49. · Zbl 1369.78672
[32] G. Kang, J. Song, W. C. Chew, K. C. Donepudi, and J.-M. Jin, A novel grid-robust higher order vector basis function for the method of moments, IEEE Trans. Antennas and Propagation, 49 (2001), pp. 908-915. · Zbl 1368.78114
[33] E. Kreyszig, Advanced Engineering Mathematics, Wiley, New York, 2011. · Zbl 1229.00004
[34] I. Lashuk, A. Chandramowlishwaran, H. Langston, T. Nguyen, R. Sampath, A. Shringarpure, R. Vuduc, L. Ying, D. Zorin, and G. Biros, A massively parallel adaptive fast-multipole method on heterogeneous architectures, in Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (Portland, OR, 2009), ACM, New York, 2009, 58, https://doi.org/10.1145/1654059.1654118.
[35] I. Lashuk, A. Chandramowlishwaran, H. Langston, T.-A. Nguyen, R. Sampath, A. Shringarpure, R. Vuduc, L. Ying, D. Zorin, and G. Biros, A massively parallel adaptive fast multipole method on heterogeneous architectures, Commun. ACM, 55 (2012), pp. 101-109, https://doi.org/10.1145/2160718.2160740.
[36] D. Malhotra and G. Biros, Algorithm 967: A distributed-memory fast multipole method for volume potentials, ACM Trans. Math. Software, 43 (2016), 17. · Zbl 1371.65125
[37] W. B. March, B. Xiao, S. Tharakan, C. D. Yu, and G. Biros, A kernel-independent FMM in general dimensions, in Proceedings of the 2015 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, New York, IEEE, Washington, DC, 2015, pp. 1-12.
[38] B. Michiels, J. Fostier, I. Bogaert, and D. De Zutter, Full-wave simulations of electromagnetic scattering problems with billions of unknowns, IEEE Trans. Antennas and Propagation, 63 (2015), pp. 796-799.
[39] S. E. Mousavi and N. Sukumar, Generalized Duffy transformation for integrating vertex singularities, Comput. Mech., 45 (2009), pp. 127-140.
[40] X. M. Pan, W. C. Pi, M. L. Yang, Z. Peng, and X. Q. Sheng, Solving problems with over one billion unknowns by the MLFMA, IEEE Trans. Antennas and Propagation, 60 (2012), pp. 2571-2574. · Zbl 1369.78725
[41] A. Rahimian, I. Lashuk, S. Veerapaneni, A. Chandramowlishwaran, D. Malhotra, L. Moon, R. Sampath, A. Shringarpure, J. Vetter, R. Vuduc, D. Zorin, and G. Biros, Petascale direct numerical simulation of blood flow on 200k cores and heterogeneous architectures, in SC ’10: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (New Orleans, LA, 2010), IEEE, Washington, DC, 2010, pp. 1-11, https://doi.org/10.1109/SC.2010.42.
[42] J. Rudi, A. C. I. Malossi, T. Isaac, G. Stadler, M. Gurnis, P. W. J. Staar, Y. Ineichen, C. Bekas, A. Curioni, and O. Ghattas, An extreme-scale implicit solver for complex PDEs: Highly heterogeneous flow in earth’s mantle, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’15, ACM, New York, 2015, 5.
[43] K. Rupp, P. Tillet, F. Rudolf, J. Weinbub, A. Morhammer, T. Grasser, A. Jüngel, and S. Selberherr, ViennaCL–linear algebra library for multi- and many-core architectures, SIAM J. Sci. Comput., 38 (2016), pp. S412-S439, https://doi.org/10.1137/15M1026419. · Zbl 1349.65740
[44] H. A. Schenck, Improved integral formulation for acoustic radiation problems, J. Acoust. Soc. Amer., 44 (1968), pp. 41-58, https://doi.org/10.1121/1.1911085. · Zbl 0187.50302
[45] C. Schwab and W. L. Wendland, On numerical cubatures of singular surface integrals in boundary element methods, Numer. Math., 62 (1992), pp. 343-369. · Zbl 0761.65012
[46] P. Sheng, Preface, in Introduction to Wave Scattering, Localization, and Mesoscopic Phenomena, P. Sheng, ed., Academic Press, San Diego, CA, 1995.
[47] T. Takahashi, P. Coulier, and E. Darve, Application of the inverse fast multipole method as a preconditioner in a 3D Helmholtz boundary element method, J. Comput. Phys., 341 (2017), pp. 406-428. · Zbl 1376.78010
[48] M. S. Warren, 2HOT: An Improved Parallel Hashed Oct-Tree \(N\)-Body Algorithm for Cosmological Simulation, in Proceedings of the 2013 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, New York, IEEE, Washington, DC, 2013, pp. 1-12.
[49] K. Williams, Acoustic resonance scattering, J. Acoust. Soc. Amer., 95 (1994), pp. 2786-2786.
[50] L. Ying, G. Biros, D. Zorin, and H. Langston, A new parallel kernel-independent fast multipole method, in Proceedings of the 2003 ACM/IEEE Conference on Supercomputing, SC ’03, ACM, New York, 2003, p. 14, https://doi.org/10.1145/1048935.1050165.
[51] R. Yokota, An FMM based on dual tree traversal for many-core architectures, J. Algorithms Comput. Tech., 7 (2013), pp. 301-324.
[52] R. Yokota and L. A. Barba, Exafmm, 2008, https://github.com/exafmm/exafmm (accessed 2018-10-08).
[53] M. Zandifar, M. Abdul Jabbar, A. Majidi, D. Keyes, N. M. Amato, and L. Rauchwerger, Composing algorithmic skeletons to express high-performance scientific applications, in Proceedings of the 29th ACM International Conference on Supercomputing, ICS ’15, ACM, New York, 2015, pp. 415-424.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.