Boundary element quadrature schemes for multi- and many-core architectures. (English) Zbl 1375.65164

Summary: In the paper we study the performance of the regularized boundary element quadrature routines implemented in the BEM4I library developed by the authors. Apart from the results obtained on the classical multi-core architecture represented by the Intel Xeon processors, we concentrate on the portability of the code to the many-core Intel Xeon Phi family. In contrast to GP-GPU programming, which accelerates many scientific codes, the standard x86 architecture of the Xeon Phi processors allows the existing multi-core implementation to be reused. Although in many cases a simple recompilation would lead to an inefficient utilization of the Xeon Phi, the effort invested in the optimization usually leads to better performance on the multi-core Xeon processors as well. This makes the Xeon Phi an interesting platform for scientists developing a software library aimed at both modern portable PCs and high performance computing environments. Here we focus on the manually vectorized assembly of the local element contributions and the parallel assembly of the global matrices on shared memory systems. Due to the quadratic complexity of the standard assembly, we also present an assembly sparsified by the adaptive cross approximation based on the same acceleration techniques. The numerical experiments performed on the Xeon multi-core processor and two generations of the Xeon Phi many-core platform validate the proposed implementation and highlight the importance of vectorization for exploiting the features of modern hardware.
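
The two acceleration levels named in the summary (vectorized local contributions, thread-parallel global assembly) can be illustrated with a minimal C++ sketch. It is not taken from BEM4I: the types `Point` and `QuadPoints` and the function `assembleSingleLayer` are hypothetical, the kernel is the Laplace single-layer kernel 1/(4π|x − y|) evaluated at target points assumed to lie away from the source elements (so no regularization is needed), and compiler-directed OpenMP `simd` loops stand in for the manual vectorization the summary refers to.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical target point (e.g. a collocation point).
struct Point { double x, y, z; };

// Hypothetical quadrature data of one source element, stored as a
// structure of arrays so the innermost loop reads contiguous memory.
struct QuadPoints {
    std::vector<double> x, y, z, w;  // coordinates and weights
};

// Assemble a dense row-major matrix with entries
//   A[i*n + j] = sum_q w_q / (4*pi*|t_i - y_{j,q}|).
// Assumption: no target coincides with a quadrature point.
std::vector<double> assembleSingleLayer(const std::vector<Point>& targets,
                                        const std::vector<QuadPoints>& sources) {
    const int m = static_cast<int>(targets.size());
    const int n = static_cast<int>(sources.size());
    const double inv4pi = 1.0 / (4.0 * std::acos(-1.0));
    std::vector<double> A(static_cast<std::size_t>(m) * n, 0.0);

    // Global level: rows are independent, so threads share no writes.
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < m; ++i) {
        const Point t = targets[i];
        for (int j = 0; j < n; ++j) {
            const QuadPoints& s = sources[j];
            const int nq = static_cast<int>(s.w.size());
            assert(s.x.size() == s.w.size() && s.y.size() == s.w.size() &&
                   s.z.size() == s.w.size());
            const double* sx = s.x.data();
            const double* sy = s.y.data();
            const double* sz = s.z.data();
            const double* sw = s.w.data();

            // Local level: vectorized reduction over quadrature points.
            double sum = 0.0;
            #pragma omp simd reduction(+ : sum)
            for (int q = 0; q < nq; ++q) {
                const double dx = t.x - sx[q];
                const double dy = t.y - sy[q];
                const double dz = t.z - sz[q];
                sum += sw[q] / std::sqrt(dx * dx + dy * dy + dz * dz);
            }
            A[static_cast<std::size_t>(i) * n + j] = inv4pi * sum;
        }
    }
    return A;
}
```

The pragmas take effect only when OpenMP is enabled (for example with `-fopenmp` on GCC or Clang); without it the code still compiles and runs serially. The structure-of-arrays layout is the design choice that matters here, since it lets the compiler load consecutive quadrature points into one SIMD register.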


65N38 Boundary element methods for boundary value problems involving PDEs
65Y10 Numerical algorithms for specific classes of architectures
65Y20 Complexity and performance of numerical algorithms
35J25 Boundary value problems for second-order elliptic equations


GitHub; BEM4I; Vc

