zbMATH — the first resource for mathematics

Optimising the performance of the spectral/\(hp\) element method with collective linear algebra operations. (English) Zbl 1439.65206
Summary: As computing hardware evolves, increasing core counts mean that memory bandwidth is becoming the deciding factor in attaining peak performance of numerical methods. High-order finite element methods, such as those implemented in the spectral/\(hp\) framework Nektar++, are particularly well-suited to this environment. Unlike low-order methods that typically utilise sparse storage, matrices representing high-order operators have greater density and richer structure. In this paper, we show how these qualities can be exploited to increase runtime performance on nodes that comprise a typical high-performance computing system, by amalgamating the action of key operators on multiple elements into a single, memory-efficient block. We investigate different strategies for achieving optimal performance across a range of polynomial orders and element types. As these strategies all depend on external factors such as BLAS implementation and the geometry of interest, we present a technique for automatically selecting the most efficient strategy at runtime.
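The two ideas in the summary, applying one operator to many elements as a single block and choosing the evaluation strategy by measurement at runtime, can be illustrated with a minimal sketch. This is not the Nektar++ implementation: the function names (`apply_per_element`, `apply_collective`, `select_fastest`) and the problem sizes are hypothetical, and it assumes every element shares the same dense operator matrix `B`. Pure Python is used only to keep the example self-contained; in practice the collective variant would be a single BLAS matrix-matrix call.

```python
import random
import time


def apply_per_element(B, elements):
    """Element-by-element strategy: one matrix-vector product per element."""
    return [[sum(b * u for b, u in zip(row, coeffs)) for row in B]
            for coeffs in elements]


def apply_collective(B, elements):
    """Collective strategy: gather all elements into one block U, whose
    column e holds the coefficients of element e, and form Y = B * U as a
    single matrix-matrix product."""
    n_elmt = len(elements)
    U = list(zip(*elements))  # U[j][e] = coefficient j of element e
    Y = [[sum(row[j] * U[j][e] for j in range(len(row)))
          for e in range(n_elmt)]
         for row in B]
    # Transpose back so the result is indexed as result[element][row].
    return [list(col) for col in zip(*Y)]


def select_fastest(strategies, B, elements, repeats=3):
    """Time each candidate on the actual data and return the quickest name
    together with the measured mean times (in seconds)."""
    timings = {}
    for name, fn in strategies.items():
        start = time.perf_counter()
        for _ in range(repeats):
            fn(B, elements)
        timings[name] = (time.perf_counter() - start) / repeats
    return min(timings, key=timings.get), timings


STRATEGIES = {
    "per_element": apply_per_element,
    "collective": apply_collective,
}

# Hypothetical sizes: 10 modes, 16 quadrature points, 200 elements.
random.seed(0)
B = [[random.random() for _ in range(10)] for _ in range(16)]
elements = [[random.random() for _ in range(10)] for _ in range(200)]

best, timings = select_fastest(STRATEGIES, B, elements)
result = STRATEGIES[best](B, elements)
```

Which strategy wins depends on the machine, the operator size and the number of elements, which is why the sketch measures rather than assumes. A real autotuner would typically run this selection once per operator and element type during setup and cache the choice.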

65N35 Spectral, collocation and related methods for boundary value problems involving PDEs
65N22 Numerical solution of discretized equations for boundary value problems involving PDEs
65N30 Finite element, Rayleigh-Ritz and Galerkin methods for boundary value problems involving PDEs