Fast matrix-free evaluation of discontinuous Galerkin finite element operators. (English) Zbl 1486.65253


MSC:

65N30 Finite element, Rayleigh-Ritz and Galerkin methods for boundary value problems involving PDEs

Software:

deal.II

References:

[1] Daniel S. Abdi, Lucas C. Wilcox, Timothy C. Warburton, and Francis X. Giraldo. 2019. A GPU-accelerated continuous and discontinuous Galerkin non-hydrostatic atmospheric model. Int. J. High Perf. Comput. Appl. 33, 1 (2019), 81-109.
[2] Rainer Agelek, Michael L. Anderson, Wolfgang Bangerth, and William L. Barth. 2017. On orienting edges of unstructured two- and three-dimensional meshes. ACM Trans. Math. Softw. 44 (2017), 5:1-5:22. · Zbl 1484.65320
[3] Giovanni Alzetta, Daniel Arndt, Wolfgang Bangerth, Vishal Boddu, Benjamin Brands, Denis Davydov, Rene Gassmoeller, Timo Heister, Luca Heltai, Katharina Kormann, Martin Kronbichler, Matthias Maier, Jean-Paul Pelteret, Bruno Turcksin, and David Wells. 2018. The deal.II library, version 9.0. J. Numer. Math. 26, 4 (2018), 173-184. · Zbl 1410.65363
[4] Robert Anderson, Andrew Barker, Jamie Bramwell, Jakub Cerveny, Johann Dahm, Veselin Dobrev, Yohann Dudouit, Aaron Fisher, Tzanio Kolev, Mark Stowell, and Vladimir Tomov. 2018. MFEM: Modular finite element methods. mfem.org. · Zbl 1524.65001
[5] Douglas N. Arnold, Franco Brezzi, and L. Donatella Marini. 2002. Unified analysis of discontinuous Galerkin methods for elliptic problems. SIAM J. Numer. Anal. 39 (2002), 1749-1779. · Zbl 1008.65080
[6] Satish Balay, Shrirang Abhyankar, Mark F. Adams, Jed Brown, Peter Brune, Kris Buschelman, Lisandro Dalcin, Victor Eijkhout, William D. Gropp, Dinesh Kaushik, Matthew G. Knepley, Lois Curfman McInnes, Karl Rupp, Barry F. Smith, Stefano Zampini, Hong Zhang, and Hong Zhang. 2016. PETSc Users Manual. Technical Report ANL-95/11 - Revision 3.7. Argonne National Laboratory. http://www.mcs.anl.gov/petsc.
[7] Wolfgang Bangerth, Carsten Burstedde, Timo Heister, and Martin Kronbichler. 2011. Algorithms and data structures for massively parallel generic finite element codes. ACM Trans. Math. Softw. 38, 2 (2011), 14:1-14:28. · Zbl 1365.65247
[8] Peter Bastian, Christian Engwer, Jorrit Fahlke, Markus Geveler, Dominik Göddeke, Oleg Iliev, Olaf Ippisch, René Milk, Jan Mohring, Steffen Müthing, Mario Ohlberger, Dirk Ribbrock, and Stefan Turek. 2016. Hardware-based efficiency advances in the EXA-DUNE project. In Software for Exascale Computing – SPPEXA 2013-2015, Hans-Joachim Bungartz, Philipp Neumann, and Wolfgang E. Nagel (Eds.). Springer, Cham, 3-23.
[9] Peter Bastian, Christian Engwer, Dominik Göddeke, Oleg Iliev, Olaf Ippisch, Mario Ohlberger, Stefan Turek, Jorrit Fahlke, Sven Kaulmann, Steffen Müthing, and Dirk Ribbrock. 2014. EXA-DUNE: Flexible PDE solvers, numerical methods and applications. In Euro-Par 2014: Parallel Processing Workshops. Lecture Notes in Computer Science, Vol. 8806. Springer, 530-541.
[10] Jed Brown. 2010. Efficient nonlinear solvers for nodal high-order finite elements in 3D. J. Sci. Comput. 45, 1-3 (2010), 48-63. · Zbl 1203.65245
[11] Chris D. Cantwell, David Moxey, Andrew Comerford, Alessandro Bolis, Gabriele Rocco, Gianmarco Mengaldo, Daniele De Grazia, Sergey L. Yakovlev, Jean-Eloi Lombard, Dirk Ekelschot, Bastien Jordi, Hui Xu, Yumnah Mohamied, Claes Eskilsson, Blake W. Nelson, Peter Vos, Cristian Biotto, Robert M. Kirby, and Spencer J. Sherwin. 2015. Nektar++: An open-source spectral/hp element framework. Comput. Phys. Commun. 192 (2015), 205-219. · Zbl 1380.65465
[12] Lester E. Carr III, Carlos F. Borges, and Francis X. Giraldo. 2016. Matrix-free polynomial-based nonlinear least squares optimized preconditioning and its applications to discontinuous Galerkin discretizations of the Euler equations. J. Sci. Comput. 66 (2016), 917-940. · Zbl 1398.65247
[13] Michel O. Deville, Paul F. Fischer, and Ernest H. Mund. 2002. High-order Methods for Incompressible Fluid Flow. Vol. 9. Cambridge University Press. · Zbl 1007.76001
[14] Jack Dongarra, Iain Duff, Mark Gates, Azzam Haidar, Sven Hammarling, Nicholas J. Higham, Jonathan Hogg, Pedro Valero Lara, Samuel D. Relton, Stanimire Tomov, and Mawussi Zounon. 2016. A Proposed API for Batched Basic Linear Algebra Subprograms. Technical Report. University of Tennessee. https://bit.ly/batched-blas.
[15] Niklas Fehn, Wolfgang A. Wall, and Martin Kronbichler. 2018. Efficiency of high-performance discontinuous Galerkin spectral element methods for under-resolved turbulent incompressible flows. Int. J. Numer. Meth. Fluids 88, 1 (2018), 32-54. · Zbl 1415.76451
[16] Niklas Fehn, Wolfgang A. Wall, and Martin Kronbichler. 2019. A matrix-free high-order discontinuous Galerkin compressible Navier-Stokes solver: A performance comparison of compressible and incompressible formulations for turbulent incompressible flows. Int. J. Numer. Meth. Fluids 89, 3 (2019), 71-102.
[17] Paul Fischer, Stefan Kerkemeier, Adam Peplinski, Dillon Shaver, Ananias Tomboulides, Misun Min, Aleksandr Obabko, and Elia Merzari. 2018. Nek5000 Web page. https://nek5000.mcs.anl.gov.
[18] Georg Hager and Gerhard Wellein. 2011. Introduction to High Performance Computing for Scientists and Engineers. CRC Press, Boca Raton.
[19] Alexander Heinecke, Greg Henry, and Hans Pabst. 2017. LIBXSMM: A high performance library for small matrix multiplications. https://github.com/hfp/libxsmm.
[20] Michael A. Heroux, Roscoe A. Bartlett, Vicki E. Howle, Robert J. Hoekstra, Jonathan J. Hu, Tamara G. Kolda, Richard B. Lehoucq, Keven R. Long, Roger P. Pawlowski, Eric T. Phipps, Andrew G. Salinger, Heidi K. Thornquist, Ray S. Tuminaro, James M. Willenbring, Alan Williams, and Kendall S. Stanley. 2005. An overview of the Trilinos project. ACM Trans. Math. Softw. 31, 3 (2005), 397-423. http://www.trilinos.org. · Zbl 1136.65354
[21] Jan S. Hesthaven and Tim Warburton. 2008. Nodal Discontinuous Galerkin Methods: Algorithms, Analysis, and Application. Texts in Applied Mathematics, Vol. 54. Springer. · Zbl 1134.65068
[22] Florian Hindenlang, Gregor Gassner, Christoph Altmann, Andrea Beck, Marc Staudenmaier, and Claus-Dieter Munz. 2012. Explicit discontinuous Galerkin methods for unsteady problems. Comput. Fluids 61 (2012), 86-93. · Zbl 1365.76117
[23] Torsten Hoefler and Roberto Belli. 2015. Scientific benchmarking of parallel computing systems. In SC15.
[24] M. Homolya, R. C. Kirby, and D. A. Ham. 2017. Exposing and exploiting structure: Optimal code generation for high-order finite element methods. arXiv preprint 1711.02473 (2017), cs.MS.
[25] Immo Huismann, Jörg Stiller, and Jochen Fröhlich. 2017. Factorizing the factorization – a spectral-element solver for elliptic equations with linear operation count. J. Comput. Phys. 346 (2017), 437-448. · Zbl 1380.65373
[26] Intel Corporation 2017. Intel 64 and IA-32 Architectures Optimization Reference Manual. Intel Corporation. Order no. 248966-037, https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf.
[27] Jim Jeffers, James Reinders, and Avinash Sodani. 2016. Intel Xeon Phi Processor High Performance Programming, Knights Landing Edition. Morgan-Kaufmann, Cambridge, MA.
[28] George E. Karniadakis and Spencer J. Sherwin. 2005. Spectral/hp Element Methods for Computational Fluid Dynamics (2nd ed.). Oxford University Press. · Zbl 1116.76002
[29] Dominic Kempf, René Hess, Steffen Müthing, and Peter Bastian. 2018. Automatic code generation for high-performance discontinuous Galerkin methods on modern architectures. arXiv preprint 1812.08075 (2018), math.NA. · Zbl 07467966
[30] Andreas Klöckner. 2014. Loo.py: Transformation-based code generation for GPUs and CPUs. In Proceedings of ARRAY ’14: ACM SIGPLAN Workshop on Libraries, Languages, and Compilers for Array Programming. Association for Computing Machinery, Edinburgh, Scotland.
[31] Andreas Klöckner, Tim Warburton, Jeffrey Bridge, and Jan S. Hesthaven. 2009. Nodal discontinuous Galerkin methods on graphics processors. J. Comput. Phys. 228, 21 (2009), 7863-7882. · Zbl 1175.65111
[32] Matthew G. Knepley, Jed Brown, Karl Rupp, and Barry F. Smith. 2013. Achieving high performance with unified residual evaluation. arXiv preprint 1309.1204 (2013), cs.MS.
[33] Dimitri Komatitsch, Jean-Paul Ampuero, Kangchen Bai, Piero Basini, Céline Blitz, Ebru Bozdag, Emanuele Casarotti, Joseph Charles, Min Chen, Percy Galvez, Dominik Göddeke, Vala Hjörleifsdóttir, Sue Kientz, Jesús Labarta, Nicolas Le Goff, Pieyre Le Loher, Matthieu Lefebvre, Qinya Liu, Yang Luo, Alessia Maggi, Federica Magnoni, Roland Martin, René Matzen, Dennis McRitchie, Matthias Meschede, Peter Messmer, David Michéa, Surendra Nadh Somala, Tarje Nissen-Meyer, Daniel Peter, Max Rietmann, Elliott Sales de Andrade, Brian Savage, Bernhard Schuberth, Anne Sieminski, Leif Strand, Carl Tape, Jeroen Tromp, Jean-Pierre Vilotte, Zhinan Xie, and Hejun Zhu. 2015. SPECFEM 3D Cartesian User Manual. Technical Report. Computational Infrastructure for Geodynamics, Princeton University, CNRS and University of Marseille, and ETH Zürich.
[34] David Kopriva. 2009. Implementing Spectral Methods for Partial Differential Equations. Springer, Berlin. · Zbl 1172.65001
[35] Katharina Kormann. 2016. A time-space adaptive method for the Schrödinger equation. Commun. Comput. Phys. 20, 1 (2016), 60-85. · Zbl 1388.65101
[36] Katharina Kormann and Martin Kronbichler. 2011. Parallel finite element operator application: Graph partitioning and coloring. In Proceedings of the 7th IEEE International Conference on eScience. 332-339. · Zbl 1365.76121
[37] Benjamin Krank, Niklas Fehn, Wolfgang A. Wall, and Martin Kronbichler. 2017. A high-order semi-explicit discontinuous Galerkin solver for 3D incompressible flow with application to DNS and LES of turbulent channel flow. J. Comput. Phys. 348 (2017), 634-659. · Zbl 1380.76040
[38] Martin Kronbichler and Momme Allalen. 2018. Efficient high-order discontinuous Galerkin finite elements with matrix-free implementations. In Advances and Trends in Environmental Informatics, H.-J. Bungartz, D. Kranzlmüller, V. Weinberg, J. Weismüller, and V. Wohlgemuth (Eds.). 89-110.
[39] Martin Kronbichler, Ababacar Diagne, and Hanna Holmgren. 2018. A fast massively parallel two-phase flow solver for the simulation of microfluidic chips. Int. J. High Perf. Comput. Appl. 32, 2 (2018), 266-287.
[40] Martin Kronbichler and Katharina Kormann. 2012. A generic interface for parallel cell-based finite element operator application. Comput. Fluids 63 (2012), 135-147. · Zbl 1365.76121
[41] Martin Kronbichler, Katharina Kormann, Igor Pasichnyk, and Momme Allalen. 2017. Fast matrix-free discontinuous Galerkin kernels on modern computer architectures. In ISC High Performance 2017, Lecture Notes in Computer Science, vol. 10266. J. M. Kunkel, R. Yokota, P. Balaji, and D. E. Keyes (Eds.). 237-255.
[42] Martin Kronbichler, Svenja Schoeder, Christopher Müller, and Wolfgang A. Wall. 2016. Comparison of implicit and explicit hybridizable discontinuous Galerkin methods for the acoustic wave equation. Internat. J. Numer. Methods Engrg. 106, 9 (2016), 712-739. · Zbl 1352.76058
[43] Martin Kronbichler and Wolfgang A. Wall. 2018. A performance comparison of continuous and discontinuous Galerkin methods with fast multigrid solvers. SIAM J. Sci. Comput. 40, 5 (2018), A3423-A3448. · Zbl 1402.65163
[44] Fabio Luporini, David A. Ham, and Paul H. J. Kelly. 2017. An algorithm for the optimization of finite element integration loops. ACM Trans. Math. Softw. 44, 1 (2017), 3:1-3:26. · Zbl 1380.65381
[45] Dave A. May, Jed Brown, and Laetitia Le Pourhiet. 2014. pTatin3D: High-performance methods for long-term lithospheric dynamics. In Supercomputing (SC14), J. M. Kunkel, T. Ludwig, and H. W. Meuer (Eds.). New Orleans, 1-11.
[46] Andrew T. T. McRae, Gheorghe-Teodor Bercea, Lawrence Mitchell, David A. Ham, and C. J. Cotter. 2016. Automated generation and symbolic manipulation of tensor product finite elements. SIAM J. Sci. Comput. 38, 5 (2016), S25-S47. · Zbl 1352.65615
[47] Axel Modave, Amik St.-Cyr, and Tim Warburton. 2016. GPU performance analysis of a nodal discontinuous Galerkin method for acoustic and elastic models. Computers & Geosciences 91 (2016), 64-76.
[48] Steffen Müthing, Marian Piatkowski, and Peter Bastian. 2017. High-performance implementation of matrix-free high-order discontinuous Galerkin methods. arXiv preprint 1711.10885 (2017), math.NA. · Zbl 1380.76044
[49] Steven A. Orszag. 1980. Spectral methods for problems in complex geometries. J. Comput. Phys. 37 (1980), 70-92. · Zbl 0476.65078
[50] Anthony T. Patera. 1984. A spectral element method for fluid dynamics: Laminar flow in a channel expansion. J. Comput. Phys. 54, 3 (1984), 468-488. · Zbl 0535.76035
[51] Florian Rathgeber, David A. Ham, Lawrence Mitchell, Michael Lange, Fabio Luporini, Andrew T. T. McRae, Gheorghe-Teodor Bercea, Graham R. Markall, and Paul H. J. Kelly. 2016. Firedrake: Automating the finite element method by composing abstractions. ACM Trans. Math. Softw. 43, 3, Article 24 (2016), 27 pages. · Zbl 1396.65144
[52] James Reinders. 2007. Intel Threading Building Blocks. O’Reilly.
[53] Jean-Francois Remacle, Rajesh Gandham, and Tim Warburton. 2016. GPU accelerated spectral finite elements on all-hex meshes. J. Comput. Phys. 324 (2016), 246-257. · Zbl 1360.65283
[54] Joachim Schöberl. 2014. C++11 Implementation of Finite Elements in NGSolve. Technical Report ASC Report No. 30/2014. Vienna University of Technology.
[55] Svenja Schoeder, Katharina Kormann, Wolfgang A. Wall, and Martin Kronbichler. 2018. Efficient explicit time stepping of high order discontinuous Galerkin schemes for waves. SIAM J. Sci. Comput. 40, 6 (2018), C803-C826. · Zbl 1397.65163
[56] Spencer J. Sherwin and George E. Karniadakis. 1996. Tetrahedral finite elements: Algorithms and flow simulations. J. Comput. Phys. 124, 1 (1996), 14-45. · Zbl 0847.76038
[57] Tianjiao Sun, Lawrence Mitchell, Kaushik Kulkarni, Andreas Klöckner, David A. Ham, and Paul H. J. Kelly. 2019. A study of vectorization for matrix-free finite element methods. arXiv preprint 1903.08243 (2019), cs.MS.
[58] Jan Treibig, Georg Hager, and Gerhard Wellein. 2010. LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments. In Proceedings of PSTI2010, the First International Workshop on Parallel Software Tools and Tool Infrastructures. San Diego CA. https://github.com/RRZE-HPC/likwid, retrieved on October 15, 2018.
[59] Zhi J. Wang, Krzysztof Fidkowski, Rémi Abgrall, Francesco Bassi, Doru Caraeni, Andrew Cary, Herman Deconinck, Ralf Hartmann, Koen Hillewaert, H.T. Huynh, Norbert Kroll, Georg May, Per-Olof Persson, Bram van Leer, and Miguel Visbal. 2013. High-order CFD methods: Current status and perspective. Int. J. Numer. Meth. Fluids 72, 8 (2013), 811-845. · Zbl 1455.76007
[60] Samuel Williams, Andrew Waterman, and David Patterson. 2009. Roofline: An insightful visual performance model for multicore architectures. Commun. ACM 52, 4 (2009), 65-76.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced with data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.