GPU acceleration for FEM-based structural analysis. (English) Zbl 1354.65246

Summary: Graphics Processing Units (GPUs) have greatly exceeded their initial role as graphics accelerators and have taken on a new role as co-processors for computation-heavy tasks. Both the hardware and software ecosystems have now matured, with fully IEEE-compliant double precision and memory error correction supported and a rich set of software tools and libraries available. This in turn has led to their increased adoption in a growing number of fields, both in academia and, more recently, in industry. In this review we investigate the adoption of GPUs as accelerators in the field of Finite Element Structural Analysis, a design tool that is now essential in many branches of engineering. We survey the work that has been done in accelerating the most time-consuming steps of the analysis, indicate the speedup that has been achieved and, where available, highlight software libraries and packages that will enable the reader to take advantage of such acceleration. Overall, we try to draw a high-level picture of the current state of the art.


65N30 Finite element, Rayleigh-Ritz and Galerkin methods for boundary value problems involving PDEs
65Y10 Numerical algorithms for specific classes of architectures


[1] Acceleware http://www.acceleware.com/matrix-solvers
[2] Anzt, H; Tomov, S; Gates, M; Dongarra, J; Heuveline, V, Block-asynchronous multigrid smoothers for GPU-accelerated systems, Proc Comput Sci, 9, 7-16, (2012)
[3] BCSLIB-EXT http://www.aanalytics.com/products.htm
[4] Bell, N; Garland, M, Implementing sparse matrix-vector multiplication on throughput-oriented processors, 1-11, (2009), New York
[5] Bolz, J; Farmer, I; Grinspun, E; Schröder, P, Sparse matrix solvers on the GPU: conjugate gradients and multigrid, 917-924, (2003), New York
[6] Botsch, M; Bommes, D; Vogel, C; Kobbelt, L, GPU-based tolerance volumes for mesh processing, 237-243, (2004)
[7] Buatois, L; Caumon, G; Lévy, B, Concurrent number cruncher: an efficient sparse linear solver on the GPU, 358-371, (2007)
[8] Cecka, C; Lew, A; Darve, E, Assembly of finite element methods on graphics processors, Int J Numer Methods Eng, 85, 640-669, (2011) · Zbl 1217.80146
[9] Cevahir, A; Nukada, A; Matsuoka, S, Fast conjugate gradients with multiple GPUs, 893-903, (2009)
[10] Cevahir, A; Nukada, A; Matsuoka, S, High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning, Comput Sci Res Dev, 25, 83-91, (2010)
[11] Choi, J; Singh, A; Vuduc, R, Model-driven autotuning of sparse matrix-vector multiply on GPUs, 115-126, (2010), New York
[12] CHOLMOD http://www.cise.ufl.edu/research/sparse/cholmod/
[13] Crivelli, L; Dunbar, M, Evolving use of GPU for Dassault Systèmes simulation products, (2012)
[14] CULA Sparse http://www.culatools.com/sparse/
[15] CUSP http://code.google.com/p/cusp-library/ · Zbl 0785.11030
[16] DeCoro, C; Tatarchuk, N, Real-time mesh simplification using the GPU, 161-166, (2007)
[17] Dehnavi, MM; Fernandez, D; Gaudiot, JL; Giannacopoulos, D, Parallel sparse approximate inverse preconditioning on graphic processing units, IEEE Trans Parallel Distrib Syst, 99, 1, (2012)
[18] Filipovic, J; Peterlik, I; Fousek, J, GPU acceleration of equations assembly in finite elements method—preliminary results, (2009)
[19] George, T; Saxena, V; Gupta, A; Singh, A; Choudhury, A, Multifrontal factorization of sparse SPD matrices on GPUs, 372-383, (2011)
[20] Georgescu, S; Chow, P, GPU accelerated CAE using open solvers and the cloud, Comput Archit News, 39, 14-19, (2011)
[21] Georgescu, S; Okuda, H, Conjugate gradients on multiple GPUs, Int J Numer Methods Fluids, 64, 1254-1273, (2010) · Zbl 1206.65131
[22] Geveler, M; Ribbrock, D; Göddeke, D; Zajac, P; Turek, S, Efficient finite element geometric multigrid solvers for unstructured grids on GPUs, (2011) · Zbl 1284.76249
[23] Göddeke, D; Strzodka, R; Turek, S; Hülsemann, F (ed.); Kowarschik, M (ed.); Rüde, UA (ed.), Accelerating double precision FEM simulations with GPUs, 139-144, (2005), San Diego
[24] Göddeke, D; Strzodka, R; Turek, SA, Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations, Int J Parallel Emerg Dist Syst, 22, 221-256, (2007) · Zbl 1188.68084
[25] Göddeke D, Strzodka RA (2008) Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations (part 2: double precision GPUs). Tech rep, Fakultät für Mathematik, TU Dortmund. Ergebnisberichte des Instituts für Angewandte Mathematik, Nummer 370 · Zbl 1217.80146
[26] Göhner U (2012) Usage of GPU in LS-DYNA. LS-DYNA forum
[27] Haase, G; Liebmann, M; Douglas, C; Plank, G, A parallel algebraic multigrid solver on graphics processing units, 38-47, (2010)
[28] Heuveline, V; Lukarski, D; Trost, N; Weiss, JP; Keller, R (ed.); Kramer, D (ed.); Weiss, JP (ed.), Parallel smoothers for matrix-based geometric multigrid methods on locally refined meshes using multicore CPUs and GPUs, 158-171, (2012), Berlin
[29] Hjelmervik, J; Léon, J, GPU-accelerated shape simplification for mechanical-based applications, 91-102, (2007), New York
[30] Kamiabad A (2011) Implementing a preconditioned iterative linear solver using massively parallel graphics processing units. Master’s thesis, University of Toronto
[31] Kraus, J; Foster, M; Keller, R (ed.); Kramer, D (ed.); Weiss, JP (ed.), Efficient AMG on heterogeneous systems, No. 7174, 133-146, (2012), Berlin
[32] Krawezik, G; Poole, G, Accelerating the ANSYS direct sparse solver with GPUs, (2009)
[33] Krüger, J; Westermann, R, Linear algebra operators for GPU implementation of numerical algorithms, ACM Trans Graph, 22, 908-916, (2003)
[34] Lacoste X, Ramet P, Faverge M, Ichitaro Y, Dongarra J et al (2012) Sparse direct solvers with accelerators over DAG runtimes. Tech rep 7972, INRIA
[35] LAMA http://www.libama.org · Zbl 1284.76249
[36] LAToolbox from HiFlow http://www.hiflow3.org
[37] Lequiniou, E; Zhou, H, Speedup Altair RADIOSS solvers using NVIDIA GPU, (2012)
[38] Li R, Saad Y (2010) GPU-accelerated preconditioned iterative linear solvers. Tech rep, University of Minnesota
[39] Liao, C, MSC Nastran sparse direct solvers for Tesla GPUs, (2012)
[40] Lucas R, Wagenbreth G, Tran J, Davis D (2007) Multifrontal computations on GPUs. Tech rep, Unpublished ISI white paper · Zbl 1323.65136
[41] Luitjens, J; Williams, A; Heroux, M, Optimizing MiniFE an implicit finite element application on GPUs, (2012)
[42] Maciol, P; Plaszewski, P; Banas, K, 3D finite element numerical integration on GPUs, Proc Comput Sci, 1, 1087-1094, (2010)
[43] MatrixPro-GSS http://www.matrixprosoftware.com/
[44] Minden, V; Smith, B; Knepley, M, Preliminary implementation of PETSc using GPUs, (2010)
[45] MiniFE https://software.sandia.gov/mantevo/download.html
[46] Naumov M (2011) Incomplete-LU and Cholesky preconditioned iterative methods using CUSPARSE and CUBLAS. Technical report and white paper
[47] Neic, A; Liebmann, M; Haase, G, Algebraic multigrid solver on clusters of CPUs and GPUs, 389-398, (2012)
[48] NVIDIA (2012) NVIDIA CUDA programming guide 5.0
[49] PASTIX http://pastix.gforge.inria.fr
[50] PETSc http://www.mcs.anl.gov/petsc/
[51] Płaszewski, P; Macioł, P; Banaś, K, Finite element numerical integration on GPUs, 411-420, (2010)
[52] Posey, S; Courteille, F, GPU progress in sparse matrix solvers for applications in computational mechanics, (2012)
[53] Qi, M; Cao, TT; Tan, TS, Computing 2D constrained Delaunay triangulation using the GPU, 39-46, (2012), New York
[54] Rong, G; Tan, T; Cao, T; et al., Computing two-dimensional Delaunay triangulation using graphics hardware, 89-97, (2008), New York
[55] Sawyer, W; Vanini, C; Fourestey, G; Popescu, R, SPAI preconditioners for HPC applications, PAMM, 12, 651-652, (2012)
[56] Schenk, O; Christen, M; Burkhart, H, Algorithmic performance studies on graphics processing units, J Parallel Distrib Comput, 68, 1360-1369, (2008)
[57] Shontz, SM; Nistor, DM; Jiao, X (ed.); Weill, JC (ed.), CPU-GPU algorithms for triangular surface mesh simplification, 475-492, (2013), Berlin
[58] The Khronos Group (2011) OpenCL specification 1.2
[59] Verschoor, M; Jalba, AC, Analysis and performance estimation of the conjugate gradient method on multiple GPUs, Parallel Comput, 38, 552-575, (2012)
[60] ViennaCL http://viennacl.sourceforge.net/
[61] Vuduc, R; Chandramowlishwaran, A; Choi, J; Guney, M; Shringarpure, A, On the limits of GPU acceleration, 13, (2010)
[62] Wagner, M; Rupp, K; Weinbub, J, A comparison of algebraic multigrid preconditioners using graphics processing units and multi-core central processing units, 1-8, (2012)
[63] Wang, M; Klie, H; Parashar, M; Sudan, H, Solving sparse linear systems on NVIDIA Tesla GPUs, 864-873, (2009)
[64] Weber, D; Bender, J; Schnoes, M; Stork, A; Fellner, D, Efficient GPU data structures and methods to solve sparse linear systems in dynamics applications, Comput Graph Forum, 32, 16-26, (2013)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.