FPGA architecture and implementation of sparse matrix-vector multiplication for the finite element method. (English) Zbl 1196.65084

Summary: The Finite Element Method (FEM) is a computationally intensive scientific and engineering analysis tool that has diverse applications ranging from structural engineering to electromagnetic simulation. The trends in floating-point performance are moving in favor of Field-Programmable Gate Arrays (FPGAs), hence increasing interest has grown in the scientific community to exploit this technology. We present an architecture and implementation of an FPGA-based sparse matrix-vector multiplier (SMVM) for use in the iterative solution of large, sparse systems of equations arising from FEM applications. FEM matrices display specific sparsity patterns that can be exploited to improve the efficiency of hardware designs. Our architecture exploits FEM matrix sparsity structure to achieve a balance between performance and hardware resource requirements by relying on external SDRAM for data storage while utilizing the FPGAs computational resources in a stream-through systolic approach. The architecture is based on a pipelined linear array of processing elements (PEs) coupled with a hardware-oriented matrix striping algorithm and a partitioning scheme which enables it to process arbitrarily big matrices without changing the number of PEs in the architecture. Therefore, this architecture is only limited by the amount of external RAM available to the FPGA. The implemented SMVM-pipeline prototype contains 8 PEs and is clocked at 110 MHz obtaining a peak performance of 1.76 GFLOPS. For 8 GB/s of memory bandwidth typical of recent FPGA systems, this architecture can achieve 1.5 GFLOPS sustained performance. Using multiple instances of the pipeline, linear scaling of the peak and sustained performance can be achieved. Our stream-through architecture provides the added advantage of enabling an iterative implementation of the SMVM computation required by iterative solution techniques such as the conjugate gradient method, avoiding initialization time due to data loading and setup inside the FPGA internal memory.


65F50 Computational methods for sparse matrices
65Y20 Complexity and performance of numerical algorithms




Full Text: DOI


[1] Underwood, K.D.; Hemmert, K.S., Closing the gap: CPU and FPGA trends in sustainable floating-point BLAS performance, (), 219-228
[2] J. Fender, An FPGA-based hardware development system with multi-gigabyte memory capacity and high bandwidth, Master’s thesis, University of Toronto, January 2005
[3] The Transmogrifier-4 project, http://www.eecg.utoronto.ca/ tm4/, University of Toronto, 2005
[4] BEE2: A multi-purpose computing platform, http://bwrc.eecs.berkeley.edu/Research/BEE/BEE2/, Berkeley Wireless Research Center, University of California, 2004
[5] Taylor, V.E.; Ranade, A.; Messerschmitt, D.G., SPAR: a new architecture for large finite element computations, IEEE transactions on computers, 44, 4, 531-545, (1995) · Zbl 1062.68540
[6] Zhuo, L.; Prasanna, V.K., Sparse matrix – vector multiplication on FPGAs, (), 63-74
[7] DeLorimier, M.; DeHon, A., Floating-point sparse matrix – vector multiply for FPGAs, (), 75-85
[8] Y. El-Kurdi, W.J. Gross, D. Giannacopoulos, Sparse matrix – vector multiplication for finite element method matrices on FPGAs, in: FCCM’06: 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, Napa, CA, 2006, pp. 293-294
[9] Jordan, D.K.; Mazziotti, D.A., Spectral differences in real-space electronic structure calculations, Journal of chemical physics, 120, 2, 574-578, (2004)
[10] Jordan, D.K.; Mazziotti, D.A., Comparison of two genres for linear scaling in density functional theory: purification and density matrix minimization methods, Journal of chemical physics, 122, 084114, (2005)
[11] Kung, S.Y., VLSI array processors, Information and system science series, (1988), Prentice-Hall, Inc. Englewood Cliffs, NJ
[12] Heath, L.S.; Pemmaraju, S.V.; Ribbens, C.J., Processor-efficient sparse matrix – vector multiplication, Computers and mathematics with applications, 48, 589-608, (2004) · Zbl 1069.65048
[13] Melhem, R., Parallel solution of linear systems with striped sparse matrices, Parallel computing, 6, 2, 165-184, (1988) · Zbl 0643.65016
[14] Melhem, R., Determination of stripe structure for finite element matrices, SIAM journal on numerical analysis, 24, 6, 1419-1433, (1987) · Zbl 0643.65015
[15] El-Kurdi, Y.; Giannacopoulos, D.; Gross, W.J., Hardware acceleration for finite-element electromagnetics: efficient sparse matrix floating-point computations with FPGAs, IEEE transactions on magnetics, 43, 4, 1525-1528, (2007)
[16] P. Belanovic, M. Leeser, A library of parameterized floating point modules and their use, in: Proceedings of the International Conference on Field-Programmable Logic and Applications, 2002, pp. 657-666 · Zbl 1020.68542
[17] Matrix market, http://math.nist.gov/MatrixMarket/, maintained by: National Institute of Standards and Technology (NIST), June 2004
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.