Memory-efficient sparse matrix-matrix multiplication by row merging on many-core architectures. (English) Zbl 1391.65119


65F50 Computational methods for sparse matrices
65Y20 Complexity and performance of numerical algorithms
65Y10 Numerical algorithms for specific classes of architectures
Full Text: DOI


[1] W. Al Rawashdeh, S. Zuo, A. Melle, L. Appold, S. Koletnik, Y. Tsvetkova, N. Beztsinna, A. Pich, T. Lammers, F. Kiessling, and F. Gremse, Noninvasive assessment of elimination and retention using CT-FMT and kinetic whole-body modeling, Theranostics, 7 (2017), pp. 1499–1510, .
[2] R. R. Amossen and R. Pagh, Faster join-projects and sparse matrix multiplications, in Proceedings of the 12th International Conference on Database Theory, ACM, 2009, pp. 121–126, .
[3] A. Ashari, N. Sedaghati, J. Eisenlohr, S. Parthasarathy, and P. Sadayappan, Fast sparse matrix-vector multiplication on GPUs for graph applications, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’14), IEEE Press, Piscataway, NJ, 2014, pp. 781–792, .
[4] A. Azad, G. Ballard, A. Buluç, J. Demmel, L. Grigori, O. Schwartz, S. Toledo, and S. Williams, Exploiting multiple levels of parallelism in sparse matrix-matrix multiplication, SIAM J. Sci. Comput., 38 (2016), pp. C624–C651, . · Zbl 1350.05160
[5] G. Ballard, C. Siefert, and J. Hu, Reducing communication costs for sparse matrix multiplication within algebraic multigrid, SIAM J. Sci. Comput., 38 (2016), pp. C203–C231, . · Zbl 1339.65058
[6] R. E. Bank and C. C. Douglas, Sparse matrix multiplication package (SMMP), Adv. Comput. Math., 1 (1993), pp. 127–137, . · Zbl 0824.65024
[7] N. Bell, S. Dalton, and L. N. Olson, Exposing fine-grained parallelism in algebraic multigrid methods, SIAM J. Sci. Comput., 34 (2012), pp. C123–C152, . · Zbl 1253.65041
[8] W. L. Briggs, V. E. Henson, and S. F. McCormick, A Multigrid Tutorial, 2nd ed., SIAM, Philadelphia, 2000, .
[9] A. Buluç and J. R. Gilbert, Parallel sparse matrix-matrix multiplication and indexing: Implementation and experiments, SIAM J. Sci. Comput., 34 (2012), pp. C170–C191, .
[10] T. Chan, More algorithms for all-pairs shortest paths in weighted graphs, in Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing (STOC ’07), 2007, pp. 590–598. · Zbl 1231.05245
[11] S. Dalton, N. Bell, and L. Olson, Optimizing Sparse Matrix-Matrix Multiplication for the GPU, Technical report, Nvidia, 2013. · Zbl 1347.65085
[12] S. Dalton, N. Bell, L. Olson, and M. Garland, Cusp: Generic Parallel Algorithms for Sparse Matrix and Graph Computations, Version 0.5.0, 2014, .
[13] S. Dalton, L. Olson, and N. Bell, Optimizing sparse matrix-matrix multiplication for the GPU, ACM Trans. Math. Software, 41 (2015), 25, . · Zbl 1347.65085
[14] T. A. Davis and Y. Hu, The University of Florida Sparse Matrix Collection, ACM Trans. Math. Software, 38 (2011), 1, . · Zbl 1365.65123
[15] J. Demouth, Sparse Matrix-Matrix Multiplication on the GPU, presentation, Nvidia, GTC, 2012.
[16] M. Deveci, C. Trott, and S. Rajamanickam, Performance-portable sparse matrix-matrix multiplication for many-core architectures, in Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2017, pp. 693–702, .
[17] J. R. Gilbert, S. Reinhardt,and V. B. Shah, A unified framework for numerical and combinatorial computing, Comput. Sci. Eng., 10 (2008), pp. 20–25, .
[18] J. L. Greathouse and M. Daga, Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’14), IEEE Press, Piscataway, NJ, 2014, pp. 769–780, .
[19] F. Gremse, D. Doleschel, S. Zafarnia, A. Babler, W. Jahnen-Dechent, T. Lammers, W. Lederle, and F. Kiessling, Hybrid µCT-FMT imaging and image analysis, J. Vis. Exp., 100 (2015), e52770, .
[20] F. Gremse, A. Höfter, L. Razik, F. Kiessling, and U. Naumann, GPU-accelerated adjoint algorithmic differentiation, Comput. Phys. Commun., 200 (2016), pp. 300–311, .
[21] F. Gremse, A. Höfter, L. O. Schwen, F. Kiessling, and U. Naumann, GPU-accelerated sparse matrix-matrix multiplication by iterative row merging, SIAM J. Sci. Comput., 37 (2015), pp. C54–C71, . · Zbl 1327.65090
[22] F. Gremse, M. Stärk, J. Ehling, J. R. Menzel, T. Lammers, and F. Kiessling, Imalytics Preclinical: Interactive analysis of biomedical volume data, Theranostics, 6 (2016), pp. 328–341, .
[23] F. Gremse, B. Theek, S. Kunjachan, W. Lederle, A. Pardo, S. Barth, T. Lammers, U. Naumann, and F. Kiessling, Absorption reconstruction improves biodistribution assessment of fluorescent nanoprobes using hybrid fluorescence-mediated tomography, Theranostics, 4 (2014), pp. 960–971, .
[24] A. Griewank and U. Naumann, Accumulating Jacobians as chained sparse matrix products, Math. Program., 95 (2003), pp. 555–571, . · Zbl 1023.90053
[25] F. G. Gustavson, Two fast algorithms for sparse matrices: Multiplication and permuted transposition, ACM Trans. Math. Software, 4 (1978), pp. 250–269, . · Zbl 0384.65016
[26] J. Hoberock and N. Bell, Thrust: A Parallel Template Library, Version 1.8.1, 2015, .
[27] R. Kunchum, A. Chaudhry, A. Sukumaran-Rajam, Q. Niu, I. Nisa, and P. Sadayappan, On improving performance of sparse matrix-matrix multiplication on GPUs, in Proceedings of the International Conference on Supercomputing (ICS ’17), ACM, New York, 2017, 14, .
[28] W. Liu and B. Vinter, An efficient GPU general sparse matrix-matrix multiplication for irregular data, in Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014, pp. 370–381.
[29] W. Liu and B. Vinter, A framework for general sparse matrix–matrix multiplication on GPUs and heterogeneous processors, J. Parallel Distrib. Comput. (IPDPS 2014 Selected Papers on Numerical and Combinatorial Algorithms), 85 (2015), pp. 47–61, .
[30] K. Matam, S. R. K. B. Indarapu, and K. Kothapalli, Sparse matrix-matrix multiplication on modern architectures, in Proceedings of the 19th International Conference on High Performance Computing, IEEE, 2012, pp. 1–10, .
[31] D. Merrill and M. Garland, Merge-based parallel sparse matrix-vector multiplication, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’16), IEEE, 2016, pp. 678–689, .
[32] Nvidia Corporation, CUDA C programming guide, Version 8.0, 2017, .
[33] Nvidia Corporation, cuSPARSE library, Version 8.0, 2017, .
[34] G. Penn, Efficient transitive closure of sparse matrices over closed semirings, Theoret. Comput. Sci., 354 (2006), pp. 72–81, . · Zbl 1088.68042
[35] S. Rosenhain, W. Al Rawashdeh, F. Kiessling, and F. Gremse, Sensitivity and accuracy of hybrid fluorescence-mediated tomography in deep tissue regions, J. Biophotonics, 10 (2017), pp. 1208–1216, .
[36] E. H. Rubensson, E. Rudberg, and P. Sałek, Sparse matrix algebra for quantum modeling of large systems, in Applied Parallel Computing. State of the Art in Scientific Computing, Lecture Notes in Comput. Sci. 4699, B. K\aa gström, E. Elmroth, J. Dongarra, and J. Waśniewski, eds., Springer, Berlin, Heidelberg, 2007, pp. 90–99, .
[37] K. Rupp, P. Tillet, F. Rudolf, J. Weinbub, A. Morhammer, T. Grasser, A. Jüngel, and S. Selberherr, ViennaCL—linear algebra library for multi- and many-core architectures, SIAM J. Sci. Comput., 38 (2016), pp. S412–S439, . · Zbl 1349.65740
[38] V. Strassen, Gaussian elimination is not optimal, Numer. Math., 13 (1969), pp. 354–356, . · Zbl 0185.40101
[39] P. D. Sulatycke and K. Ghose, Caching-efficient multithreaded fast multiplication of sparse matrices, in Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing, IEEE, 1998, pp. 117–123, .
[40] S. Van Dongen, Graph clustering via a discrete uncoupling process, SIAM J. Matrix Anal. Appl., 30 (2008), pp. 121–141, . · Zbl 1161.68041
[41] V. Vassilevska Williams, Multiplying matrices faster than Coppersmith–Winograd, in Proceedings of the 44th Annual ACM Symposium on Theory of Computing, 2012, pp. 887–898, . · Zbl 1286.65056
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.