Röhrig-Zöllner, Melven; Thies, Jonas; Kreutzer, Moritz; Alvermann, Andreas; Pieper, Andreas; Basermann, Achim; Hager, Georg; Wellein, Gerhard; Fehske, Holger Increasing the performance of the Jacobi-Davidson method by blocking. (English) Zbl 1329.65077 SIAM J. Sci. Comput. 37, No. 6, C697-C722 (2015). Summary: Block variants of the Jacobi-Davidson method for computing a few eigenpairs of a large sparse matrix are known to improve the robustness of the standard algorithm when it comes to computing multiple or clustered eigenvalues. In practice, however, they are typically avoided because the total number of matrix-vector operations increases. In this paper we present the implementation of a block Jacobi-Davidson solver. By detailed performance engineering and numerical experiments we demonstrate that the increase in operations is typically more than compensated by performance gains through better cache usage on modern CPUs, resulting in a method that is both more efficient and robust than its single vector counterpart. The steps to be taken to achieve a block speedup involve both kernel optimizations for sparse matrix and block vector operations, and algorithmic choices to allow using blocked operations in most parts of the computation. We discuss the aspect of avoiding synchronization in the algorithm and show by numerical experiments with our hybrid parallel implementation that a significant speedup through blocking can be achieved for a variety of matrices on up to 5120 CPU cores as long as at least about 20 eigenpairs are sought. Cited in 7 Documents MSC: 65F15 Numerical computation of eigenvalues and eigenvectors of matrices 65F50 Computational methods for sparse matrices 65Y05 Parallel numerical computation 65Y20 Complexity and performance of numerical algorithms Keywords:sparse eigenvalue problems; Jacobi-Davidson; block methods; performance engineering; high performance computing; multicore processors; hybrid parallel implementation; algorithm; numerical experiment; sparse matrix Software:FEAST; STREAM; TraceMIN; SELL_C_sigma; STREAM benchmark; likwid; BLAS; Sparsity; Trilinos; JADAMILU; PRIMME; PT-Scotch; MatrixMarket; ParMETIS; SparseMatrix PDFBibTeX XMLCite \textit{M. Röhrig-Zöllner} et al., SIAM J. Sci. Comput. 37, No. 6, C697--C722 (2015; Zbl 1329.65077) Full Text: DOI Link References: [1] P.-A. Absil, R. Mahony, R. Sepulchre, and P. Van Dooren, {\it A Grassmann-Rayleigh quotient iteration for computing invariant subspaces}, SIAM Rev., 44 (2002), pp. 57-73. · Zbl 0995.65037 [2] H. M. Aktulga, A. Buluc, S. Williams, and C. Yang, {\it Optimizing sparse matrix-multiple vectors multiplication for nuclear configuration interaction calculations}, in Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium, 2014, pp. 1213-1222. [3] J. Andrzejewski, {\it On optimizing Jacobi-Davidson method for calculating eigenvalues in low dimensional structures using eight band \(k · p\) model}, J. Comput. Phys., 249 (2013), pp. 22-35. · Zbl 1301.81067 [4] A. Auerbach, {\it Interacting Electrons and Quantum Magnetism}, Grad. Texts Contemp. Phys., Springer, New York, 1994. [5] A. H. Baker, J. M. Dennis, and E. R. Jessup, {\it On improving linear solver performance: A block variant of GMRES}, SIAM J. Sci. Comput., 27 (2006), pp. 1608-1626. · Zbl 1099.65029 [6] F. L. Bauer, {\it Das Verfahren der Treppeniteration und verwandte Verfahren zur Lösung algebraischer Eigenwertprobleme}, Z. Angew. Math. Phys., 8 (1957), pp. 214-235. · Zbl 0078.12103 [7] L. S. Blackford, J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, R. Pozo, K. Remington, and R. C. Whaley, {\it An updated set of basic linear algebra subprograms (BLAS)}, ACM Trans. Math. Software, 28 (2002), pp. 135-151. [8] I. Bloch, J. Dalibard, and W. Zwerger, {\it Many-body physics with ultracold gases}, Rev. Modern Phys., 80 (2008), pp. 885-964. [9] R. F. Boisvert, R. Pozo, K. Remington, R. F. Barrett, and J. J. Dongarra, {\it Matrix Market: A web resource for test matrix collections}, in The Quality of Numerical Software: Assessment and Enhancement, Chapman & Hall, London, 1997, pp. 125-137. [10] M. Bollhöfer and Y. Notay, {\it JADAMILU: A software code for computing selected eigenvalues of large sparse symmetric matrices}, Comput. Phys. Commun., 177 (2007), pp. 951-964. · Zbl 1196.65072 [11] C. Chevalier and F. Pellegrini, {\it PT-SCOTCH: A tool for efficient parallel graph ordering}, Parallel Comput., 34 (2008), pp. 318-331. [12] E. Cuthill and J. McKee, {\it Reducing the bandwidth of sparse symmetric matrices}, in Proceedings of the 24th National ACM Conference, 1969, pp. 157-172. [13] T. A. Davis and Y. Hu, {\it The University of Florida sparse matrix collection}, ACM Trans. Math. Software, 38 (2011), 1. · Zbl 1365.65123 [14] J. Demmel, L. Grigori, M. Hoemmen, and J. Langou, {\it Communication-optimal parallel and sequential QR and LU factorizations}, SIAM J. Sci. Comput., 34 (2012), pp. A206-A239. · Zbl 1241.65028 [15] F. H. L. Essler, H. Frahm, F. Göhmann, A. Klümper, and V. E. Korepin, {\it The One-Dimensional Hubbard Model}, Cambridge University Press, Cambridge, UK, 2005. · Zbl 1107.82014 [16] Y. T. Feng, {\it An integrated Davidson and multigrid solution approach for very large scale symmetric eigenvalue problems}, Comput. Methods Appl. Mech. Engrg., 190 (2001), pp. 3543-3563. · Zbl 1006.74085 [17] D. R. Fokkema, G. L. G. Sleijpen, and H. A. Van der Vorst, {\it Jacobi-Davidson style QR and QZ algorithms for the reduction of matrix pencils}, SIAM J. Sci. Comput., 20 (1998), pp. 94-125. · Zbl 0924.65027 [18] W. D. Gropp, D. K. Kaushik, D. E. Keyes, and B. F. Smith, {\it Towards realistic performance bounds for implicit CFD codes}, in Proceedings of Parallel CFD ’99, Elsevier, Amsterdam, 1999, pp. 233-240. [19] G. Hager and G. Wellein, {\it Introduction to High Performance Computing for Scientists and Engineers}, Chapman & Hall/CRC Comput. Sci. Ser., CRC Press, Boca Raton, FL, 2010. [20] M. A. Heroux, R. A. Bartlett, V. E. Howle, R. J. Hoekstra, J. J. Hu, T. G. Kolda, R. B. Lehoucq, K. R. Long, R. P. Pawlowski, E. T. Phipps, A. G. Salinger, H. K. Thornquist, R. S. Tuminaro, J. M. Willenbring, A. Williams, and K. S. Stanley, {\it An overview of the Trilinos project}, ACM Trans. Math. Software, 31 (2005), pp. 397-423. · Zbl 1136.65354 [21] M. E. Hochstenbach and Y. Notay, {\it The Jacobi-Davidson method}, GAMM-Mitt., 29 (2006), pp. 368-382. · Zbl 1177.65055 [22] M. Hoemmen, {\it Communication-Avoiding Krylov Subspace Methods}, Ph.D. thesis, University of California, Berkeley, Berkeley, CA, 2010. [23] G. W. Howell, J. W. Demmel, C. T. Fulton, S. Hammarling, and K. Marmol, {\it Cache efficient bidiagonalization using BLAS \textup2.5 operators}, ACM Trans. Math. Software, 34 (2008), pp. 14:1-14:33. · Zbl 1190.65056 [24] E.-J. Im, K. A. Yelick, and R. Vuduc, {\it SPARSITY: An optimization framework for sparse matrix kernels}, Int. J. High Perform. C., 18 (2004), pp. 135-158. [25] G. Karypis and K. Schloegel, {\it ParMETIS. Parallel Graph Partitioning and Sparse Matrix Ordering Library}, 4.0 ed., Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, 2013. [26] A. Klinvex, F. Saied, and A. Sameh, {\it Parallel implementations of the trace minimization scheme TraceMIN for the sparse symmetric eigenvalue problem}, Comput. Math. Appl., 65 (2013), pp. 460-468. · Zbl 1319.65027 [27] M. Kreutzer, G. Hager, G. Wellein, H. Fehske, and A. R. Bishop, {\it A unified sparse matrix data format for efficient general sparse matrix-vector multiplication on modern processors with wide SIMD units}, SIAM J. Sci. Comput., 36 (2014), pp. C401-C423. · Zbl 1307.65055 [28] M. Kreutzer, J. Thies, M. Röhrig-Zöllner, A. Pieper, F. Shahzad, M. Galgon, A. Basermann, H. Fehske, G. Hager, and G. Wellein, {\it GHOST: Building Blocks for High Performance Sparse Linear Algebra on Heterogeneous Systems}, CoRR, abs/1507.08101, 2015. · Zbl 1329.65077 [29] B. C. Lee, R. W. Vuduc, J. W. Demmel, K. A. Yelick, M. de Lorimier, and L. Zhong, {\it Performance Optimizations and Bounds for Sparse Symmetric Matrix-Multiple Vector Multiply}, Tech. Report UCB/CSD-03-1297, EECS Department, University of California, Berkeley, Berkeley, CA, 2003. [30] X. Liu, E. Chow, K. Vaidyanathan, and M. Smelyanskiy, {\it Improving the performance of dynamical simulations via multiple right-hand sides}, in Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012, pp. 36-47. [31] J. D. McCalpin, {\it STREAM: Sustainable Memory Bandwidth in High Performance Computers}, Tech. Report, University of Virginia, Charlottesville, VA, 1991-2007 (continually updated). [32] Y. Notay, {\it Convergence analysis of inexact Rayleigh quotient iteration}, SIAM J. Matrix Anal. Appl., 24 (2003), pp. 627-644. · Zbl 1045.65032 [33] Y. Saad, {\it Iterative Methods for Sparse Linear Systems}, 2nd ed., SIAM, Philadelphia, 2003. · Zbl 1031.65046 [34] Y. Saad, {\it Numerical Methods for Large Eigenvalue Problems}, revised ed., Classics Appl. Math. 66, SIAM, Philadelphia, 2011. · Zbl 1242.65068 [35] G. Schubert, H. Fehske, G. Hager, and G. Wellein, {\it Hybrid-parallel sparse matrix-vector multiplication with explicit communication overlap on current multicore-based systems}, Parallel Process. Lett., 21 (2011), pp. 339-358. [36] A. Stathopoulos, {\it Locking Issues for Finding a Large Number of Eigenvectors of Hermitian Matrices}, Tech. Report WM-CS-2005-09, Department of Computer Science, College of William and Mary, Williamsburg, VA, 2005; preprint, submitted to Elsevier Science, 2006. [37] A. Stathopoulos, {\it Nearly optimal preconditioned methods for Hermitian eigenproblems under limited memory. Part I: Seeking one eigenvalue}, SIAM J. Sci. Comput., 29 (2007), pp. 481-514. · Zbl 1137.65019 [38] A. Stathopoulos and J. R. McCombs, {\it Nearly optimal preconditioned methods for Hermitian eigenproblems under limited memory. Part II: Seeking many eigenvalues}, SIAM J. Sci. Comput., 29 (2007), pp. 2162-2188. · Zbl 1151.65320 [39] A. Stathopoulos and J. R. McCombs, {\it PRIMME: Preconditioned iterative multimethod eigensolver–methods and software description}, ACM Trans. Math. Software, 37 (2010), pp. 1-30. · Zbl 1364.65087 [40] P. T. P. Tang and E. Polizzi, {\it FEAST as a subspace iteration eigensolver accelerated by approximate spectral projection}, SIAM J. Matrix Anal. Appl., 35 (2014), pp. 354-390. · Zbl 1303.65018 [41] J. Treibig, G. Hager, and G. Wellein, {\it LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments}, in Proceedings of the 39th International IEEE Conference on Parallel Processing Workshops (ICPPW ’10), Washington, DC, 2010, pp. 207-216. [42] S. W. Williams, A. Waterman, and D. A. Patterson, {\it Roofline: An insightful visual performance model for multicore architectures}, Commun. ACM, 52 (2009), pp. 65-76. [43] K. Wu, Y. Saad, and A. Stathopoulos, {\it Inexact Newton preconditioning techniques for large symmetric eigenvalue problems}, Electron. Trans. Numer. Anal., 7 (1998), pp. 202-214. · Zbl 0916.65035 [44] Y. Zhou, {\it Studies on Jacobi-Davidson, Rayleigh quotient iteration, inverse iteration, generalized Davidson and Newton updates}, Numer. Linear Algebra Appl., 13 (2006), pp. 621-642. · Zbl 1174.65372 This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.