Clark, M. A.; Babich, R.; Barros, K.; Brower, R. C.; Rebbi, C.
Solving lattice QCD systems of equations using mixed precision solvers on GPUs. (English) Zbl 1215.81124
Comput. Phys. Commun. 181, No. 9, 1517-1528 (2010).

Summary: Modern graphics hardware is designed for highly parallel numerical tasks and promises significant cost and performance benefits for many scientific applications. One such application is lattice quantum chromodynamics (lattice QCD), where the main computational challenge is to efficiently solve the discretized Dirac equation in the presence of an SU(3) gauge field. Using NVIDIA’s CUDA platform, we have implemented a Wilson-Dirac sparse matrix-vector product that performs at up to 40, 135 and 212 Gflops for double, single and half precision, respectively, on NVIDIA’s GeForce GTX 280 GPU. We have developed a new mixed precision approach for Krylov solvers using reliable updates, which allows for full double precision accuracy while using only single or half precision arithmetic for the bulk of the computation. The resulting BiCGstab and CG solvers run in excess of 100 Gflops and, in terms of iterations until convergence, perform better than the usual defect-correction approach for mixed precision.

MSC:
81V05 Strong interaction, including quantum chromodynamics
81T25 Quantum field theory on lattices
81T80 Simulation and numerical modelling (quantum field theory)
(MSC2010)

Keywords: CUDA; GPGPU; GPU; lattice QCD; mixed precision

Software: QUDA; Chroma; OpenCL; CUDA; BiCGstab

References:
[1] Egri, G. I.; Fodor, Z.; Hoelbling, C.; Katz, S. D.; Nogradi, D.; Szabo, K. K., Lattice QCD as a video game, Comput. Phys. Comm., 177, 631 (2007)
[2] NVIDIA Corporation, NVIDIA CUDA Programming Guide (2009)
[3] Barros, K.; Babich, R.; Brower, R.; Clark, M. A.; Rebbi, C., Blasting through lattice calculations using CUDA, LATTICE2008, PoS, 045 (2008)
[4] Göddeke, D.; Strzodka, R.; Turek, S., Accelerating double precision FEM simulations with GPUs, in: Proceedings of ASIM 2005 - 18th Symposium on Simulation Technique (2005)
[5] Sleijpen, G. L. G.; van der Vorst, H. A., Reliable updated residuals in hybrid Bi-CG methods, Computing, 56, 141-164 (1996) · Zbl 0842.65018
[6] Munshi, A., et al., The OpenCL specification version 1.0, Technical report, Khronos OpenCL Working Group, 2009.04.02 (2009)
[7] DeGrand, T. A.; Rossi, P., Comput. Phys. Comm., 60, 211 (1990)
[8] Bell, N.; Garland, M., Efficient sparse matrix-vector multiplication on CUDA, NVIDIA Technical Report NVR-2008-004 (2008)
[10] De Forcrand, P.; Lellouch, D.; Roiesnel, C., Optimizing a lattice QCD simulation program, J. Comput. Phys., 59, 324 (1985) · Zbl 0591.65002
[11] Ruetsch, G.; Micikevicius, P., Optimizing matrix transpose in CUDA, NVIDIA Technical Report (2009)
[12] Holmgren, D., Fermilab Status (2009)
[13] Kahan, W., Further remarks on reducing truncation errors, Comm. ACM, 8, 40 (1965)
[14] Harris, M., Optimizing parallel reduction in CUDA, presentation packaged with CUDA Toolkit, NVIDIA Corporation (2007)
[15] Martin, R. S.; Peters, G.; Wilkinson, J. H., Handbook series linear algebra: Iterative refinement of the solution of a positive definite system of equations, Numer. Math., 8, 203-216 (1966) · Zbl 0158.33804
[16] Strzodka, R.; Göddeke, D., Pipelined mixed precision algorithms on FPGAs for fast and accurate PDE solvers from low precision components, in: IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2006), 259-268 (April 2006)
[17] Bulava, J. M., Phys. Rev. D, 79, 034505 (2009)
[19] Edwards, R. G.; Joo, B., The Chroma software system for lattice QCD, Nucl. Phys. B Proc. Suppl., 140, 832 (2005)
[23] Brannick, J.; Brower, R. C.; Clark, M. A.; Osborn, J. C.; Rebbi, C., Adaptive multigrid algorithm for lattice QCD, Phys. Rev. Lett., 100, 041601 (2008)
[24] Clark, M. A.; Brannick, J.; Brower, R. C.; McCormick, S. F.; Manteuffel, T. A.; Osborn, J. C.; Rebbi, C., The removal of critical slowing down, LATTICE2008, PoS, 035 (2008)
[25] Bunk, B.; Sommer, R., An eight parameter representation of SU(3) matrices and its application for simulating lattice QCD, Comput. Phys. Comm., 40, 229 (1986)
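The summary contrasts reliable updates with defect correction but gives no algorithm. Below is a minimal sketch, not the authors' QUDA implementation, of a conjugate-gradient solver in the spirit of the reliable-update idea of Sleijpen and van der Vorst [5]. The Krylov iteration runs in emulated single precision. Whenever the iterated residual has fallen below a fraction `delta` of its maximum since the last update, the accumulated correction is folded into a double-precision solution and the true residual is recomputed in double precision. In this sketch the search direction `p` is kept across updates instead of restarting the Krylov process, as defect correction (iterative refinement) would. All function names, the threshold `delta`, and the use of `struct` rounding to emulate single precision are illustrative assumptions.

```python
import math
import struct


def f32(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_lo(u, v):
    """Inner product with each product and partial sum rounded to single precision."""
    s = 0.0
    for a, b in zip(u, v):
        s = f32(s + f32(a * b))
    return s


def axpy_lo(a, x, y):
    """Return a*x + y with each operation rounded to single precision."""
    return [f32(yi + f32(a * xi)) for xi, yi in zip(x, y)]


def matvec(A, x):
    """Plain double-precision product y = A x for a dense list-of-lists matrix."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def norm(v):
    """Euclidean norm in double precision."""
    return math.sqrt(sum(vi * vi for vi in v))


def reliable_update_cg(A, b, tol=1e-12, delta=0.1, max_iter=1000):
    """Hypothetical mixed-precision CG: single-precision iterations,
    double-precision reliable updates. Returns (x, iterations, updates)."""
    n = len(b)
    A_lo = [[f32(a) for a in row] for row in A]  # matrix stored in single precision
    b_norm = norm(b)
    x = [0.0] * n                 # accumulated solution, kept in double
    x_lo = [0.0] * n              # single-precision correction since last update
    r_lo = [f32(v) for v in b]    # iterated residual (x = 0 initially, so r = b)
    p = list(r_lo)
    rr = dot_lo(r_lo, r_lo)
    r_max = math.sqrt(rr)         # largest iterated residual norm since last update
    updates = 0
    for k in range(1, max_iter + 1):
        Ap = [dot_lo(row, p) for row in A_lo]
        alpha = f32(rr / dot_lo(p, Ap))
        x_lo = axpy_lo(alpha, p, x_lo)
        r_lo = axpy_lo(-alpha, Ap, r_lo)
        rr_new = dot_lo(r_lo, r_lo)
        r_norm = math.sqrt(rr_new)
        r_max = max(r_max, r_norm)
        if r_norm < delta * r_max:
            # Reliable update: fold the correction into x and replace the
            # iterated residual by the true residual computed in double.
            x = [xi + ci for xi, ci in zip(x, x_lo)]
            r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
            updates += 1
            if norm(r) < tol * b_norm:
                return x, k, updates
            x_lo = [0.0] * n
            r_lo = [f32(v) for v in r]
            rr_new = dot_lo(r_lo, r_lo)
            r_max = math.sqrt(rr_new)
        beta = f32(rr_new / rr)
        p = axpy_lo(beta, p, r_lo)    # p <- r + beta * p, kept across updates
        rr = rr_new
    # Iteration budget exhausted: flush the pending correction and return.
    x = [xi + ci for xi, ci in zip(x, x_lo)]
    return x, max_iter, updates
```

For example, on a small symmetric positive definite test matrix, such as a 20 x 20 tridiagonal matrix with 2.1 on the diagonal and -1 off the diagonal, the returned solution should reach a double-precision relative residual near `tol`, even though every Krylov step rounds to single precision. On a GPU, the payoff is that the expensive matrix-vector products run at the higher single- or half-precision throughput quoted in the summary, while only the occasional residual recomputation uses double precision.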