
SU(2) lattice gauge theory simulations on Fermi GPUs. (English) Zbl 1216.81104

Summary: In this work we explore the performance of CUDA in quenched lattice SU(2) simulations. CUDA (Compute Unified Device Architecture) is a hardware and software architecture developed by NVIDIA for computing on the GPU. We present an analysis and performance comparison between the GPU and CPU in single and double precision. Analyses with multiple GPUs and two different architectures (G200 and Fermi) are also presented. To obtain high performance, the code must be optimized for the GPU architecture, i.e., the implementation must exploit the memory hierarchy of the CUDA programming model.
We produce codes for the Monte Carlo generation of SU(2) lattice gauge configurations, for the mean plaquette, for the Polyakov loop at finite T and for the Wilson loop. We also present results for the static quark-antiquark potential using many configurations (50,000) without smearing and almost 2000 configurations with APE smearing. With two Fermi GPUs we have achieved a speedup of \(200\times \) over one CPU in single precision, around 110 Gflop/s. We also find that, using the Fermi architecture, double precision computations for the static quark-antiquark potential are not much slower (less than \(2\times \) slower) than single precision computations.
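As a rough illustration of one observable named in the summary, the mean plaquette can be sketched in a few lines. This is a minimal pure-Python sketch, not the authors' CUDA code: it assumes the standard four-real-number parametrization of SU(2), U = a0 + i a·σ with a0² + |a|² = 1, and all function names (`su2_mul`, `make_lattice`, `mean_plaquette`, and so on) are hypothetical.

```python
import itertools
import math
import random


def su2_mul(a, b):
    """Product of two SU(2) elements stored as (a0, a1, a2, a3), U = a0 + i a.sigma."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
        a0 * b1 + b0 * a1 - (a2 * b3 - a3 * b2),
        a0 * b2 + b0 * a2 - (a3 * b1 - a1 * b3),
        a0 * b3 + b0 * a3 - (a1 * b2 - a2 * b1),
    )


def su2_dag(a):
    """Hermitian conjugate: flip the sign of the vector part."""
    return (a[0], -a[1], -a[2], -a[3])


def random_su2(rng):
    """Random SU(2) element: a normalized Gaussian 4-vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)


def sites(dims):
    """Iterate over all lattice sites as coordinate tuples."""
    return itertools.product(*(range(n) for n in dims))


def shift(site, mu, dims):
    """Neighbouring site one step in direction mu, periodic boundaries."""
    s = list(site)
    s[mu] = (s[mu] + 1) % dims[mu]
    return tuple(s)


def make_lattice(dims, rng=None):
    """Link variables keyed by (site, mu): cold start if rng is None, else hot start."""
    identity = (1.0, 0.0, 0.0, 0.0)
    return {
        (site, mu): random_su2(rng) if rng else identity
        for site in sites(dims)
        for mu in range(len(dims))
    }


def mean_plaquette(links, dims):
    """Average of (1/2) Tr U_mu(x) U_nu(x+mu) U_mu(x+nu)^dagger U_nu(x)^dagger."""
    total, count = 0.0, 0
    nd = len(dims)
    for site in sites(dims):
        for mu in range(nd):
            for nu in range(mu + 1, nd):
                u = su2_mul(links[(site, mu)], links[(shift(site, mu, dims), nu)])
                u = su2_mul(u, su2_dag(links[(shift(site, nu, dims), mu)]))
                u = su2_mul(u, su2_dag(links[(site, nu)]))
                total += u[0]  # (1/2) Tr U equals a0 in this parametrization
                count += 1
    return total / count


if __name__ == "__main__":
    dims = (4, 4, 4, 4)
    print("cold start:", mean_plaquette(make_lattice(dims), dims))
    print("hot start: ", mean_plaquette(make_lattice(dims, random.Random(0)), dims))
```

In this parametrization each link needs only four real numbers and the half-trace is simply the first component, which keeps memory traffic low; whether the paper's GPU kernels store links this way is an assumption here.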

MSC:

81T13 Yang-Mills and other gauge theories in quantum field theory
81T25 Quantum field theory on lattices
81T80 Simulation and numerical modelling (quantum field theory) (MSC2010)
65C05 Monte Carlo methods
81V05 Strong interaction, including quantum chromodynamics
81V35 Nuclear physics

Software:

cuRAND; CUDA

References:

[1] Kirk, D. B.; Hwu, W.-M. W., Programming Massively Parallel Processors: A Hands-On Approach (2010), Morgan Kaufmann
[2] NVIDIA, NVIDIA CUDA™ Programming Guide, 3rd Edition (2010).
[3] Egri, G. I., Lattice QCD as a video game, Comput. Phys. Commun., 177, 631-639 (2007), arXiv:hep-lat/0611022
[4] Clark, M. A.; Babich, R.; Barros, K.; Brower, R. C.; Rebbi, C., Comput. Phys. Commun., 181, 1517 (2010), arXiv:0911.3191 [hep-lat]
[5] T.W. Chiu, T.H. Hsieh, Y.Y. Mao, K. Ogawa [TWQCD Collaboration], PoS LATTICE2010, 030 (2010) [arXiv:1101.0423 [hep-lat]].
[6] M. Hayakawa, K.I. Ishikawa, Y. Osaki, S. Takeda, S. Uno, N. Yamada, arXiv:1009.5169 [hep-lat].
[7] Creutz, M., Confinement and lattice gauge theory, Phys. Scripta, 23, 973 (1981)
[8] Creutz, M., Monte Carlo study of quantized SU(2) gauge theory, Phys. Rev., D21, 2308-2315 (1980)
[9] McLerran, L. D.; Svetitsky, B., A Monte Carlo study of SU(2) Yang-Mills theory at finite temperature, Phys. Lett., B98, 195 (1981), doi:10.1016/0370-2693(81)90986-2
[10] Engels, J., The Polyakov loop near deconfinement in SU(2) gauge theory, Nucl. Phys. B - Proc. Supplements, 4, 289-293 (1988), doi:10.1016/0920-5632(88)90115-6. URL <http://www.sciencedirect.com/science/article/B6TVD-47GJ1GK-2X/2/e13d97c0f93d9ede9f4dda9b571d1e24>
[11] NVIDIA, NVIDIA’s Next Generation CUDA Compute Architecture: Fermi (2010).
[12] Press, W.; Teukolsky, S.; Vetterling, W.; Flannery, B., Numerical Recipes in C (1992), Cambridge University Press, Cambridge, UK
[13] NVIDIA, CUDA, CURAND Library, 1st Edition (2010).
[14] NVIDIA. URL <http://www.nvidia.com/object/cuda_home_new.html>
[15] M. Harris, Optimizing Parallel Reduction in CUDA, NVIDIA Developer Technology, NVIDIA GPU computing SDK 3.2 Edition (2010).
[16] Portuguese Lattice QCD collaboration, <http://nemea.ist.utl.pt/~ptqcd/>
[17] Bhanot, G.; Rebbi, C., SU(2) string tension, Glueball Mass and Interquark potential by Monte Carlo computations, Nucl. Phys., B180, 469 (1981), doi:10.1016/0550-3213(81)90063-8
[18] Kovacs, E., A Monte Carlo evaluation of the Interquark potential, Phys. Rev., D25, 3312 (1982)
[19] Stack, J. D., The heavy quark potential in SU(2) lattice gauge theory, Phys. Rev., D27, 412 (1983)
[20] Huntley, A.; Michael, C., Static potentials and scaling in SU(2) lattice gauge theory, Nucl. Phys. B, 270, 123-134 (1986), doi:10.1016/0550-3213(86)90548-1. URL <http://www.sciencedirect.com/science/article/B6TVC-473FRXG-2T/2/4afbb10609bcc0b73d10f96c7dbdd214>
[21] Shakespeare, N. H.; Trottier, H. D., Tadpole-improved SU(2) lattice gauge theory, Phys. Rev., D59, 014502 (1999), arXiv:hep-lat/9803024
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.