Solving a large scale radiosity problem on GPU-based parallel computers. (English) Zbl 1336.65025

Summary: The radiosity equation has been used widely in computer graphics and thermal engineering applications. The equation is simple to formulate but challenging to solve when the number of Lambertian surfaces associated with an application becomes large. In this paper, we present algorithms to compute the view factors and to solve the set of radiosity equations using an out-of-core Cholesky decomposition method. This work details the algorithmic procedures for the computation of the view factors and for the Cholesky solver. The data layout of the radiosity matrix follows the block cyclic decomposition scheme used in ScaLAPACK. The parallel computation of the view factors on the GPUs extends the algorithms of a serial community code called view3d. To handle large matrices that exceed the device memory of the GPU, an out-of-core algorithm for parallel Cholesky factorization is implemented. Results of a performance study conducted on Keeneland, a hybrid CPU/GPU cluster at the National Institute for Computational Sciences composed of 264 nodes of multicore CPUs and GPUs, are shown and discussed.
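
Illustration: the block cyclic decomposition mentioned in the summary can be sketched as an index mapping. The following is a minimal sketch, not code from the paper or from ScaLAPACK; the function names `block_cyclic_owner` and `owner_2d` are hypothetical, indices are assumed to be 0-based, and the first block is assumed to reside on process 0.

```python
def block_cyclic_owner(g, nb, nprocs):
    """Map a 0-based global index g to (owning process, 0-based local index)
    for a 1D block cyclic distribution with block size nb over nprocs
    processes. Assumption: block 0 lives on process 0."""
    block = g // nb                      # global block containing g
    proc = block % nprocs                # blocks are dealt out round-robin
    local = (block // nprocs) * nb + g % nb
    return proc, local


def owner_2d(i, j, mb, nb, prow, pcol):
    """Apply the 1D mapping independently to rows and columns of a matrix
    distributed over a prow x pcol process grid with mb x nb blocks."""
    pr, li = block_cyclic_owner(i, mb, prow)
    pc, lj = block_cyclic_owner(j, nb, pcol)
    return (pr, pc), (li, lj)


if __name__ == "__main__":
    # Entry (5, 2) of a matrix with 2 x 2 blocks on a 2 x 2 process grid.
    print(owner_2d(5, 2, 2, 2, 2, 2))
```

Dealing blocks out round-robin in both dimensions keeps the trailing submatrix of a factorization spread over all processes as the computation proceeds, which is the usual motivation for this layout.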

MSC:

65F05 Direct numerical methods for linear systems and matrix inversion
65M22 Numerical solution of discretized equations for initial value and initial-boundary value problems involving PDEs
65Y05 Parallel numerical computation
65Y10 Numerical algorithms for specific classes of architectures

References:

[2] Walton, G. N., Algorithms for calculating radiation view factors between plane convex polygons with obstructions, (Tech. Rep. NBSIR 86-3463, 1987—Shortened Report in Fundamentals and Applications of Radiation Heat Transfer, HTD-Vol. 72 (1986), National Bureau of Standards, American Society of Mechanical Engineers)
[3] Agullo, E.; Demmel, J.; Dongarra, J.; Hadri, B.; Kurzak, J.; Langou, J.; Ltaief, H.; Luszczek, P.; Tomov, S., Numerical linear algebra on emerging architectures: the PLASMA and MAGMA projects, J. Phys. Conf. Ser., 180, 012037 (2009)
[9] Nath, R.; Tomov, S.; Dongarra, J., An improved MAGMA GEMM for Fermi GPUs, Int. J. High Perform. Comput. Appl., 24, 4, 511-515 (2010)
[10] Song, F.; Dongarra, J., A scalable framework for heterogeneous GPU-based clusters, (Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures (2012), ACM), 91-100
[11] Song, F.; Tomov, S.; Dongarra, J., Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems, (Proceedings of the 26th ACM International Conference on Supercomputing (2012), ACM), 365-376
[12] D’Azevedo, E.; Hill, J. C., Parallel LU factorization on GPU cluster, Proc. Comput. Sci., 9, 67-75 (2012)
[13] Barrett, R. F.; Chan, T. H.F.; D’Azevedo, E. F.; Jaeger, E. F.; Wong, K.; Wong, R. Y., Complex version of high performance computing LINPACK benchmark (HPL), Concurr. Comput.: Pract. Exp., 22, 5, 537-587 (2010)
[14] Bach, M.; Kretz, M.; Lindenstruth, V.; Rohr, D., Optimized HPL for AMD GPU and multi-core CPU usage, Comput. Sci.-Res. Dev., 22, 5, 537-587 (2010)
[16] Nath, R.; Tomov, S.; Dongarra, J., An improved MAGMA GEMM for Fermi graphics processing units, Int. J. High Perform. Comput. Appl., 24, 4, 511-515 (2010)
[17] Ohmura, J.; Miyoshi, T.; Hidetsugu, I.; Yoshinaga, T., Computation-communication overlap of Linpack on a GPU-accelerated PC cluster, IEICE Trans. Inf. Syst., 94, 12, 2319-2327 (2011)
[18] Volkov, V.; Demmel, J., LU, QR and Cholesky Factorizations Using Vector Capabilities of GPUs, Tech. Rep. UCB/EECS-2008-49 (2008), University of California, Berkeley, CA
[19] Rohr, D.; Bach, M.; Kretz, M.; Lindenstruth, V., Multi-GPU DGEMM and HPL on highly energy efficient clusters, IEEE Micro, 99, 1 (2011)
[20] Wang, F.; Yang, C. Q.; Du, Y. F.; Chen, J.; Yi, H. Z.; Xu, W. X., Optimizing LINPACK benchmark on GPU-accelerated petascale supercomputer, J. Comput. Sci. Tech., 26, 5, 854-865 (2011)
[22] Walton, G. N., Calculation of Obstructed View Factors by Adaptive Integration, Tech. Rep. NISTIR 6925 (2002), National Institute of Standards and Technology, Gaithersburg, MD
[23] Hottel, H. C.; Sarofim, A., Radiative Transfer (1967), McGraw-Hill, New York, NY
[24] Walton, G. N., Fortran IV Programs to Calculate Radiant Interchange Factors, Tech. Rep. BDR-25 (1966), National Research Council of Canada, Division of Building Research, Ottawa, Canada
[25] Yamazaki, I.; Tomov, S.; Dongarra, J., One-sided dense matrix factorizations on a multicore with multiple GPU accelerators, Proc. Comput. Sci., 9, 37-46 (2012)
[27] Dongarra, J.; Hammarling, S.; Walker, D., Key concepts for parallel out of core LU factorization, Parallel Comput., 23, 49-70 (1997) · Zbl 0906.68036
[28] D’Azevedo, E.; Dongarra, J., The design and implementation of the parallel out-of-core ScaLAPACK LU, QR, and Cholesky factorization routines, Concurr. Comput.: Pract. Exp., 12, 1481-1493 (2000) · Zbl 1008.68577
[29] Gunter, B.; Reiley, W.; van de Geijn, R., Implementation of Out-of-Core Cholesky and QR Factorizations with POOCLAPACK, Tech. Rep. CS-TR-00-21 (2000), University of Texas at Austin, Austin, TX, USA