zbMATH — the first resource for mathematics

Reproducibility strategies for parallel preconditioned conjugate gradient. (English) Zbl 1433.65368
Summary: The Preconditioned Conjugate Gradient method is often used in numerical simulations. While being widely used, the solver is also known for its lack of accuracy while computing the residual. In this article, we aim at a twofold goal: enhance the accuracy of the solver but also ensure its reproducibility in a message-passing implementation. We design and employ various strategies starting from the ExBLAS approach (through preserving every bit of information until final rounding) to its more lightweight performance-oriented variant (through expanding the intermediate precision). These algorithmic strategies are reinforced with programmability suggestions to assure deterministic executions. Finally, we verify these strategies on modern HPC systems: both versions deliver reproducible number of iterations, residuals, direct errors, and vector-solutions for the overhead of only 29% (ExBLAS) and 4% (lightweight) on 768 processes.
65Y05 Parallel numerical computation
65Y20 Complexity and performance of numerical algorithms
68Q10 Modes of computation (nondeterministic, parallel, interactive, probabilistic, etc.)
BLAS; ExBLAS; HPL; LBNL; mctoolbox; MPFR
Full Text: DOI
[1] Barrett, R.; Berry, M.; Chan, T. F.; Demmel, J.; Donato, J.; Dongarra, J.; Eijkhout, V.; Pozo, R.; Romine, C.; der Vorst, H. V., Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods (1994), SIAM
[2] Saad, Y., Iterative Methods for Sparse Linear Systems (2003), SIAM: SIAM Philadelphia, PA, USA · Zbl 1002.65042
[3] Gropp, W.; Hoefler, T.; Thakur, R.; Lusk, E., Using Advanced MPI: Modern Features of the Message-Passing Interface (2014), MIT Press
[4] Lawson, C. L.; Hanson, R. J.; Kincaid, D. R.; Krogh, F. T., Basic linear algebra subprograms for Fortran usage, ACM Trans. Math. Software, 5, 3, 308-323 (1979) · Zbl 0412.65022
[5] Dongarra, J. J.; Croz, J. D.; Hammarling, S.; Duff, I., A set of level 3 basic linear algebra subprograms, ACM Trans. Math. Software, 16, 1, 1-17 (1990) · Zbl 0900.65115
[6] Collange, S.; Defour, D.; Graillat, S.; Iakymchuk, R., Numerical reproducibility for the parallel reduction on multi- and many-core architectures, ParCo, 49, 83-97 (2015)
[7] Demmel, J.; Nguyen, H. D., Parallel reproducible summation, IEEE Trans. Comput., 64, 7, 2060-2070 (2015) · Zbl 1360.68042
[8] Iakymchuk, R.; Graillat, S.; Defour, D.; Quintana-Orti, E. S., Hierarchical approach for deriving a reproducible LU factorization, Int. J. High Perform. Comput. Appl., 33, 5, 791-803 (2019)
[9] Iakymchuk, R.; Defour, D.; Collange, S.; Graillat, S., Reproducible and accurate matrix multiplication, Lect. Notes Comput. Sci., 9553, 126-137 (2016) · Zbl 1354.65082
[10] Higham, N. J., Accuracy and Stability of Numerical Algorithms, 680 (2002), SIAM: SIAM Philadelphia, PA · Zbl 1011.65010
[11] Muller, J.-M.; Brisebarre, N.; de Dinechin, F.; Jeannerod, C.-P.; Lefèvre, V.; Melquiond, G.; Revol, N.; Stehlé, D.; Torres, S., Handbook of Floating-Point Arithmetic, 572 (2010), Birkhäuser · Zbl 1197.65001
[12] Rump, S. M.; Ogita, T.; Oishi, S., Accurate floating-point summation part II: Sign, K-fold faithful and rounding to nearest, SIAM J. Sci. Comput., 31, 2, 1269-1302 (2008) · Zbl 1190.65074
[13] Rump, S. M., Computer-assisted proofs and self-validating methods, (Einarsson, B., Handbook on Accuracy and Reliability in Scientific Computation (2005), SIAM), 195-240
[14] Bailey, D. H., High-precision computation: applications and challenges, (Proceedings of ARITH-21 (2013), IEEE), 1, Keynote talk
[15] Lutz, D. R.; Hinds, C. N., High-precision anchored accumulators for reproducible floating-point summation, (Proceedings of ARITH-24 (2017), IEEE: IEEE London, UK), 98-105
[16] Burgess, N.; Goodyer, C.; Lutz, D. R.; Hinds, C. N., High-precision anchored accumulators for reproducible floating-point summation, IEEE Trans. Comput., 68, 7, 967-978 (2019) · Zbl 07093733
[17] IEEE Computer Society, N., IEEE Standard for Floating-Point Arithmetic (2008), IEEE Standard 754-2008
[18] D. Mukunoki, T. Ogita, K. Ozaki, Accurate and reproducible BLAS routines with Ozaki scheme for many-core architectures, in: Proc. International Conference on Parallel Processing and Applied Mathematics, PPAM2019, 2019, accepted.
[19] Knuth, D. E., The Art of Computer Programming: Seminumerical Algorithms, volume 2 (1969), Addison-Wesley · Zbl 0191.18001
[20] Ogita, T.; Rump, S. M.; Oishi, S., Accurate sum and dot product, SIAM J. Sci. Comput., 26, 1955-1988 (2005) · Zbl 1084.65041
[21] Kulisch, U.; Snyder, V., The exact dot product as basic tool for long interval arithmetic, Computing, 91, 3, 307-313 (2011) · Zbl 1228.65073
[22] Iakymchuk, R.; Collange, S.; Defour, D.; Graillat, S., ExBLAS: Reproducible and accurate BLAS library, (Proceedings of the NRE2015 Workshop Held as Part of SC15. Austin, TX, USA, November 15-20, 2015 (2015), LIP6, ICS, INRIA, DALI-LIRMM)
[23] Hida, Y.; Li, X. S.; Bailey, D. H., Algorithms for quad-double precision floating point arithmetic, (Proceedings of ARITH-15 (2001)), 155-162
[24] Boldo, S.; Melquiond, G., Emulation of a FMA and correctly rounded sums: Proved algorithms using rounding to odd, IEEE Trans. Comput., 57, 4, 462-471 (2008) · Zbl 1388.65200
[25] Priest, D. M., Algorithms for arbitrary precision floating point arithmetic, (10th IEEE Symposium on Computer Arithmetic (1991), IEEE), 132-143
[26] Wiesenberger, M.; Einkemmer, L.; Held, M.; Gutierrez-Milla, A.; Xavier Saez, X.; Iakymchuk, R., Reproducibility, accuracy and performance of the feltor code and library on parallel computer architectures, Comput. Phys. Comm., 238, 145-156 (2019)
[27] HPCG - High performance conjugate gradients (2015)
[28] Dongarra, J.; Heroux, M. A., Toward a New Metric for Ranking High Performance Computing Systems (2013), Sandia National Lab., Sandia Report SAND2013-4744
[29] Petitet, A.; Whaley, R. C.; Dongarra, J.; Cleary, A., HPL - A portable implementation of the high-performance Linpack benchmark for distributed-memory computers (2008)
[30] Fousse, L.; Hanrot, G.; Lefèvre, V.; Pélissier, P.; Zimmermann, P., MPFR: A multiple-precision binary floating-point library with correct rounding, ACM Trans. Math. Software, 33, 2, 13 (2007) · Zbl 1365.65302
[31] Rump, S. M.; Ogita, T.; Oishi, S., Fast high precision summation, Nonlinear Theory Appl., 1, 1, 2-24 (2010)
[32] J. Demmel, H.D. Nguyen, Fast reproducible floating-point summation, in: Proceedings of ARITH-21, 2013, pp. 163-172.
[33] Nguyen, H. D.; Demmel, J., Reproducible tall-skinny QR, (Proceedings of ARITH-22 (2015)), 152-159
[34] Ozaki, K.; Ogita, T.; Oishi, S.; Rump, S. M., Error-free transformations of matrix multiplication by using fast routines of matrix multiplication and its applications, Numer. Algorithms, 59, 1, 95-118 (2012) · Zbl 1244.65062
[35] Carson, E.; Higham, N. J., Accelerating the solution of linear systems by iterative refinement in three precisions, SIAM J. Sci. Comput., 40, 2, A817-A847 (2018) · Zbl 1453.65067
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.