×

Towards an efficient tile matrix inversion of symmetric positive definite matrices on multicore architectures. (English) Zbl 1323.65022

Palma, José M. Laginha M. (ed.) et al., High performance computing for computational science – VECPAR 2010. 9th international conference, Berkeley, CA, USA, June 22–25, 2010. Revised selected papers. Berlin: Springer (ISBN 978-3-642-19327-9/pbk). Lecture Notes in Computer Science 6449, 129-138 (2011).
Summary: The algorithms in the current sequential numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile algorithms, has recently been introduced. Previous research has shown that it is possible to write efficient and scalable tile algorithms for performing a Cholesky factorization, a (pseudo) LU factorization, a QR factorization, and computing the inverse of a symmetric positive definite matrix. In this extended abstract, we revisit the computation of the inverse of a symmetric positive definite matrix. We observe that, using a dynamic task scheduler, it is relatively painless to translate existing LAPACK code to obtain a ready-to-be-executed tile algorithm. However we demonstrate that, for some variants, non trivial compiler techniques (array renaming, loop reversal and pipelining) need then to be applied to further increase the parallelism of the application. We present preliminary experimental results.
For the entire collection see [Zbl 1207.68016].

MSC:

65F05 Direct numerical methods for linear systems and matrix inversion
65Y05 Parallel numerical computation
65Y10 Numerical algorithms for specific classes of architectures
15A09 Theory of matrix inversion and generalized inverses
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] BLAS: Basic linear algebra subprograms, http://www.netlib.org/blas/ · Zbl 0614.65039
[2] Agullo, E., Dongarra, J., Hadri, B., Kurzak, J., Langou, J., Langou, J., Ltaief, H.: PLASMA Users’ Guide. Technical report, ICL, UTK (2009)
[3] Agullo, E., Hadri, B., Ltaief, H., Dongarrra, J.: Comparative study of one-sided factorizations with multiple software packages on multi-core hardware. In: SC 2009: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp. 1–12. ACM, New York (2009) · doi:10.1145/1654059.1654080
[4] Allen, R., Kennedy, K.: Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Morgan Kaufmann, San Francisco (2001)
[5] Anderson, E., Bai, Z., Bischof, C., Blackford, L.S., Demmel, J.W., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., Sorensen, D.: LAPACK Users’ Guide. SIAM, Philadelphia (1992) · Zbl 0934.65030
[6] Bientinesi, P., Gunter, B., van de Geijn, R.: Families of algorithms related to the inversion of a symmetric positive definite matrix. ACM Trans. Math. Softw. 35(1), 1–22 (2008) · Zbl 05458507 · doi:10.1145/1377603.1377606
[7] Blackford, L.S., Choi, J., Cleary, A., D’Azevedo, E., Demmel, J., Dhillon, I., Dongarra, J., Hammarling, S., Henry, G., Petitet, A., Stanley, K., Walker, D., Whaley, R.C.: ScaLAPACK Users’ Guide. SIAM, Philadelphia (1997) · Zbl 0886.65022 · doi:10.1137/1.9780898719642
[8] Buttari, A., Langou, J., Kurzak, J., Dongarra, J.: Parallel tiled QR factorization for multicore architectures. Concurrency Computat.: Pract. Exper. 20(13), 1573–1590 (2008) · Zbl 05616255 · doi:10.1002/cpe.1301
[9] Buttari, A., Langou, J., Kurzak, J., Dongarra, J.: A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Computing 35(1), 38–53 (2009) · doi:10.1016/j.parco.2008.10.002
[10] Chan, E.: Runtime data flow scheduling of matrix computations. FLAME Working Note #39. Technical Report TR-09-22, The University of Texas at Austin, Department of Computer Sciences (August 2009)
[11] Chan, E., Van Zee, F.G., Bientinesi, P., Quintana-Ortí, E.S., Quintana-Ortí, G., van de Geijn, R.: Supermatrix: a multithreaded runtime scheduling system for algorithms-by-blocks. In: PPoPP 2008: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, pp. 123–132. ACM, New York (2008)
[12] Christofides, N.: Graph Theory: An algorithmic Approach (1975) · Zbl 0321.94011
[13] Du Croz, J.J., Higham, N.J.: Stability of methods for matrix inversion. IMA Journal of Numerical Analysis 12, 1–19 (1992) · Zbl 0748.65021 · doi:10.1093/imanum/12.1.1
[14] Eigenmann, R., Hoeflinger, J., Padua, D.: On the automatic parallelization of the perfect benchmarks®. IEEE Trans. Parallel Distrib. Syst. 9(1), 5–23 (1998) · doi:10.1109/71.655238
[15] Higham, N.J.: Accuracy and Stability of Numerical Algorithms, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia (2002) · Zbl 1011.65010 · doi:10.1137/1.9780898718027
[16] Kurzak, J., Dongarra, J.: Fully dynamic scheduler for numerical computing on multicore processors. University of Tennessee CS Tech. Report, UT-CS-09-643 (2009)
[17] Kurzak, J., Dongarra, J.: QR factorization for the Cell Broadband Engine. Sci. Program. 17(1-2), 31–42 (2009)
[18] Perez, J.M., Badia, R.M., Labarta, J.: A dependency-aware task-based programming environment for multi-core architectures. In: Proceedings of IEEE Cluster Computing 2008 (2008) · doi:10.1109/CLUSTR.2008.4663765
[19] Quintana-Ortí, G., Quintana-Ortí, E.S., van de Geijn, R.A., Van Zee, F.G., Chan, E.: Programming matrix algorithms-by-blocks for thread-level parallelism. ACM Transactions on Mathematical Software 36(3) · Zbl 1364.65105 · doi:10.1145/1527286.1527288
[20] Rinard, M.C., Scales, D.J., Lam, M.S.: Jade: A high-level, machine-independent language for parallel programming. Computer 6, 28–38 (1993) · Zbl 05089212 · doi:10.1109/2.214440
[21] Sutter, H.: A fundamental turn toward concurrency in software. Dr. Dobb’s Journal 30(3) (2005)
[22] Wolfe, M.: Doany: Not just another parallel loop. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D.A. (eds.) LCPC 1992. LNCS, vol. 757, pp. 421–433. Springer, Heidelberg (1993) · doi:10.1007/3-540-57502-2_62
[23] Van Zee, F.G.: libflame: The Complete Reference (2009), http://www.lulu.com
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.