×

zbMATH — the first resource for mathematics

Towards an efficient use of the BLAS library for multilinear tensor contractions. (English) Zbl 1336.65076
Summary: Mathematical operators whose transformation rules constitute the building blocks of a multi-linear algebra are widely used in physics and engineering applications where they are very often represented as tensors. In the last century, thanks to the advances in tensor calculus, it was possible to uncover new research fields and make remarkable progress in the existing ones, from electromagnetism to the dynamics of fluids and from the mechanics of rigid bodies to quantum mechanics of many atoms. By now, the formal mathematical and geometrical properties of tensors are well defined and understood; conversely, in the context of scientific and high-performance computing, many tensor-related problems are still open. In this paper, we address the problem of efficiently computing contractions among two tensors of arbitrary dimension by using kernels from the highly optimized BLAS library. In particular, we establish precise conditions to determine if and when GEMM, the kernel for matrix products, can be used. Such conditions take into consideration both the nature of the operation and the storage scheme of the tensors, and induce a classification of the contractions into three groups. For each group, we provide a recipe to guide the users towards the most effective use of BLAS.

MSC:
65F99 Numerical linear algebra
15A69 Multilinear algebra, tensor calculus
65Y15 Packaged methods for numerical algorithms
65-04 Software, source code, etc. for problems pertaining to numerical analysis
Software:
MKL; ATLAS; BLIS ; BLAS; GitHub; CTF
PDF BibTeX XML Cite
Full Text: DOI Link
References:
[1] Helgaker, T.; Jorgensen, P.; Olsen, J., Molecular electronic-structure theory, (2000), John Wiley & Sons
[2] S.M. Carroll, Lecture Notes on General Relativity, arXiv:gr-qc/9712019.
[3] ACES III - <> NWChem - <_>.
[4] Kolda, T. G.; Bader, B. W., Tensor decompositions and applications, SIAM Rev., 3, 455, (2009) · Zbl 1173.65029
[5] Oseledets, I.; Tyrtyshnikov, E., TT-cross approximation for multidimensional arrays, Linear Algebra Appl., 432, 1, 70-88, (2010) · Zbl 1183.65040
[6] Uschmajew, A., Local convergence of the alternating least squares algorithm for canonical tensor approximation, SIAM. J. Matrix Anal. Appl., 33, 2, 639-652, (2012) · Zbl 1252.65085
[7] Lawson, C. L.; Hanson, R. J.; Kincaid, R. J.; Krogh, F. T., Basic linear algebra subprograms for Fortran usage, ACM Trans. Math. Softw., 5, 308-323, (1979) · Zbl 0412.65022
[8] Dongarra, J. J.; Croz, J. D.; Hammarling, S.; Hanson, R. J., An extended set of Fortran basic linear algebra subprograms, ACM Trans. Math. Softw., 14, 1-17, (1988) · Zbl 0639.65016
[9] Dongarra, J.; Croz, J. D.; Duff, I.; Hammarling, S., A set of level 3 basic linear algebra subprograms, ACM Trans. Math. Softw., 16, 1-17, (1990) · Zbl 0900.65115
[10] Dodson, D. S.; Lewis, J. G., Issues relating to extension of the basic linear algebra subprograms, ACM SIGNUM Newslett., 20, 1, 19-22, (1985)
[11] Goto, K.; van de Geijn, R. A., Anatomy of high-performance matrix multiplication, ACM Trans. Math. Softw., 34, 3, 12:1-12:25, (2008), Article 12 · Zbl 1190.65064
[12] Goto, K.; van de Geijn, R. A., High-performance implementation of the level-3 BLAS, ACM Trans. Math. Softw., 35, 1, 4:1-4:14, (2008), Article 4
[13] Intel Math Kernel Library, <http://software.intel.com/en-us/articles/intel-math-kernel-library-documentation>.
[14] AMD Core Math Library, <http://developer.amd.com/tools/cpu-development/amd-core-math-library-acml/user-guide/>.
[15] IBM Engineering and Scientific Subroutine Library, <http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic/com.ibm.cluster.essl.doc/esslbooks.html>.
[16] <>.
[17] E. Solomonik, J. Hammond, J. Demmel, A preliminary analysis of cyclops tensor framework, Berkeley, Tech. Rep. UCB/EECS-2012-29.
[18] E. Solomonik, D. Matthews, J. Hammond, J. Demmel, Cyclops tensor framework: reducing communication and eliminating load imbalance in massively parallel contractions, in: Proceedings of the 27th International Symposium on Parallel and Distributed Processing (IPDPS ’13), pp. 813-824.
[19] G. Baumgartner, D.E. Bernholdt, V. Choppella, J. Ramanujam, P. Sadayappan, A high-level approach to synthesis of high-performance codes for quantum chemistry: the tensor contraction engine, in: Proceedings of the 11th Workshop on Compilers for Parallel Computers (CPC ’04), vol. 7-9, Chiemsee, Germany, 2004, pp. 281-290.
[20] Baumgartner, G.; Auer, A.; Bernholdt, D. E.; Bibireata, A.; Choppella, V.; Cociorva, D.; Gao, X.; Harrison, R. J.; Hirata, S.; Krishnamoorthy, S.; Krishnan, S.; Lam, C.; Lu, Q.; Nooijen, M.; Pitzer, R. M.; Ramanujam, J.; Sadayappan, P.; Sibiryakov, A., Synthesis of high-performance parallel programs for a class of ab initio quantum chemistry models, Proc. IEEE, 93, 2, 276-292, (2005)
[21] Hartono, A.; Sibiryakov, A.; Nooijen, M.; Baumgartner, G.; Bernholdt, D. E.; Hirata, S.; Lam, C.; Pitzer, R. M.; Ramanujam, J.; Sadayappan, P., Automated operation minimization of tensor contraction expressions in electronic structure calculations, (Proceedings of the International Conference on Computational Science (ICCS ’05), Part I. Lecture Notes in Computer Science, 3514, (2005), Springer-Verlag), 155-164 · Zbl 1129.68590
[22] C̆íz̆ek, J., On the correlation problem in atomic and molecular systems. calculation of wavefunction components in ursell-type expansion using quantum-field theoretical methods, J. Chem. Phys., 45, 4256-4266, (1966)
[23] Bartlett, R. J.; Purvis, G. D., Many-body perturbation theory, coupled-pair many-electron theory, and the importance of quadruple excitations for the correlation problem, Int. J. Quantum Chem., 14, 5, 561-581, (1978)
[24] Lehner, L., Numerical relativity: a review class, Quantum Gravity, 18, 106072, R25-R86, (2001) · Zbl 0987.83001
[25] R.C. Whaley, J. Dongarra, Automatically tuned linear algebra software, in: Proceedings of the 1998 ACM/IEEE conference on Supercomputing (Supercomputing ’98).
[26] R.C. Whaley, Automated empirical optimization of high performance floating point kernels (Ph.D. Dissertation), Florida State University, Tallahassee, FL, USA, 2004.
[27] F.G. Van Zee, R.A. van de Geijn, BLIS: A framework for rapidly instantiating BLAS functionality, ACM Trans. Math. Softw., 2013, accepted for publication. · Zbl 1347.65054
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.