Software for sparse tensor decomposition on emerging computing architectures. (English) Zbl 07099291


65Y10 Numerical algorithms for specific classes of architectures
15A72 Vector and tensor algebra, theory of invariants
15-04 Software, source code, etc. for problems pertaining to linear algebra
Full Text: DOI arXiv


[1] B. W. Bader and T. G. Kolda, Algorithm 862: MATLAB tensor classes for fast algorithm prototyping, ACM Trans. Math. Software, 32 (2006), pp. 635–653, https://doi.org/10.1145/1186785.1186794. · Zbl 1230.65054
[2] B. W. Bader and T. G. Kolda, Efficient MATLAB computations with sparse and factored tensors, SIAM J. Sci. Comput., 30 (2007), pp. 205–231, https://doi.org/10.1137/060676489. · Zbl 1159.65323
[3] N. Bell and M. Garland, Implementing sparse matrix-vector multiplication on throughput-oriented processors, in Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC ’09, ACM, New York, 2009, 18, https://doi.org/10.1145/1654059.1654078.
[4] J. D. Carroll and J. J. Chang, Analysis of individual differences in multidimensional scaling via an N-way generalization of “Eckart-Young” decomposition, Psychometrika, 35 (1970), pp. 283–319, https://doi.org/10.1007/BF02310791. · Zbl 0202.19101
[5] J. H. Choi and S. Vishwanathan, DFacTo: Distributed factorization of tensors, in Advances in Neural Information Processing Systems (NIPS 2014), 2014, pp. 1296–1304, http://papers.nips.cc/paper/5395-dfacto-distributed-factorization-of-tensors.
[6] N. Cohen, O. Sharir, and A. Shashua, On the expressive power of deep learning: A tensor analysis, in 29th Annual Conference on Learning Theory, Vol. 49, Proceedings of Machine Learning Research, 2016, pp. 698–728, http://proceedings.mlr.press/v49/cohen16.html.
[7] H. C. Edwards, D. Sunderland, V. Porter, C. Amsler, and S. Mish, Manycore performance-portability: Kokkos multidimensional array library, Sci. Program., 20 (2012), pp. 89–114, https://doi.org/10.3233/SPR-2012-0343.
[8] H. C. Edwards, C. R. Trott, and D. Sunderland, Kokkos: Enabling manycore performance portability through polymorphic memory access patterns, J. Parallel Distrib. Comput., 74 (2014), pp. 3202–3216, https://doi.org/10.1016/j.jpdc.2014.07.003.
[9] H. Fanaee-T and J. Gama, Tensor-based anomaly detection: An interdisciplinary survey, Knowl.-Based Syst., 98 (2016), pp. 130–147, https://doi.org/10.1016/j.knosys.2016.01.027.
[10] E. Gujral, R. Pasricha, and E. E. Papalexakis, SamBaTen: Sampling-based Batch Incremental Tensor Decomposition, preprint, https://arxiv.org/abs/1709.00668v1, 2017.
[11] R. A. Harshman, Foundations of the PARAFAC procedure: Models and conditions for an “explanatory” multimodal factor analysis, UCLA Working Papers in Phonetics, 16 (1970), pp. 1–84; available at http://www.psychology.uwo.ca/faculty/harshman/wpppfac0.pdf.
[12] D. Hong, T. G. Kolda, and J. A. Duersch, Generalized Canonical Polyadic Tensor Decomposition, preprint, https://arxiv.org/abs/1808.07452, 2018.
[13] M. Janzamin, H. Sedghi, and A. Anandkumar, Beating the Perils of Non-convexity: Guaranteed Training of Neural Networks using Tensor Methods, preprint, https://arxiv.org/abs/1506.08473v3, 2016.
[14] P. Karpiński and J. McDonald, A high-performance portable abstract interface for explicit SIMD vectorization, in Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM’17, ACM, New York, 2017, pp. 21–28, https://doi.org/10.1145/3026937.3026939.
[15] O. Kaya and B. Uçar, Parallel CANDECOMP/PARAFAC decomposition of sparse tensors using dimension trees, SIAM J. Sci. Comput., 40 (2018), pp. C99–C130, https://doi.org/10.1137/16M1102744.
[16] K. Kim, T. B. Costa, M. Deveci, A. M. Bradley, S. D. Hammond, M. E. Guney, S. Knepper, S. Story, and S. Rajamanickam, Designing vector-friendly compact BLAS and LAPACK kernels, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’17, ACM, New York, 2017, 55, https://doi.org/10.1145/3126908.3126941.
[17] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev., 51 (2009), pp. 455–500, https://doi.org/10.1137/07070111X.
[18] M. Kretz and V. Lindenstruth, Vc: A C++ library for explicit vectorization, Softw. Pract. Exper., 42 (2012), pp. 1409–1430, https://doi.org/10.1002/spe.1149.
[19] J. Li, J. Choi, I. Perros, J. Sun, and R. Vuduc, Model-driven sparse CP decomposition for higher-order tensors, in Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), IEEE, 2017, pp. 1048–1057, https://doi.org/10.1109/IPDPS.2017.80.
[20] J. Li, Y. Ma, and R. Vuduc, ParTI!: A Parallel Tensor Infrastructure for Multicore CPUs and GPUs, Oct. 2018, https://github.com/hpcgarage/ParTI.
[21] J. Li, J. Sun, and R. Vuduc, HiCOO: Hierarchical Storage of Sparse Tensors, in Proceedings of the ACM/IEEE International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC18), IEEE Press, Piscataway, NJ, 2018, 19.
[22] B. Liu, C. Wen, A. D. Sarwate, and M. Mehri Dehnavi, A Unified Optimization Approach for Sparse Tensor Operations on GPUs, preprint, https://arxiv.org/abs/1705.09905, 2017.
[23] J. D. McCalpin, Memory bandwidth and machine balance in current high performance computers, IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, 12 (1995), pp. 19–25.
[24] A. Novikov, D. Podoprikhin, A. Osokin, and D. Vetrov, Tensorizing Neural Networks, preprint, http://arxiv.org/abs/1509.06569, 2015.
[25] OpenMP Architecture Review Board, OpenMP Application Program Interface, 2013, https://www.openmp.org/wp-content/uploads/OpenMP4.0.0.pdf.
[26] E. Phipps, M. D’Elia, H. C. Edwards, M. Hoemmen, J. Hu, and S. Rajamanickam, Embedded ensemble propagation for improving performance, portability, and scalability of uncertainty quantification on emerging computational architectures, SIAM J. Sci. Comput., 39 (2017), pp. C162–C193, https://doi.org/10.1137/15M1044679. · Zbl 1365.65017
[27] T. B. Rolinger, T. A. Simon, and C. D. Krieger, Performance challenges for heterogeneous distributed tensor decompositions, in IEEE High Performance Extreme Computing Conference HPEC 2017, IEEE, 2017.
[28] S. Smith, J. W. Choi, J. Li, R. Vuduc, J. Park, X. Liu, and G. Karypis, FROSTT: The Formidable Repository of Open Sparse Tensors and Tools, 2017, http://frostt.io/.
[29] S. Smith and G. Karypis, Tensor-matrix products with a compressed sparse tensor, in Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms, IA3 ’15, ACM, New York, 2015, 5, https://doi.org/10.1145/2833179.2833183.
[30] S. Smith, J. Park, and G. Karypis, Sparse tensor factorization on many-core processors with high-bandwidth memory, in Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), IEEE, 2017, pp. 1058–1067, https://doi.org/10.1109/IPDPS.2017.84.
[31] S. Smith, N. Ravindran, N. D. Sidiropoulos, and G. Karypis, SPLATT: Efficient and parallel sparse tensor-matrix multiplication, in IPDPS 2015: IEEE International Parallel and Distributed Processing Symposium, IEEE, 2015, pp. 61–70, https://doi.org/10.1109/ipdps.2015.27.
[32] E. Solomonik, D. Matthews, J. R. Hammond, J. F. Stanton, and J. Demmel, A massively parallel tensor contraction framework for coupled-cluster computations, J. Parallel Distrib. Comput., 74 (2014), pp. 3176–3190, https://doi.org/10.1016/j.jpdc.2014.06.002.
[33] D. Vandevoorde and N. M. Josuttis, C++ Templates: The Complete Guide, Addison-Wesley, Reading, MA, 2002.
[34] H. Wang, P. Wu, I. G. Tanase, M. J. Serrano, and J. E. Moreira, Simple, portable and fast SIMD intrinsic programming: Generic simd library, in Proceedings of the 2014 Workshop on Programming Models for SIMD/Vector Processing, WPMVP ’14, ACM, New York, 2014, pp. 9–16, https://doi.org/10.1145/2568058.2568059.
[35] Y. Wang, R. Chen, J. Ghosh, J. C. Denny, A. Kho, Y. Chen, B. A. Malin, and J. Sun, Rubik: Knowledge guided tensor factorization and completion for health data analytics, in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15, ACM, New York, 2015, pp. 1265–1274, https://doi.org/10.1145/2783258.2783395.