
Superglue: a shared memory framework using data versioning for dependency-aware task-based parallelization. (English) Zbl 1327.65290

65Y05 Parallel numerical computation
65Y10 Numerical algorithms for specific classes of architectures
[1] C. Augonnet, S. Thibault, R. Namyst, and P.-A. Wacrenier, StarPU: A unified platform for task scheduling on heterogeneous multicore architectures, Concurrency Computat. Pract. Exper., Euro-Par 2009, 23 (2011), pp. 187–198.
[2] E. Ayguadé, N. Copty, A. Duran, J. Hoeflinger, Y. Lin, F. Massaioli, X. Teruel, P. Unnikrishnan, and G. Zhang, The design of OpenMP tasks, IEEE Trans. Parallel Distrib. Syst., 20 (2009), pp. 404–418.
[3] P. Bauer, S. Engblom, and S. Widgren, Fast Event-Based Epidemiological Simulations on National Scales, preprint, arXiv:1502.02908 [q-bio.PE], http://arxiv.org/abs/1502.02908, 2014.
[4] P. Bellens, J. M. Pérez, R. M. Badia, and J. Labarta, CellSs: A programming model for the Cell BE architecture, in Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC ’06), ACM, New York, 2006.
[5] R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou, Cilk: An efficient multithreaded runtime system, SIGPLAN Not., 30 (1995), pp. 207–216.
[6] A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Comput., 35 (2009), pp. 38–53.
[7] L. Dagum and R. Menon, OpenMP: An industry standard API for shared-memory programming, IEEE Comput. Sci. Eng., 5 (1998), pp. 46–55.
[8] J. Dongarra, J. Kurzak, J. Langou, J. Langou, H. Ltaief, P. Luszczek, A. YarKhan, W. Alvaro, M. Faverge, A. Haidar, J. Hoffman, E. Agullo, A. Buttari, and B. Hadri, PLASMA Users’ Guide, http://icl.cs.utk.edu/plasma/.
[9] A. Duran, E. Ayguadé, R. M. Badia, J. Labarta, L. Martinell, X. Martorell, and J. Planas, OmpSs: A proposal for programming heterogeneous multi-core architectures, Parallel Process. Lett., 21 (2011), pp. 173–193.
[10] F. Galilée, J.-L. Roch, G. G. H. Cavalheiro, and M. Doreille, Athapascan-1: On-line building data flow graph in a parallel language, in Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques (PACT ’98), Washington, D.C., 1998, IEEE Computer Society, Los Alamitos, CA, pp. 88–95.
[11] T. Gautier, J. V. F. Lima, N. Maillard, and B. Raffin, XKaapi: A runtime system for data-flow task programming on heterogeneous architectures, in Proceedings of the 27th International IEEE Symposium on Parallel and Distributed Processing (IPDPS ’13), 2013, pp. 1299–1308.
[12] M. Holm, S. Engblom, A. Goude, and S. Holmgren, Dynamic autotuning of adaptive fast multipole methods on hybrid multicore CPU and GPU systems, SIAM J. Sci. Comput., 36 (2014), pp. C376–C399. · Zbl 1305.65008
[15] J. Kurzak and J. Dongarra, Implementing linear algebra routines on multi-core processors with pipelining and a look ahead, in Applied Parallel Computing: State of the Art in Scientific Computing, Lecture Notes in Comput. Sci. 4699, B. Kågström, E. Elmroth, J. Dongarra, and J. Waśniewski, eds., Springer, Berlin, Heidelberg, 2007, pp. 147–156.
[16] J. Kurzak, H. Ltaief, J. Dongarra, and R. Badia, Scheduling dense linear algebra operations on multicore processors, Concurr. Comput., 22 (2010), pp. 15–44.
[17] M. S. Lam and M. C. Rinard, Coarse-grain parallel programming in Jade, SIGPLAN Not., 26 (1991), pp. 94–105.
[18] C. E. Leiserson, The Cilk++ concurrency platform, J. Supercomput., 51 (2010), pp. 244–257.
[19] C. Niethammer, C. W. Glass, and J. Gracia, Avoiding serialization effects in data / dependency aware task parallel algorithms for spatial decomposition, in Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA ’12), Washington, D.C., 2012, IEEE Computer Society, Los Alamitos, CA, pp. 743–748.
[20] OpenMP Architecture Review Board, OpenMP Application Program Interface Version 4.0, 2013.
[21] J. M. Pérez, R. M. Badia, and J. Labarta, A dependency-aware task-based programming environment for multi-core architectures, in Proceedings of the 2008 IEEE International Conference on Cluster Computing, 2008, pp. 142–151.
[22] L. Sunde, Parallelizing a Software Framework for Radial Basis Function Methods, manuscript, 2011.
[23] M. Tillenius, Leveraging Multicore Processors for Scientific Computing, licentiate thesis, Department of Information Technology, Uppsala University, Uppsala, Sweden, 2012.
[24] M. Tillenius and E. Larsson, An efficient task-based approach for solving the \(n\)-body problem on multicore architectures, in PARA 2010: State of the Art in Scientific and Parallel Computing, University of Iceland, Reykjavík, Iceland, 2010.
[25] M. Tillenius, E. Larsson, R. M. Badia, and X. Martorell, Resource-aware task scheduling, ACM Trans. Embed. Comput. Syst., 14 (2015), pp. 5:1–5:25.
[26] M. Tillenius, E. Larsson, E. Lehto, and N. Flyer, A scalable RBF–FD method for atmospheric flow, J. Comput. Phys., 298 (2015), pp. 406–422. · Zbl 1349.86014
[27] H. Vandierendonck, G. Tzenakis, and D. S. Nikolopoulos, A unified scheduler for recursive and task dataflow parallelism, in Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT ’11), 2011, pp. 1–11.
[28] A. YarKhan, J. Kurzak, and J. Dongarra, QUARK Users’ Guide: Queueing and Runtime for Kernels, Tech. Report ICL-UT-11-02, Innovative Computing Laboratory, University of Tennessee, Knoxville, TN, 2011.