
Performance comparison of HPX versus traditional parallelization strategies for the discontinuous Galerkin method. (English) Zbl 1427.65238
Summary: As high performance computing moves towards the exascale computing regime, applications are required to expose increasingly fine-grained parallelism to efficiently use next-generation supercomputers. Intended as a solution to the programming challenges associated with these architectures, High Performance ParalleX (HPX) is a task-based C++ runtime that emphasizes the use of lightweight threads and algorithm-dependent synchronization to maximize the parallelism exposed by the application to the machine. The aim of this work is to explore the performance benefits of an HPX parallelization versus an MPI parallelization for the discontinuous Galerkin finite element method for the two-dimensional shallow water equations. We present strong and weak scaling results comparing the performance of HPX versus an MPI parallelization strategy on Knights Landing architectures. Our results indicate that for average task sizes of 3.6 ms, HPX’s runtime overhead is offset by more efficient execution of the application. Furthermore, we demonstrate that when running with sufficiently large task granularity, HPX is able to outperform the MPI parallelization by a factor of approximately 1.2 for up to 128 nodes.
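The scaling terms used in the summary can be made concrete with a minimal sketch. It assumes the conventional textbook definitions of strong scaling efficiency, weak scaling efficiency, and relative speedup; the summary does not state the exact formulas used in the paper. The function names are hypothetical and the timings are placeholders, not measurements from the paper.

```python
def strong_scaling_efficiency(t_base, n_base, t_n, n):
    """Fixed total problem size: the ideal run time shrinks as n_base / n."""
    return (t_base * n_base) / (t_n * n)


def weak_scaling_efficiency(t_base, t_n):
    """Fixed problem size per node: the ideal run time stays constant."""
    return t_base / t_n


def speedup(t_reference, t_candidate):
    """How many times faster the candidate runs than the reference."""
    return t_reference / t_candidate


# Placeholder wall-clock times in seconds (illustrative only, NOT from the paper).
print(strong_scaling_efficiency(t_base=100.0, n_base=1, t_n=1.0, n=128))
print(weak_scaling_efficiency(t_base=10.0, t_n=12.5))
print(speedup(t_reference=12.0, t_candidate=10.0))
```

Under these definitions, a reported factor of approximately 1.2 corresponds to `speedup(t_mpi, t_hpx)` being about 1.2, i.e. the MPI run taking roughly 20% longer than the HPX run on the same problem and node count.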

65M60 Finite element, Rayleigh-Ritz and Galerkin methods for initial value and initial-boundary value problems involving PDEs
65Y10 Numerical algorithms for specific classes of architectures
76B15 Water waves, gravity waves; dispersion and scattering, nonlinear interaction
76M10 Finite element methods applied to problems in fluid mechanics
[1] Amarasinghe, S., Hall, M., Lethin, R., Pingali, K., Quinlan, D., Sarkar, V., Shalf, J., Lucas, R., Yelick, K., Balaji, P., et al.: Exascale programming challenges. In: Proceedings of the Workshop on Exascale Programming Challenges, Marina del Rey, CA, USA. US Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR) (2011)
[2] Augonnet, C.; Thibault, S.; Namyst, R.; Wacrenier, PA, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurr. Comput. Pract. Exp., 23, 187-198, (2011)
[3] Baggag, A., Atkins, H., Keyes, D.: Parallel implementation of the discontinuous Galerkin method. Tech. Rep. ICASE-99-35, Institute for Computer Applications in Science and Engineering (1999)
[4] Balaji, P.: Programming Models for Parallel Computing. MIT Press, Cambridge (2015) · Zbl 1373.68017
[5] Barat, R.: Load balancing of multi-physics simulation by multi-criteria graph partitioning. Ph.D. thesis, Université de Bordeaux (2017)
[6] Bauer, M., Treichler, S., Slaughter, E., Aiken, A.: Legion: Expressing locality and independence with logical regions. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, p. 66. IEEE Computer Society Press (2012)
[7] Bremer, M.H., Bachan, J.D., Chan, C.P.: Semi-static and dynamic load balancing for asynchronous hurricane storm surge simulations. In: Proceedings of the Parallel Applications Workshop, Alternatives to MPI, p. 13. IEEE (2018)
[8] Browne, S.; Dongarra, J.; Garner, N.; Ho, G.; Mucci, P., A portable programming interface for performance evaluation on modern processors, Int. J. High Perform. Comput. Appl., 14, 189-204, (2000)
[9] Brus, S.: Efficiency improvements for modeling coastal hydrodynamics through the application of high-order discontinuous Galerkin solutions to the shallow water equations. Ph.D. thesis, University of Notre Dame (2017)
[10] Brus, SR; Wirasaet, D.; Westerink, JJ; Dawson, C., Performance and scalability improvements for discontinuous Galerkin solutions to conservation laws on unstructured grids, J. Sci. Comput., 70, 210-242, (2017) · Zbl 1359.65195
[11] Bunya, S.; Dietrich, J.; Westerink, J.; Ebersole, B.; Smith, J.; Atkinson, J.; Jensen, R.; Resio, D.; Luettich, R.; Dawson, C.; Cardone, V.; Cox, A.; Powell, M.; Westerink, H.; Roberts, H., A high resolution coupled riverine flow, tide, wind, wind wave and storm surge model for Southern Louisiana and Mississippi: part I—model development and validation, Mon. Weather Rev., 138, 345-377, (2010)
[12] Bunya, S.; Kubatko, EJ; Westerink, JJ; Dawson, C., A wetting and drying treatment for the Runge-Kutta discontinuous Galerkin solution to the shallow water equations, Comput. Methods Appl. Mech. Eng., 198, 1548-1562, (2009) · Zbl 1227.76026
[13] Chamberlain, B.; Callahan, D.; Zima, H., Parallel programmability and the Chapel language, Int. J. High Perform. Comput. Appl., 21, 291-312, (2007)
[14] Cockburn, B.; Shu, CW, TVB Runge-Kutta local projection discontinuous Galerkin finite element method for conservation laws. II. General framework, Math. Comput., 52, 411-435, (1989) · Zbl 0662.65083
[15] Cockburn, B.; Shu, CW, Runge-Kutta discontinuous Galerkin methods for convection-dominated problems, J. Sci. Comput., 16, 173-261, (2001) · Zbl 1065.76135
[16] Dawson, C.; Kubatko, EJ; Westerink, JJ; Trahan, C.; Mirabito, C.; Michoski, C.; Panda, N., Discontinuous Galerkin methods for modeling hurricane storm surge, Adv. Water Resour., 34, 1165-1176, (2011)
[17] Dietrich, J.; Westerink, J.; Kennedy, A.; Smith, J.; Jensen, RE; Zijlema, M.; Holthuijsen, L.; Dawson, C.; Luettich, R.; Powell, M.; Cardone, V.; Cox, A.; Stone, G.; Pourtaheri, H.; Hope, M.; Tanaka, S.; Westerink, L.; Westerink, HJ; Cobell, Z., Hurricane Gustav (2008) waves and storm surge: hindcast, synoptic analysis and validation in Southern Louisiana, Mon. Weather Rev., 139, 2488-2522, (2011)
[18] Dietrich, J.; Zijlema, M.; Westerink, J.; Holthuijsen, L.; Dawson, C.; Luettich, RA; Jensen, R.; Smith, J.; Stelling, G.; Stone, G., Modeling hurricane wave and storm surge using integrally-coupled, scalable computations, Coast. Eng., 58, 45-65, (2011)
[19] Dietrich, JC; Bunya, S.; Westerink, JJ; Ebersole, BA; Smith, JM; Atkinson, JH; Jensen, R.; Resio, DT; Luettich, RA; Dawson, C.; Cardone, VJ; Cox, AT; Powell, MD; Westerink, HJ; Roberts, HJ, A high resolution coupled riverine flow, tide, wind, wind wave and storm surge model for southern Louisiana and Mississippi: part II—synoptic description and analyses of Hurricanes Katrina and Rita, Mon. Weather Rev., 138, 378-404, (2010)
[20] Dubiner, M., Spectral methods on triangles and other domains, J. Sci. Comput., 6, 345-390, (1991) · Zbl 0742.76059
[21] Dutykh, D.; Clamond, D., Modified shallow water equations for significantly varying seabeds, Appl. Math. Model., 40, 9767-9787, (2016)
[22] El-Ghazawi, T., Carlson, W., Sterling, T., Yelick, K.: UPC: Distributed Shared Memory Programming, vol. 40. Wiley, London (2005)
[23] Gandham, R.; Medina, D.; Warburton, T., GPU accelerated discontinuous Galerkin methods for shallow water equations, Commun. Comput. Phys., 18, 37-64, (2015) · Zbl 1373.76086
[24] Gottlieb, S.; Shu, CW; Tadmor, E., Strong stability-preserving high-order time discretization methods, SIAM Rev., 43, 89-112, (2001) · Zbl 0967.65098
[25] Grubel, P., Kaiser, H., Cook, J., Serio, A.: The performance implication of task size for applications on the HPX runtime system. In: 2015 IEEE International Conference on Cluster Computing (CLUSTER), pp. 682-689. IEEE (2015)
[26] Grubel, P., Kaiser, H., Huck, K.A., Cook, J.: Using intrinsic performance counters to assess efficiency in task-based parallel applications. In: 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1692-1701 (2016)
[27] Grun, P., Hefty, S., Sur, S., Goodell, D., Russell, R.D., Pritchard, H., Squyres, J.M.: A brief introduction to the OpenFabrics interfaces - a new network API for maximizing high performance application efficiency. In: 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects (HOTI), pp. 34-39. IEEE (2015)
[28] Heller, T., Diehl, P., Byerly, Z., Biddiscombe, J., Kaiser, H.: HPX—an open source C++ standard library for parallelism and concurrency. In: Proceedings of OpenSuCo, OpenSuCo’17. ACM (2017)
[29] Heller, T., Kaiser, H., Diehl, P., Fey, D., Schweitzer, M.A.: Closing the Performance Gap with Modern C++. In: Taufer, M., Mohr, B., Kunkel, J.M. (eds.) High Performance Computing, Lecture Notes in Computer Science, vol. 9945, pp. 18-31. Springer International Publishing, Berlin (2016)
[30] Heller, T., Kaiser, H., Schäfer, A., Fey, D.: Using HPX and LibGeoDecomp for scaling HPC applications on heterogeneous supercomputers. In: Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, p. 1. ACM (2013)
[31] Hesthaven, J.S., Warburton, T.: Nodal Discontinuous Galerkin Methods: Algorithms, Analysis, and Applications. Springer, Berlin (2007) · Zbl 1134.65068
[32] Hope, ME; Westerink, JJ; Kennedy, AB; Kerr, PC; Dietrich, JC; Dawson, C.; Bender, CJ; et al., Hindcast and validation of Hurricane Ike (2008) waves, forerunner, and storm surge, J. Geophys. Res. Oceans, 118, 4424-4460, (2013)
[33] Hope, M., Westerink, J., Kennedy, A., Smith, J., Westerink, H., Cox, A., Nong, S., Roberts, K., Resio, D., A.P, T.: Hurricane Sandy (2012) wind, waves and storm surge in New York Bight. I: Model validation. J. Waterw. Port Coast. Ocean Eng. (2016)
[34] Iglberger, K.; Hager, G.; Treibig, J.; Rüde, U., Expression templates revisited: a performance analysis of current methodologies, SIAM J. Sci. Comput., 34, c42-c69, (2012)
[35] Kaiser, H., Adelstein Lelbach, B., Heller, T., Berg, A., Biddiscombe, J., Bikineev, A., et al.: STEllAR-GROUP/hpx: HPX V1.0: The C++ standards library for parallelism and concurrency (Version 1.0.0). Zenodo. https://doi.org/10.5281/zenodo.556772 (2017)
[36] Kale, L.V., Krishnan, S.: CHARM++: A portable concurrent object oriented system based on C++. In: ACM Sigplan Notices, vol. 28, pp. 91-108. ACM (1993)
[37] Karypis, G.; Kumar, V., A fast and high quality multilevel scheme for partitioning irregular graphs, SIAM J. Sci. Comput., 20, 359-392, (1998) · Zbl 0915.68129
[38] Kogge, P.; Shalf, J., Exascale computing trends: adjusting to the “new normal” for computer architecture, Comput. Sci. Eng., 15, 16-26, (2013)
[39] Kubatko, E.; Bunya, S.; Dawson, C.; Westerink, J., Dynamic p-adaptive Runge-Kutta discontinuous Galerkin methods for the shallow water equations, Comput. Methods Appl. Mech. Eng., 198, 1766-1774, (2009) · Zbl 1227.76032
[40] Kubatko, E.; Bunya, S.; Dawson, C.; Westerink, J.; Mirabito, C., A performance comparison of continuous and discontinuous finite element shallow water models, J. Sci. Comput., 40, 315-339, (2009) · Zbl 1203.76085
[41] Kubatko, E.; Westerink, J.; Dawson, C., \(hp\) Discontinuous Galerkin methods for advection dominated problems in shallow water flow, Comput. Methods Appl. Mech. Eng., 196, 437-451, (2006) · Zbl 1120.76348
[42] Kubatko, EJ; Westerink, JJ; Dawson, C., Semi discrete discontinuous Galerkin methods and stage-exceeding-order, strong-stability-preserving Runge-Kutta time discretizations, J. Comput. Phys., 222, 832-848, (2007) · Zbl 1113.65093
[43] Luettich, R., et al. J.W.: ADCIRC: a parallel advanced circulation model for oceanic, coastal and estuarine waters (2017). Users manual www.adcirc.org
[44] Michoski, C.; Alexanderian, A.; Paillet, C.; Kubatko, E.; Dawson, C., Stability of nonlinear convection-diffusion-reaction systems in discontinuous Galerkin methods, J. Sci. Comput., 70, 516-550, (2017) · Zbl 1361.65064
[45] Michoski, C., Dawson, C., Kubatko, E., Wirasaet, D., Brus, S., Westerink, J.: A comparison of artificial viscosity, limiters, and filter, for high order discontinuous Galerkin solution in nonlinear settings. J. Sci. Comput. (2015). https://doi.org/10.1007/s10915-015-0027-2 · Zbl 1338.65228
[46] Michoski, C.; Dawson, C.; Mirabito, C.; Kubatko, E.; Wirasaet, D.; Westerink, J., Fully coupled methods for multiphase morphodynamics, Adv. Water Resour., 59, 95-110, (2013) · Zbl 1269.65099
[47] Michoski, C.; Mirabito, C.; Dawson, C.; Wirasaet, D.; Kubatko, EJ; Westerink, JJ, Adaptive hierarchic transformations for dynamically p-enriched slope-limiting over discontinuous Galerkin systems of generalized equations, J. Comput. Phys., 230, 8028-8056, (2011) · Zbl 1269.65099
[48] Numrich, R.W., Reid, J.: Co-Array Fortran for parallel programming. In: ACM Sigplan Fortran Forum, vol. 17, pp. 1-31. ACM (1998)
[49] OpenMP Architecture Review Board: OpenMP Application Program Interface Version 3.0 (2008). http://www.openmp.org/mp-documents/spec30.pdf
[50] Reed, W.H., Hill, T.: Triangular mesh methods for the neutron transport equation. Tech. rep., Los Alamos Scientific Lab., N. Mex. (USA) (1973)
[51] Tanaka, S.; Bunya, S.; Westerink, J.; Dawson, C.; Luettich, R., Scalability of an unstructured grid continuous Galerkin based hurricane storm surge model, J. Sci. Comput., 46, 329-358, (2011) · Zbl 1270.76038
[52] Westerink, JJ; Luettich, RA; Feyen, JC; Atkinson, JH; Dawson, CN; Roberts, HJ; Powell, MD; Dunion, JP; Kubatko, EJ; Pourtaheri, H., A basin to channel scale unstructured grid hurricane storm surge model applied to southern Louisiana, Mon. Weather Rev., 136, 833-864, (2008)
[53] Wirasaet, D.; Brus, S.; Michoski, C.; Kubatko, E.; Westerink, J., Artificial boundary layers in discontinuous Galerkin solutions to shallow water equations in channels, J. Comput. Phys., 299, 597-612, (2015) · Zbl 1351.76085
[54] Wirasaet, D.; Kubatko, E.; Michoski, C.; Tanaka, S.; Westerink, J.; Dawson, C., Discontinuous Galerkin methods with nodal and hybrid modal/nodal triangular, quadrilateral, and polygonal elements for nonlinear shallow water flow, Comput. Methods Appl. Mech. Eng., 270, 113-149, (2014) · Zbl 1296.76089
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.