A robust and scalable multi-level domain decomposition preconditioner for multi-core architecture with large number of cores. (English) Zbl 07193297

Summary: With the evolution of High Performance Computing, multi-core and many-core systems are a common feature of new hardware architectures. The required programming efforts induced by the introduction of these architectures are challenging due to the increasing number of cores. Parallel programming models based on the data flow model and the task programming paradigm intend to fix this issue. Iterative linear solvers are a key part of petroleum reservoir simulation as they can represent up to 80% of the total computing time. In these algorithms, the standard preconditioning methods for large, sparse and unstructured matrices – such as Incomplete LU Factorization (ILU) or Algebraic Multigrid (AMG) – fail to scale on shared-memory architectures with large number of cores. Multi-level domain decomposition (DDML) preconditioners recently introduced seem to be both numerically robust and scalable on emerging architectures because of their parallel nature. This paper proposes a parallel implementation of these preconditioners using the task programming paradigm with a data flow model. This approach is validated on linear systems extracted from realistic petroleum reservoir simulations. This shows that, given an appropriate coarse operator in such preconditioners, the method has good convergence rates while our implementation ensures interesting scalability on multi-core architectures.


65-XX Numerical analysis
68-XX Computer science
Full Text: DOI


[1] van der Vorst, H. A., Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 13, 2, 631-644 (1992)
[2] Saad, Y.; Schultz, M. H., GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 7, 3, 856-869 (1986)
[3] Barnard, S. T.; Bernardo, L. M.; Simon, H. D., An MPI implementation of the SPAI preconditioner on the t3E, Int. J. High Perform. Comput. Appl., 13, 107-128 (1999)
[4] Brandt, A.; Mccormick, S. F.; Ruge, J. W., (Evans, D. J., Algebraic Multigrid (AMG) for Sparse Matrix Equations (1984), Cambridge University Press: Cambridge University Press New York)
[5] Baker, A. H.; Gamblin, T.; Schulz, M.; Yang, U. M., Challenges of scaling algebraic multigrid across modern multicore architectures, (Parallel Distributed Processing Symposium (IPDPS), 2011 IEEE International (2011)), 275-286
[6] Park, J.; Smelyanskiy, M.; Yang, U. M.; Mudigere, D.; Dubey, P., High-performance algebraic multigrid solver optimized for multi-core based distributed parallel systems, (Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’15 (2015), ACM: ACM New York, NY, USA), 54:1-54:12
[7] Dolean, V.; Jolivet, P.; Nataf, F., An Introduction to Domain Decomposition Methods (2015), Society for Industrial and Applied Mathematics: Society for Industrial and Applied Mathematics Philadelphia, PA
[8] Spillane, N.; Dolean, V.; Hauret, P.; Nataf, F.; Pechstein, C.; Scheichl, R., A robust two-level domain decomposition preconditioner for systems of PDEs, C. R. Math., 349, 23, 1255-1259 (2011)
[9] Gratien, J. M., An abstract object oriented runtime system for heterogeneous parallel architecture, (Parallel and Distributed Processing Symposium Workshops PhD Forum (IPDPSW), 2013 IEEE 27th International (2013)), 1203-1212
[10] Broquedis, F.; Clet-Ortega, J.; Moreaud, S.; Furmento, N.; Goglin, B.; Mercier, G.; Thibault, S.; Namyst, R., hwloc: a generic framework for managing hardware affinities in HPC applications, (PDP 2010 - the 18th Euromicro International Conference on Parallel, Distributed and Network-Based Computing (2010), Pisa, IEEE: Pisa, IEEE Italy)
[11] Saad, Y., Iterative Methods for Sparse Linear Systems (2003), Society for Industrial and Applied Mathematics: Society for Industrial and Applied Mathematics Philadelphia, PA, USA
[12] Magras, J.; Quandalle, P.; Bia, P., High-performance reservoir simulation with parallel ATHOS, (SPE Reservoir Simulation Symposium (2001), Society of Petroleum Engineers)
[13] Gratien, J.-M.; Guignon, T.; Magras, J.-F.; Quandalle, P.; Ricois, O. M., Scalability and load balancing problems in parallel reservoir simulation, (SPE Reservoir Simulation Symposium (2007), Society of Petroleum Engineers)
[14] Chow, E.; Patel, A., Fine-grained parallel incomplete LU factorization, SIAM J. Sci. Comput., 37, 2, C169-C193 (2015)
[15] Falgout, R. D.; Jones, J. E.; Yang, U. M., The design and implementation of hypre, a library of parallel high performance preconditioners, (Bruaset, A. M.; Tveito, A., Numerical Solution of Partial Differential Equations on Parallel Computers (2006), Springer Berlin Heidelberg: Springer Berlin Heidelberg Berlin, Heidelberg), 267-294
[16] Feng, C.; Shu, S.; Yue, X., An improvement to the OpenMP version of BoomerAMG, (Zhang, Y.; Li, K.; Xiao, Z., High Performance Computing: 8th CCF Conference. High Performance Computing: 8th CCF Conference, HPC 2012, Zhangjiajie, China, October 29-31, 2012, Revised Selected Papers (2013), Springer Berlin Heidelberg: Springer Berlin Heidelberg Berlin, Heidelberg), 1-11
[17] Schwarz, H. A., Uber einen grenzubergang durch alternierendes verfahren, 272-286 (1870)
[18] Aavatsmark, I.; Barkve, T.; Bøe, Ø.; Mannseth, T., Discretization on non-orthogonal, quadrilateral grids for inhomogeneous, anisotropic media, J. Comput. Phys., 127, 1, 2-14 (1996)
[19] Eymard, R.; Guichard, C.; Herbin, R.; Masson, R., Vertex-centred discretization of multiphase compositional Darcy flows on general meshes, Comput. Geosci., 16, 4, 987-1005 (2012)
[20] Boyer, F.; Hubert, F.; Krell, S., Nonoverlapping Schwarz algorithm for solving two-dimensional m-DDFV schemes, IMA J. Numer. Anal., 30, 4, 1062-1100 (2010)
[21] Droniou, J.; Eymard, R.; Gallouët, T.; Herbin, R., A unified approach to mimetic finite difference, hybrid finite volume and mixed finite volume methods, Math. Models Methods Appl. Sci., 20, 02, 265-295 (2010)
[22] Droniou, J.; Eymard, R.; Herbin, R., Gradient schemes: generic tools for the numerical analysis of diffusion equations, ESAIM Math. Model. Numer. Anal., 50, 3, 749-781 (2016)
[23] Guennebaud, G.; Jacob, B., Eigen v3 (2010)
[24] Christie, M.; Blunt, M., Tenth SPE comparative solution project: A comparison of upscaling techniques, SPE Reserv. Eval. Eng., 4, 2, 308-317 (2001)
[25] Jolivet, P.; Hecht, F.; Nataf, F.; Prud’Homme, C., Scalable domain decomposition preconditioners for heterogeneous elliptic problems, (SC13 - International Conference for High Performance Computing, Networking, Storage and Analysis (2013), ACM: ACM Denver, United States), 80:1-80:11
[26] Tang, J. M.; Nabben, R.; Vuik, C.; Erlangga, Y. A., Comparison of two-level preconditioners derived from deflation, domain decomposition and multigrid methods, J. Sci. Comput., 39, 3, 340-370 (2009)
[27] Gee, M. W.; Siefert, C. M.; Hu, J. J.; Tuminaro, R. S.; Sala, M. G., ML 5.0 Smoothed Aggregation User’s GuideTech. Rep., Technical Report SAND2006-2649 (2006), Sandia National Laboratories
[28] Blatt, M.; Bastian, P., The iterative solver template library, (International Workshop on Applied Parallel Computing (2006), Springer), 666-675
[29] PETSc home page (2018)
[30] Trilinos home page (2018)
[31] Minden, V.; Smith, B.; Knepley, M. G., Preliminary implementation of PETSc using GPUs, (Yuen, D. A.; Wang, L.; Chi, X.; Johnsson, L.; Ge, W.; Shi, Y., GPU Solutions to Multi-Scale Problems in Science and Engineering (2013), Springer Berlin Heidelberg: Springer Berlin Heidelberg Berlin, Heidelberg), 131-140
[32] Kreutzer, M.; Thies, J.; Röhrig-Zöllner, M.; Pieper, A.; Shahzad, F.; Galgon, M.; Basermann, A.; Fehske, H.; Hager, G.; Wellein, G., GHOST: building blocks for high performance sparse linear algebra on heterogeneous systems, Int. J. Parallel Program., 45, 5, 1046-1072 (2017)
[33] Blumofe, R. D.; Joerg, C. F.; Kuszmaul, B. C.; Leiserson, C. E.; Randall, K. H.; Zhou, Y., Cilk: An efficient multithreaded runtime system, J. Parallel Distrib. Comput., 37, 1, 55-69 (1996)
[34] Ayguadé, E.; Badia, R. M.; Bellens, P.; Cabrera, D.; Duran, A.; Ferrer, R.; Gonzàlez, M.; Igual, F.; Jiménez-González, D.; Labarta, J.; Martinell, L.; Martorell, X.; Mayo, R.; Pérez, J. M.; Planas, J.; Quintana-Ortí, E. S., Extending OpenMP to survive the heterogeneous multi-core era, Int. J. Parallel Program., 38, 5, 440-459 (2010)
[35] Augonnet, C.; Thibault, S.; Namyst, R.; Wacrenier, P.-A., StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurr. Comput. Pract. Exp., 23, 2, 187-198 (2011)
[36] Lima, J. V.F.; Gautier, T.; Danjean, V.; Raffin, B.; Maillard, N., Design and analysis of scheduling strategies for multi-CPU and multi-GPU architectures, Parallel Comput., 44, 37-52 (2015)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.