Productivity, performance, and portability for computational fluid dynamics applications. (English) Zbl 07163727

Summary: Hardware trends over the last decade show increasing complexity and heterogeneity in high performance computing architectures, which presents developers of CFD applications with three key challenges: achieving good performance, remaining portable across current and future hardware, and doing both in a productive manner. These three goals appear to contradict each other under traditional programming approaches, but in recent years several strategies, such as template libraries and Domain Specific Languages, have emerged as potential solutions: by giving up generality and focusing on a narrower domain of problems, all three can be achieved.
This paper gives an overview of the state of the art for delivering performance, portability, and productivity to CFD applications, ranging from high-level libraries that allow the symbolic description of PDEs to low-level techniques that target individual algorithmic patterns. We discuss the advantages and challenges of each approach, and review the performance benchmarking literature that compares implementations across hardware architectures and their programming methods, giving an overview of key applications and their comparative performance.


76-XX Fluid mechanics

