FFT, FMM, or multigrid? A comparative study of state-of-the-art Poisson solvers for uniform and nonuniform grids in the unit cube. (English) Zbl 1369.65138


65N22 Numerical solution of discretized equations for boundary value problems involving PDEs
65N55 Multigrid methods; domain decomposition for boundary value problems involving PDEs
65T40 Numerical methods for trigonometric approximation and interpolation
65T50 Numerical methods for discrete and fast Fourier transforms
65Y05 Parallel numerical computation
78M16 Multipole methods applied to problems in optics and electromagnetic theory
Full Text: DOI arXiv


[1] M. Adams, J. Brown, J. Shalf, B. V. Straalen, E. Strohmaier, R. Vuduc, and S. Williams, High-performance geometric multigrid, https://hpgmg.org, 2014.
[2] A. Arnold, F. Fahrenberger, C. Holm, O. Lenz, M. Bolten, H. Dachsel, R. Halver, I. Kabadshow, F. Gähler, F. Heber, et al., Comparison of scalable fast methods for long-range interactions, Phys. Rev. E, 88 (2013), 063308.
[3] S. Balay, M. F. Adams, J. Brown, P. Brune, K. Buschelman, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, K. Rupp, B. F. Smith, and H. Zhang, PETSc Web page, \burlhttp://www.mcs.anl.gov/petsc, 2014.
[4] W. Bangerth, R. Hartmann, and G. Kanschat, deal.II—A general purpose object oriented finite element library, ACM Trans. Math. Softw., 33 (2007), 24. · Zbl 1365.65248
[5] A. Bhatele, P. Jetley, H. Gahvari, L. Wesolowski, W. D. Gropp, and L. Kale, Architectural constraints to attain 1 exaflop/s for three scientific application classes, in 2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS), IEEE, Piscataway, NJ, 2011, pp. 80–91.
[6] A. Brandt, Multi-level adaptive solutions to boundary-value problems, Math. Comp., 31 (1977), pp. 333–390. · Zbl 0373.65054
[7] C. Burstedde, O. Ghattas, M. Gurnis, T. Isaac, G. Stadler, T. Warburton, and L. C. Wilcox, Extreme-scale AMR, in SC10: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM/IEEE, IEEE Computer Society, Washington, DC, 2010.
[8] C. Burstedde, L. C. Wilcox, and O. Ghattas, p4est: Scalable algorithms for parallel adaptive mesh refinement on forests of octrees, SIAM J. Sci. Comput., 33 (2011), pp. 1103–1133, http://dx.doi.org/10.1137/100791634 doi:10.1137/100791634. · Zbl 1230.65106
[9] J. Carrier, L. Greengard, and V. Rokhlin, A fast adaptive multipole algorithm for particle simulations, SIAM J. Sci. Statist. Comput., 9 (1988), pp. 669–686, doi:10.1137/0909044. · Zbl 0656.65004
[10] H. Cheng, L. Greengard, and V. Rokhlin, A fast adaptive multipole algorithm in three dimensions, J. Comput. Phys., 155 (1999), pp. 468–498. · Zbl 0937.65126
[11] K. Czechowski, C. Battaglino, C. McClanahan, K. Iyer, P.-K. Yeung, and R. Vuduc, On the communication complexity of 3D FFTs and its implications for exascale, in Proceedings of the 26th ACM International Conference on Supercomputing, ACM, New York, 2012, pp. 205–214.
[12] M. O. Deville, P. F. Fischer, and E. H. Mund, High-Order Methods for Incompressible Fluid Flow, Cambridge Monogr. Appl. Comput. Math. 9, Cambridge University Press, Cambridge, UK, 2002. · Zbl 1007.76001
[13] R. O. Dror, J. Grossman, K. M. Mackenzie, B. Towles, E. Chow, J. K. Salmon, C. Young, J. A. Bank, B. Batson, M. M. Deneroff, et al., Exploiting \(162\)-nanosecond end-to-end communication latency on anton, in Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, IEEE Computer Society, Washington, DC, 2010, pp. 1–12.
[14] F. Ethridge and L. Greengard, A new fast-multipole accelerated poisson solver in two dimensions, SIAM J. Sci. Comput., 23 (2001), pp. 741–760, doi:10.1137/S1064827500369967. · Zbl 1002.65131
[15] S. Filippone, The IBM parallel engineering and scientific subroutine library, in Applied Parallel Computing Computations in Physics, Chemistry and Engineering Science, Springer, Berlin, Heidelberg, 1996, pp. 199–206.
[16] I. T. Foster and P. H. Worley, Parallel algorithms for the spectral transform method, SIAM J. Sci. Comput., 18 (1997), pp. 806–837, doi:10.1137/S1064827594266891. · Zbl 0872.65094
[17] M. Frigo and S. G. Johnson, The design and implementation of fftw3, Proceedings of the IEEE, 93 (2005), pp. 216–231.
[18] H. Gahvari, A. H. Baker, M. Schulz, U. M. Yang, K. E. Jordan, and W. Gropp, Modeling the performance of an algebraic multigrid cycle on HPC platforms, in Proceedings of the International Conference on Supercomputing, ACM, New York, 2011, pp. 172–181.
[19] H. Gahvari and W. Gropp, An introductory exascale feasibility study for FFTs and multigrid, in 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), IEEE Computer Society, Washington, DC, 2010, pp. 1–9.
[20] M. W. Gee, C. M. Siefert, J. J. Hu, R. S. Tuminaro, and M. G. Sala, ML \(5.0\) smoothed aggregation user’s guide, Tech. Report SAND2006-2649, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, 2006.
[21] A. Gholami, J. Hill, D. Malhotra, and G. Biros, AccFFT: A library for distributed-memory 3-D FFT on CPU and GPU architectures, SIAM J. Sci. Comput., submitted, http://arxiv.org/abs/1506.07933.
[22] A. Grama, A. Gupta, G. Karypis, and V. Kumar, An Introduction to Parallel Computing: Design and Analysis of Algorithms, 2nd ed., Addison Wesley, Reading, MA, 2003. · Zbl 0861.68040
[23] W. Hackbusch, Multigrid Methods and Applications, Springer Ser. Comput. Math. 4, Springer-Verlag, Berlin, 1985.
[24] M. A. Heroux, R. A. Bartlett, V. E. Howle, R. J. Hoekstra, J. J. Hu, T. G. Kolda, R. B. Lehoucq, K. R. Long, R. P. Pawlowski, E. T. Phipps, et al., An overview of the trilinos project, ACM Trans. Math. Software, 31 (2005), pp. 397–423. · Zbl 1136.65354
[25] T. Isaac, C. Burstedde, and O. Ghattas, Low-cost parallel algorithms for 2:1 octree balance, in IEEE 26th International Parallel & Distributed Processing Symposium (IPDPS), IEEE Computer Society, Washington, DC, 2012, pp. 426–437.
[26] M. H. Langston, L. Greengard, and D. Zorin, A free-space adaptive FMM-based PDE solver in three dimensions, Commun. Appl. Math. Comput. Sci., 6 (2011), pp. 79–122. · Zbl 1230.65131
[27] I. Lashuk, A. Chandramowlishwaran, H. Langston, T.-A. Nguyen, R. Sampath, A. Shringarpure, R. Vuduc, L. Ying, D. Zorin, and G. Biros, A massively parallel adaptive fast multipole method on heterogeneous architectures, Comm. ACM, 55 (2012), pp. 101–109.
[28] N. Li and S. Laizet, 2DECOMP & FFT-a highly scalable 2d decomposition library and FFT interface, in Cray User Group 2010 Conference, 2010, pp. 1–13, https://cug.org/5-publications/proceedings_attendee_lists/CUG10CD/.
[29] X. S. Li and J. W. Demmel, SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems, ACM Trans. Math. Software, 29 (2003), pp. 110–140. · Zbl 1068.90591
[30] J. W. Lottes and P. F. Fischer, Hybrid multigrid/Schwarz algorithms for the spectral element method, J. Sci. Comput., 24 (2005), pp. 45–78. · Zbl 1078.65570
[31] D. Malhotra and G. Biros, \textitpvfmm: A distributed memory fast multipole method for volume potentials, ACM Trans. Math. Software, to appear, padas.ices.utexas.edu/static/papers/pvfmm.pdf. · Zbl 1371.65125
[32] P. McCorquodale, P. Colella, G. T. Balls, and S. B. Baden, A local corrections algorithm for solving Poisson’s equation in three dimensions, Commun. Appl. Math. Comput. Sci., 2 (2006), pp. 57–81. · Zbl 1133.65106
[33] A. Nukada, K. Sato, and S. Matsuoka, Scalable multi-GPU 3-d FFT for TSUBAME 2.0 supercomputer, in 2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC), IEEE Computer Society, Washington, DC, 2012, pp. 1–10.
[34] D. Pekurovsky, P3DFFT: A framework for parallel computations of Fourier transforms in three dimensions, SIAM J. Sci. Comput., 34 (2012), pp. C192–C209, doi:10.1137/11082748X. · Zbl 1253.65205
[35] J. C. Phillips, G. Zheng, S. Kumar, and L. V. Kalé, NAMD: Biomolecular simulation on thousands of processors, in Proceedings of Supercomputing, The SCxy Conference series, Baltimore, MD, 2002, ACM/IEEE, IEEE Computer Society, Washington, DC, 2002.
[36] J. R. Phillips and J. K. White, A precorrected-FFT method for electorstatic analysis of complicated 3-D structures, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., 16 (1997), pp. 1059–1072.
[37] M. Pippig, PFFT: An extension of FFTW to massively parallel architectures, SIAM J. Sci. Comput., 35 (2013), pp. C213–C236, doi:10.1137/120885887. · Zbl 1275.65098
[38] R. S. Sampath and G. Biros, A parallel geometric multigrid method for finite elements on octree meshes, SIAM J. Sci. Comput., 32 (2010), pp. 1361–1392, doi:10.1137/090747774. · Zbl 1213.65144
[39] I. Sbalzarini, J. Walther, M. Bergdorf, S. Hieber, E. Kotsalis, and P. Koumoutsakos, PPM–A highly efficient parallel particle–mesh library for the simulation of continuum systems, J. Comput. Phys., 215 (2006), pp. 566–588. · Zbl 1173.76398
[40] H. Sundar, G. Biros, C. Burstedde, J. Rudi, O. Ghattas, and G. Stadler, Parallel geometric-algebraic multigrid on unstructured forests of octrees, in Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, IEEE Computer Society Press, Los Alamitos, CA, 2012, 43.
[41] H. Sundar, R. S. Sampath, and G. Biros, Bottom-up construction and 2:1 balance refinement of linear octrees in parallel, SIAM J. Sci. Comput., 30 (2008), pp. 2675–2708, doi:10.1137/070681727. · Zbl 1186.68554
[42] H. Sundar, G. Stadler, and G. Biros, Comparison of multigrid algorithms for high-order continuous finite element discretizations, Numer. Linear Algebra Appl., 22 (2015), pp. 664–680. · Zbl 1349.65680
[43] P. N. Swarztrauber, Multiprocessor FFTs, Parallel Computi., 5 (1987), pp. 197–210.
[44] R. A. Sweet, W. L. Briggs, S. Oliveira, J. L. Porsche, and T. Turnbull, FFTs and three-dimensional poisson solvers for hypercubes, Parallel Comput., 17 (1991), pp. 121–131. · Zbl 0742.65074
[45] O. Tatebe and Y. Oyanagi, Efficient implementation of the multigrid preconditioned conjugate gradient method on distributed memory machines, in Proceedings of Supercomputing ’94, IEEE Computer Society, Washington, DC, 1994, pp. 194–203. · Zbl 0809.65115
[46] U. Trottenberg, C. W. Oosterlee, and A. Schuller, Multigrid, Academic Press, San Diego, CA, 2000.
[47] R. S. Tuminaro, E. G. Boman, J. J. Hu, A. Prokopenko, C. Siefert, P. H. Tsuji, J. Gaidamour, L. Olson, J. Schroder, B. Hiriyur, et al., Toward flexible scalable algebraic multigrid solvers., Tech. report, Sandia National Laboratories Albuquerque, NM, Livermore, CA, 2013.
[48] E. Wang, Q. Zhang, B. Shen, G. Zhang, X. Lu, Q. Wu, and Y. Wang, Intel math kernel library, in High-Performance Computing on the Intel Xeon Phi, Springer, Cham, Switzerland, 2014, pp. 167–188.
[49] L. Ying, G. Biros, and D. Zorin, A kernel-independent adaptive fast multipole method in two and three dimensions, J. Comput. Phys., 196 (2004), pp. 591–626. · Zbl 1053.65095
[50] R. Yokota and L. A. Barba, A tuned and scalable fast multipole method as a preeminent algorithm for exascale systems, Int. J. High Perform. Comput. Appl., 26 (2012), pp. 337–346.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.