Algorithm 942: Semi-stencil. (English) Zbl 1322.65090
65M06 Finite difference methods for initial value and initial-boundary value problems involving PDEs
65N06 Finite difference methods for boundary value problems involving PDEs
65Y20 Complexity and performance of numerical algorithms
[1] Vicki H. Allan, Reese B. Jones, Randall M. Lee, and Stephen J. Allan. 1995. Software pipelining. ACM Comput. Surv. 27, 3, 367–432. DOI:http://dx.doi.org/10.1145/212094.212131. · doi:10.1145/212094.212131
[2] Jose L. Alonso, Xavier Andrade, Pablo Echenique, Fernando Falceto, Diego Prada-Gracia, and Angel Rubio. 2008. Efficient formalism for large-scale ab initio molecular dynamics based on time-dependent density functional theory. Phys. Rev. Lett. 101, 9, 1–4. DOI:http://dx.doi.org/10.1103/PhysRevLett.101.096403. · doi:10.1103/PhysRevLett.101.096403
[3] ANAG. 2012. Chombo software package for AMR applications. Applied Numerical Algorithms Group (ANAG), Lawrence Berkeley National Laboratory, Berkeley, CA. http://seesar.lbl.gov/anag/software.html.
[4] Mauricio Araya-Polo, Felix Rubio, Mauricio Hanzich, Raúl de la Cruz, Jose M. Cela, and Daniele P. Scarpazza. 2008. 3D seismic imaging through reverse-time migration on homogeneous and heterogeneous multi-core processors. Sci. Program. Cell Process. 17, 1–2, 185–198. http://dl.acm.org/citation.cfm?id=1507443.1507449.
[5] Axel Brandenburg. 2003. Computational aspects of astrophysical MHD and turbulence. In The Fluid Mechanics of Astrophysics and Geophysics, Vol. 9, Taylor and Francis, London, 269–344. http://arxiv.org/abs/astro-ph/010949. · Zbl 1099.85005
[6] David Callahan, Ken Kennedy, and Allan Porterfield. 1991. Software prefetching. In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’91). ACM Press, New York, 40–52. DOI:http://dx.doi.org/10.1145/106972.106979. · doi:10.1145/106972.106979
[7] Alberto Castro, Heiko Appel, Micael Oliveira, Carlo A. Rozzi, Florian Lorenzen, Xavier Andrade, Miguel A. L. Marques, E. K. U. Gross, and Angel Rubio. 2006. Octopus: A tool for the application of time-dependent density functional theory. Physica Status Solidi: Towards Atomistic Mater. Des. 243, 11, 2465–2488.
[8] Kaushik Datta, Mark Murphy, Vasily Volkov, Samuel Williams, Jonathan Carter, Leonid Oliker, David Patterson, John Shalf, and Katherine Yelick. 2008. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In Proceedings of the ACM/IEEE Conference on Supercomputing (SC’08). IEEE Press, 1–12. DOI:http://dx.doi.org/10.1145/1413370.1413375.
[9] Kaushik Datta, Shoaib Kamil, Samuel Williams, Leonid Oliker, John Shalf, and Katherine Yelick. 2009. Optimization and performance modeling of stencil computations on modern microprocessors. SIAM Rev. 51, 1, 129–159. DOI:http://dx.doi.org/10.1137/070693199. · Zbl 1160.65359 · doi:10.1137/070693199
[10] Kaushik Datta, Samuel Williams, Vasily Volkov, Jonathan Carter, Leonid Oliker, John Shalf, and Katherine Yelick. 2010. Auto-tuning stencil computations on multicore and accelerators. In Scientific Computing on Multicore and Accelerators, CRC Press, Boca Raton, FL, 219–253. http://dl.acm.org/citation.cfm?id=1413370.1413375
[11] Raúl de la Cruz and Mauricio Araya-Polo. 2011. Towards a multi-level cache performance model for 3D stencil computation. In Proceedings of the International Conference on Computational Science (ICCS’11). Vol. 4, Elsevier, 2146–2155. http://dblp.unitrier.de/db/journals/procedia/procedia4.html#CruzA11.
[12] Raúl de la Cruz, Mauricio Araya-Polo, and Jose María Cela. 2009. Introducing the semi-stencil algorithm. In Proceedings of the 8th International Conference on Parallel Processing and Applied Mathematics (PPAM’09). Vol. 6067, Springer, 496–506. http://dl.acm.org/citation.cfm?id=1882792.1882852.
[13] Matteo Frigo and Volker Strumpen. 2005. Cache oblivious stencil computations. In Proceedings of the 19th ACM International Conference on Supercomputing. ACM Press, New York, 361–366. DOI:http://dx.doi.org/10.1145/1088149.1088197. · Zbl 1183.68721 · doi:10.1145/1088149.1088197
[14] Matteo Frigo and Volker Strumpen. 2006. The cache complexity of multithreaded cache oblivious algorithms. In Proceedings of the 18th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’06). ACM Press, New York, 271–280. DOI:http://dx.doi.org/10.1145/1148109.1148157. · Zbl 1183.68721 · doi:10.1145/1148109.1148157
[15] C. De Groot-Hedlin. 2008. A finite difference solution to the Helmholtz equation in a radially symmetric waveguide: Application to near-source scattering in ocean acoustics. J. Comput. Acoust. 16, 3, 447–464. DOI:http://dx.doi.org/10.1142/s0218396x08003683. · Zbl 1257.76074 · doi:10.1142/S0218396X08003683
[16] Shoaib Kamil, Parry Husbands, Leonid Oliker, John Shalf, and Katherine Yelick. 2005. Impact of modern memory subsystems on cache optimizations for stencil computations. In Proceedings of the Workshop on Memory System Performance (MSP’05). ACM Press, New York, 36–43. DOI:http://dx.doi.org/10.1145/1111583.1111589. · doi:10.1145/1111583.1111589
[17] Shoaib Kamil, Kaushik Datta, Samuel Williams, Leonid Oliker, John Shalf, and Katherine Yelick. 2006. Implicit and explicit optimizations for stencil computations. In Proceedings of the Workshop on Memory System Performance and Correctness (MSPC’06). ACM Press, New York, 51–60. DOI:http://dx.doi.org/10.1145/1178597.1178605. · Zbl 1160.65359 · doi:10.1145/1178597.1178605
[18] Jean Kormann, Pedro Cobo, and Andres Prieto. 2008. Perfectly matched layers for modelling seismic oceanography experiments. J. Sound Vibrat. 317, 1–2, 354–365. DOI:http://dx.doi.org/10.1016/j.jsv.2008.03.024. · doi:10.1016/j.jsv.2008.03.024
[19] Monica D. Lam, Edward E. Rothberg, and Michael E. Wolf. 1991. The cache performance and optimizations of blocked algorithms. SIGOPS Oper. Syst. Rev. 25, 63–74. DOI:http://dx.doi.org/10.1145/106974.106981. · doi:10.1145/106974.106981
[20] Naraig Manjikian and Tarek S. Abdelrahman. 1997. Fusion of loops for parallelism and locality. IEEE Trans. Parallel Distrib. Syst. 8, 2, 193–209. DOI:http://dx.doi.org/10.1109/71.577265. · Zbl 05106730 · doi:10.1109/71.577265
[21] John McCalpin and David Wonnacott. 1999. Time skewing: A value-based approach to optimizing for memory locality. Tech. rep. DCS-TR-379, Department of Computer Science, Rutgers University. http://www.haverford.edu/cmsc/davew/cache-opt/cache-opt.html.
[22] Kathryn S. McKinley, Steve Carr, and Chau-Wen Tseng. 1996. Improving data locality with loop transformations. ACM Trans. Program. Lang. Syst. 18, 4, 424–453. DOI:http://dx.doi.org/10.1145/233561.233564. · Zbl 01936143 · doi:10.1145/233561.233564
[23] George A. McMechan. 1989. A review of seismic acoustic imaging by reverse-time migration. Int. J. Imaging Syst. Technol. 1, 1, 0899–9457. DOI:http://dx.doi.org/10.1002/ima.1850010104. · doi:10.1002/ima.1850010104
[24] Todd Mowry and Anoop Gupta. 1991. Tolerating latency through software-controlled data prefetching. J. Parallel Distrib. Comput. 12, 87–106. DOI:http://dx.doi.org/10.1016/0743-7315(91)90014-Z. · doi:10.1016/0743-7315(91)90014-Z
[25] Philip J. Mucci, Shirley Browne, Christine Deane, and George Ho. 1999. PAPI: A portable interface to hardware performance counters. In Proceedings of the Department of Defense HPCMP Users Group Conference. 7–10.
[26] Stephane Operto, Jean Virieux, Patrick Amestoy, Luc Giraud, and Jean-Yves L’Excellent. 2006. 3D frequency-domain finite-difference modeling of acoustic wave propagation using a massively parallel direct solver: A feasibility study. SEG Tech. Program Expanded Abstracts 72, 5, 2265–2269. DOI:http://dx.doi.org/10.1190/1.2369987. · doi:10.1190/1.2369987
[27] Gabriel Rivera and Chau Wen Tseng. 2000. Tiling optimizations for 3D scientific computations. In Proceedings of the ACM/IEEE Supercomputing Conference (SC’00). IEEE Computer Society. http://dl.acm.org/citation.cfm?id=370049.370403.
[28] Anne Rogers and Kai Li. 1992. Software support for speculative loads. SIGPLAN Not. 27, 9, 38–50. DOI:http://dx.doi.org/10.1145/143371.143484. · doi:10.1145/143371.143484
[29] Olivier Temam, Christine Fricker, and William Jalby. 1994. Cache interference phenomena. In Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’94). ACM Press, New York, 261–271. DOI:http://dx.doi.org/10.1145/183018.183047. · doi:10.1145/183018.183047
[30] Jan Treibig and Georg Hager. 2009. Introducing a performance model for bandwidth-limited loop kernels. In Proceedings of the 8th International Conference on Parallel Processing and Applied Mathematics (PPAM’09). Vol. 6067, Springer, 615–624. http://dl.acm.org/citation.cfm?id=1882792.1882865.
[31] Jan Treibig, Georg Hager, and Gerhard Wellein. 2010a. Complexities of performance prediction for bandwidth-limited loop kernels on multi-core architectures. In High Performance Computing in Science and Engineering, S. Wagner, M. Steinmetz, A. Bode, and M. M. Muller Eds., Springer, 3–12.
[32] Jan Treibig, Georg Hager, and Gerhard Wellein. 2010b. LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments. In Proceedings of the 39th International Conference on Parallel Processing Workshops (ICPPW’10). IEEE Computer Society, 207–216. DOI:http://dx.doi.org/10.1109/ICPPW.2010.38. · doi:10.1109/ICPPW.2010.38
[33] Samuel Webb Williams, Andrew Waterman, and David A. Patterson. 2008. Roofline: An insightful visual performance model for floating-point programs and multicore architectures. Tech. rep. UCB/EECS-2008-134, EECS Department, University of California, Berkeley. http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-134.html.
[34] David Wonnacott. 2000. Time skewing for parallel computers. In Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing (LCPC’99). Springer, 477–480. http://portal.acm.org/citation.cfm?id=645677.663799.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.