## An efficient time-step-based self-adaptive algorithm for predictor-corrector methods of Runge-Kutta type

*(English)* Zbl 1228.65112

Summary: Finding an efficient implementation variant for the numerical solution of problems from computational science and engineering involves many implementation decisions that are strongly influenced by the specific hardware architecture. The complexity of these architectures makes it difficult to find the best implementation variant by manual tuning. For numerical methods from linear algebra, auto-tuning techniques based on a global search, as used in ATLAS or FFTW, have been applied successfully. These techniques generate different implementation variants at installation time and select one of them either at installation time or at runtime, before the computation starts. For some numerical methods, however, auto-tuning at installation time cannot be applied directly, since the best implementation variant may depend strongly on the specific numerical problem to be solved.

An example is solution methods for initial value problems (IVPs) of ordinary differential equations (ODEs), where the coupling structure of the ODE system to be solved has a large influence on the efficient use of the memory hierarchy of the hardware architecture. In this context, it is important to use auto-tuning techniques at runtime, which is possible because of the time-stepping nature of ODE solvers.
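To make the coupling structure concrete, here is a minimal, hypothetical right-hand side with access distance 1: a 1-D heat equation discretized on `n` interior grid points with zero boundary values, where component `i` of `f` touches only `y[i-1]`, `y[i]`, and `y[i+1]`. The function name and discretization are illustrative assumptions, not taken from the paper; the point is that such limited coupling is what lets a tiled solver keep the working set of a corrector sweep inside the cache.

```python
import numpy as np

def heat_rhs(t, y):
    """Hypothetical example RHS: 1-D heat equation, second-order central
    differences, zero Dirichlet boundaries.  Component i couples only to
    components i-1, i, i+1 (access distance 1).  Requires len(y) >= 2."""
    n = len(y)
    dx = 1.0 / (n + 1)
    f = np.empty(n)
    f[0] = (-2.0 * y[0] + y[1]) / dx**2            # left boundary value is 0
    f[-1] = (y[-2] - 2.0 * y[-1]) / dx**2          # right boundary value is 0
    f[1:-1] = (y[:-2] - 2.0 * y[1:-1] + y[2:]) / dx**2
    return f
```

A right-hand side with a wide, irregular coupling structure would instead force the solver to re-read distant parts of the state vector in every sweep, which is why the best implementation variant depends on the specific IVP.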

In this article, we present a sequential self-adaptive ODE solver that selects the best implementation variant from a candidate pool at runtime during the first time steps, i.e., the auto-tuning phase already contributes to the progress of the computation. The implementation variants differ in the loop structure and the data structures used to realize the numerical algorithm, which here is, as an example, a predictor-corrector (PC) iteration scheme with a Runge-Kutta (RK) corrector. For those implementation variants in the candidate pool that use loop tiling to exploit the memory hierarchy of a given hardware platform, we investigate the selection of tile sizes. The self-adaptive ODE solver combines empirical search with a model-based approach in order to reduce the search space of possible tile sizes. Runtime experiments demonstrate the efficiency of the self-adaptive solver for different IVPs across a range of problem sizes and on different hardware architectures.
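The runtime selection idea can be sketched as follows. The corrector is a fixed-point iteration of an implicit RK method (here a 2-stage Gauss method) with the trivial predictor Y⁽⁰⁾ = yₙ; the two step functions compute identical results but organize their loops over stages differently. All function names, the choice of corrector, and the two variants are illustrative assumptions under which the general scheme is demonstrated, not the authors' code.

```python
import time
import numpy as np

# 2-stage Gauss (implicit RK) coefficients used as the corrector.
A = np.array([[0.25, 0.25 - np.sqrt(3) / 6],
              [0.25 + np.sqrt(3) / 6, 0.25]])
b = np.array([0.5, 0.5])
c = np.array([0.5 - np.sqrt(3) / 6, 0.5 + np.sqrt(3) / 6])

def pc_step_stage_major(f, t, y, h, m=3):
    """One PC time step; the stage vectors are updated as one 2-D array."""
    s = len(b)
    Y = np.tile(y, (s, 1))              # predictor: every stage starts at y_n
    for _ in range(m):                  # m corrector sweeps
        F = np.array([f(t + c[i] * h, Y[i]) for i in range(s)])
        Y = y + h * (A @ F)
    F = np.array([f(t + c[i] * h, Y[i]) for i in range(s)])
    return y + h * (b @ F)

def pc_step_stage_lists(f, t, y, h, m=3):
    """Same arithmetic with an interchanged loop structure over the stages."""
    s = len(b)
    Y = [y.copy() for _ in range(s)]
    for _ in range(m):
        F = [f(t + c[i] * h, Y[i]) for i in range(s)]
        Y = [y + h * sum(A[l, i] * F[i] for i in range(s)) for l in range(s)]
    F = [f(t + c[i] * h, Y[i]) for i in range(s)]
    return y + h * sum(b[i] * F[i] for i in range(s))

def self_adaptive_solve(f, t0, y0, h, steps):
    """Auto-tuning phase: time each candidate on one real time step, so the
    search itself advances the integration; then run the fastest variant."""
    variants = [pc_step_stage_major, pc_step_stage_lists]
    timings, t, y = [], t0, y0
    for step in variants:
        start = time.perf_counter()
        y = step(f, t, y, h)
        timings.append(time.perf_counter() - start)
        t += h
    best = variants[timings.index(min(timings))]
    for _ in range(steps - len(variants)):
        y = best(f, t, y, h)
        t += h
    return y
```

For example, `self_adaptive_solve(lambda t, y: -y, 0.0, np.array([1.0]), 0.01, 100)` integrates y' = -y to t = 1 and closely approximates exp(-1), regardless of which variant wins the timing race, since both compute the same arithmetic.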

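The combination of a model-based bound with empirical timing for tile-size selection can be sketched like this. The cache-capacity model, the parameter names, and the geometric candidate spacing are all hypothetical assumptions standing in for the paper's actual model; the structure shown is only the general pattern of pruning the search space analytically and then measuring the survivors on real time steps.

```python
import time

def candidate_tile_sizes(cache_bytes=256 * 1024, bytes_per_component=8,
                         vectors_per_tile=4):
    """Model-based pruning (hypothetical model): a tile of B components
    touches roughly vectors_per_tile * B * bytes_per_component bytes per
    corrector sweep, so B must not exceed the cache capacity.  Only a few
    geometrically spaced candidates below that bound are kept."""
    b_max = cache_bytes // (vectors_per_tile * bytes_per_component)
    sizes, tile = [], 64
    while tile <= b_max:
        sizes.append(tile)
        tile *= 2
    return sizes

def pick_tile_size(run_step_with_tile, sizes):
    """Empirical phase: time one real time step per candidate tile size and
    keep the fastest; each measurement advances the integration."""
    best, best_t = None, float("inf")
    for tile in sizes:
        start = time.perf_counter()
        run_step_with_tile(tile)
        elapsed = time.perf_counter() - start
        if elapsed < best_t:
            best, best_t = tile, elapsed
    return best
```

With the default assumptions, the model keeps only the powers of two from 64 up to 8192, so the empirical search measures eight candidates instead of thousands.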

### MSC:

| Code | Classification |
| --- | --- |
| 65L06 | Multistep, Runge-Kutta and extrapolation methods for ordinary differential equations |
| 65L05 | Numerical methods for initial value problems involving ordinary differential equations |
| 34A34 | Nonlinear ordinary differential equations and systems |
| 65Y05 | Parallel numerical computation |
| 65Y15 | Packaged methods for numerical algorithms |

### Keywords:

auto-tuning; ordinary differential equations; Runge-Kutta methods; tile size selection; locality; numerical examples; initial value problems; algorithm; predictor-corrector iteration

*N. Kalinnik* et al., J. Comput. Appl. Math. 236, No. 3, 394--410 (2011; Zbl 1228.65112)

