Divide-and-conquer checkpointing for arbitrary programs with no user annotation. (English) Zbl 1455.65037

Optim. Methods Softw. 33, No. 4-6, 1288-1330 (2018); correction ibid. 36, No. 6, 1317-1318 (2021).
Summary: Classical reverse-mode automatic differentiation (AD) imposes only a small constant-factor overhead in operation count over the original computation, but has storage requirements that grow, in the worst case, in proportion to the time consumed by the original computation. This storage blowup can be ameliorated by checkpointing, a process that reorders the application of classical reverse-mode AD over an execution interval to trade off space for time. Applying checkpointing in a divide-and-conquer fashion to strategically chosen nested execution intervals can break classical reverse-mode AD into stages, reducing the worst-case growth in storage from linear to sublinear. Doing this has been fully automated only for computations of particularly simple form, with checkpoints spanning execution intervals that arise from a limited set of program constructs. Here we show how the technique can be automated for arbitrary computations. The essential innovation is to apply the technique at the level of the language implementation itself, thus allowing checkpoints to span any execution interval.
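To make the space/time trade-off concrete, here is a minimal sketch of divide-and-conquer checkpointing for reverse-mode AD over a chain of identical scalar steps. The step function, its vector-Jacobian product, and all names are illustrative assumptions for exposition only; the paper itself operates at the level of the language implementation so that checkpoints can span arbitrary execution intervals, which this toy chain does not capture.

```python
import math

def step(x):
    # One primitive forward step (arbitrary illustrative choice): x -> x + sin(x).
    return x + math.sin(x)

def step_vjp(x, xbar):
    # Vector-Jacobian product of `step` at primal input x.
    return xbar * (1.0 + math.cos(x))

def run_forward(x, i, j):
    # Recompute the primal state from just before step i up to just before step j.
    for _ in range(i, j):
        x = step(x)
    return x

def reverse_dc(x0, i, j, xbar):
    # Reverse steps i..j-1 given only the checkpoint x0 (state before step i).
    # Divide and conquer: recompute the midpoint state, reverse the right
    # half first, then the left half. Live storage is O(log(j - i))
    # checkpoints instead of the O(j - i) intermediates of a full tape.
    if j - i == 1:
        return step_vjp(x0, xbar)
    mid = (i + j) // 2
    xmid = run_forward(x0, i, mid)          # recompute to create a checkpoint
    xbar = reverse_dc(xmid, mid, j, xbar)   # reverse the right half
    return reverse_dc(x0, i, mid, xbar)     # then the left half

def grad_chain(x0, n):
    # Derivative of step composed n times, evaluated at x0,
    # via checkpointed reverse mode.
    return reverse_dc(x0, 0, n, 1.0)
```

The recomputation in `run_forward` is the time cost paid for sublinear storage: each primal step is re-executed O(log n) times rather than being stored once on a tape.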


65D25 Numerical differentiation
68N18 Functional programming and lambda calculus



