
Higher order automatic differentiation of higher order functions. (English) Zbl 07566054

Summary: We present semantic correctness proofs of automatic differentiation (AD). We consider a forward-mode AD method on a higher-order language with algebraic data types, and we characterise it as the unique structure-preserving macro given a choice of derivatives for basic operations. We describe a rich semantics for differentiable programming, based on diffeological spaces. We show that it interprets our language, and we phrase what it means for the AD method to be correct with respect to this semantics. We show that our characterisation of AD gives rise to an elegant semantic proof of its correctness based on a gluing construction on diffeological spaces. We explain how this is, in essence, a logical relations argument. Throughout, we show how the analysis extends to AD methods for computing higher-order derivatives using a Taylor approximation.
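To make the abstract concrete: forward-mode AD of the kind the paper studies is classically built on dual numbers, where each value carries a tangent that is propagated by the chain rule, and each basic operation comes equipped with a chosen derivative. The sketch below (illustrative only; the names and the embedding are not the paper's notation or macro translation) shows this construction in miniature.

```python
# Minimal sketch of dual-number forward-mode AD (assumed illustrative names,
# not the paper's macro). A Dual pairs a primal value with a tangent, and
# arithmetic propagates both by the chain rule.
import math
from dataclasses import dataclass


@dataclass
class Dual:
    val: float  # primal value f(x)
    tan: float  # tangent value f'(x)

    def __add__(self, other: "Dual") -> "Dual":
        # sum rule: (f + g)' = f' + g'
        return Dual(self.val + other.val, self.tan + other.tan)

    def __mul__(self, other: "Dual") -> "Dual":
        # product rule: (f * g)' = f'g + fg'
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)


def sin(d: Dual) -> Dual:
    # a "choice of derivative for a basic operation": sin' = cos
    return Dual(math.sin(d.val), math.cos(d.val) * d.tan)


def derivative(f, x: float) -> float:
    # seed the tangent with 1.0 to read off df/dx at x
    return f(Dual(x, 1.0)).tan


# d/dx [x * sin(x)] = sin(x) + x * cos(x)
g = lambda d: d * sin(d)
```

Replacing the single tangent with a tuple of truncated Taylor coefficients (and propagating them with, e.g., the Faà di Bruno formula) yields the higher-order, Taylor-approximation variants of AD that the analysis also covers.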

MSC:

03B70 Logic in computer science
68-XX Computer science
Full Text: arXiv
