Partial differential equation regularization for supervised machine learning. (English) Zbl 1478.65141

Brenner, Susanne C. (ed.) et al., 75 years of mathematics of computation. Symposium celebrating 75 years of mathematics of computation, Institute for Computational and Experimental Research in Mathematics, ICERM, Providence, RI, USA, November 1–3, 2018. Providence, RI: American Mathematical Society (AMS). Contemp. Math. 754, 177-195 (2020).
Summary: This article is an overview of supervised machine learning problems for regression and classification. Topics include: kernel methods, training by stochastic gradient descent, deep learning architectures, losses for classification, statistical learning theory, and dimension-independent generalization bounds. Examples of implicit regularization in deep learning are presented, including data augmentation, adversarial training, and additive noise. These methods are reframed as explicit gradient regularization.
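Two of the implicit-regularization schemes named above admit short numerical sketches. The `mixup` function below follows the convex-combination rule of Zhang et al. ([45] in the reference list); the two loss functions illustrate Bishop's observation ([3]) that, for a linear model under squared loss, training on inputs perturbed by additive Gaussian noise equals, in expectation, the plain loss plus an explicit Tikhonov penalty. This is a minimal sketch assuming numpy; the function names and the Monte-Carlo check are illustrative, not taken from the paper.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Data augmentation by convex combination of two labeled examples ([45])."""
    if rng is None:
        rng = np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)  # mixing weight in [0, 1]
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def noisy_loss(w, X, y, sigma, rng, n_samples=4000):
    """Monte-Carlo estimate of E[(f(x + eps) - y)^2] for the linear model
    f(x) = w . x, with additive noise eps ~ N(0, sigma^2 I)."""
    losses = []
    for _ in range(n_samples):
        eps = rng.normal(0.0, sigma, size=X.shape)
        losses.append(np.mean(((X + eps) @ w - y) ** 2))
    return np.mean(losses)

def regularized_loss(w, X, y, sigma):
    """Explicit form ([3]): plain squared loss plus the Tikhonov penalty
    sigma^2 * ||grad_x f||^2, which is sigma^2 * ||w||^2 for a linear model."""
    return np.mean((X @ w - y) ** 2) + sigma ** 2 * np.sum(w ** 2)
```

For the linear model the equivalence is exact in expectation, so the Monte-Carlo estimate of the noisy loss converges to the explicitly regularized loss; for deep networks the same identification holds only to second order in sigma, which is the reframing the article develops.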
For the entire collection see [Zbl 1461.11002].


65N99 Numerical methods for partial differential equations, boundary value problems
35A15 Variational methods applied to PDEs
35B65 Smoothness and regularity of solutions to PDEs
65C20 Probabilistic models, generic numerical methods in probability and statistics
68T07 Artificial neural networks and deep learning
68Q32 Computational learning theory


mixup; PRMLT
Full Text: DOI arXiv


[1] Martin Arjovsky, Soumith Chintala, and Léon Bottou, Wasserstein GAN, 2017.
[2] Aubert, Gilles; Kornprobst, Pierre, Mathematical problems in image processing, Applied Mathematical Sciences 147, xxxii+377 pp. (2006), Springer, New York · Zbl 1110.35001
[3] Chris M. Bishop, Training with noise is equivalent to Tikhonov regularization, Neural Computation 7 (1995), no. 1, 108-116.
[4] Bishop, Christopher M., Pattern recognition and machine learning, Information Science and Statistics, xx+738 pp. (2006), Springer, New York · Zbl 1107.68072
[5] Bottou, Léon; Curtis, Frank E.; Nocedal, Jorge, Optimization methods for large-scale machine learning, SIAM Rev., 60, 2, 223-311 (2018) · Zbl 1397.65085
[6] Bousquet, Olivier; Elisseeff, André, Stability and generalization, J. Mach. Learn. Res., 2, 3, 499-526 (2002) · Zbl 1007.68083
[7] Boyd, Stephen; Vandenberghe, Lieven, Convex optimization, xiv+716 pp. (2004), Cambridge University Press, Cambridge · Zbl 1058.90049
[8] Nicholas Carlini and David A. Wagner, Towards evaluating the robustness of neural networks, 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017, pp. 39-57.
[9] Cheney, E. W., Introduction to approximation theory, xii+259 pp. (1966), McGraw-Hill Book Co., New York-Toronto, Ont.-London · Zbl 0912.41001
[10] Jeremy M Cohen, Elan Rosenfeld, and J Zico Kolter, Certified adversarial robustness via randomized smoothing, arXiv preprint arXiv:1902.02918 (2019).
[11] Koby Crammer and Yoram Singer, On the algorithmic implementation of multiclass kernel-based vector machines, Journal of Machine Learning Research 2 (2001), no. Dec, 265-292. · Zbl 1037.68110
[12] Terrance Devries and Graham W. Taylor, Improved regularization of convolutional neural networks with cutout, CoRR abs/1708.04552 (2017).
[13] Harris Drucker and Yann Le Cun, Improving generalization performance using double backpropagation, IEEE Transactions on Neural Networks 3 (1992), no. 6, 991-997.
[14] Chris Finlay, Jeff Calder, Bilal Abbasi, and Adam Oberman, Lipschitz regularized deep neural networks generalize and are adversarially robust, 2018.
[15] Chris Finlay and Adam M Oberman, Scaleable input gradient regularization for adversarial robustness, 2019.
[16] Chris Finlay, Aram-Alexandre Pooladian, and Adam M. Oberman, The logbarrier adversarial attack: making effective use of decision boundary information, 2019.
[17] Yarin Gal and Zoubin Ghahramani, Dropout as a Bayesian approximation: representing model uncertainty in deep learning, International Conference on Machine Learning, 2016, pp. 1050-1059.
[18] Federico Girosi, Michael Jones, and Tomaso Poggio, Regularization theory and neural networks architectures, Neural Computation 7 (1995), no. 2, 219-269.
[19] Gabriel Goh, Why momentum really works, Distill (2017).
[20] Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron, Deep learning, Adaptive Computation and Machine Learning, xxii+775 pp. (2016), MIT Press, Cambridge, MA · Zbl 1373.68009
[21] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, Generative adversarial nets, Advances in neural information processing systems, 2014, pp. 2672-2680.
[22] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy, Explaining and harnessing adversarial examples, CoRR abs/1412.6572 (2014).
[23] Moritz Hardt, Benjamin Recht, and Yoram Singer, Train faster, generalize better: stability of stochastic gradient descent, Proceedings of the 33rd International Conference on Machine Learning, ICML’16, JMLR.org, 2016, pp. 1225-1234.
[24] Jiri Hron, Alexander G. de G. Matthews, and Zoubin Ghahramani, Variational Gaussian dropout is not Bayesian, arXiv preprint arXiv:1711.02989 (2017).
[25] Sergey Ioffe and Christian Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv preprint arXiv:1502.03167 (2015).
[26] Maxime Laborde and Adam M. Oberman, A Lyapunov analysis for accelerated gradient methods: from deterministic to stochastic case, 2019.
[27] Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al., Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (1998), no. 11, 2278-2324.
[28] Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana, Certified robustness to adversarial examples with differential privacy, arXiv preprint arXiv:1802.03471 (2018).
[29] Bai Li, Changyou Chen, Wenlin Wang, and Lawrence Carin, Certified adversarial robustness with additive Gaussian noise, 2018.
[30] Xuanqing Liu, Minhao Cheng, Huan Zhang, and Cho-Jui Hsieh, Towards robust neural networks via random self-ensemble, Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 369-385.
[31] Mohri, Mehryar; Rostamizadeh, Afshin; Talwalkar, Ameet, Foundations of machine learning, Adaptive Computation and Machine Learning, xv+486 pp. (2018), MIT Press, Cambridge, MA · Zbl 1407.68007
[32] Nesterov, Yurii, Introductory lectures on convex optimization, Applied Optimization 87, xviii+236 pp. (2004), Kluwer Academic Publishers, Boston, MA · Zbl 1086.90045
[33] Nocedal, Jorge; Wright, Stephen J., Numerical optimization, Springer Series in Operations Research and Financial Engineering, xxii+664 pp. (2006), Springer, New York · Zbl 1104.65059
[34] Adam M. Oberman and Mariana Prazeres, Stochastic gradient descent with Polyak’s learning rate, 2019.
[35] Rudin, Leonid I.; Osher, Stanley; Fatemi, Emad, Nonlinear total variation based noise removal algorithms, Phys. D, 60, 1-4, 259-268 (1992) · Zbl 0780.49028
[36] Sapiro, Guillermo, Geometric partial differential equations and image analysis, xxvi+385 pp. (2006), Cambridge University Press, Cambridge · Zbl 0968.35001
[37] Shamir, Ohad; Shalev-Shwartz, Shai, Matrix completion with the trace norm: learning, bounding, and transducing, J. Mach. Learn. Res., 15, 3401-3423 (2014) · Zbl 1318.68152
[38] Alex J. Smola and Bernhard Schölkopf, From regularization operators to support vector kernels, Advances in Neural Information Processing Systems, 1998, pp. 343-349.
[39] Srivastava, Nitish; Hinton, Geoffrey; Krizhevsky, Alex; Sutskever, Ilya; Salakhutdinov, Ruslan, Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res., 15, 1929-1958 (2014) · Zbl 1318.68153
[40] Su, Weijie; Boyd, Stephen; Candès, Emmanuel J., A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights, J. Mach. Learn. Res., 17, Paper No. 153, 43 pp. (2016) · Zbl 1391.90667
[41] Tikhonov, Andrey N.; Arsenin, Vasiliy Y., Solutions of ill-posed problems, xiii+258 pp. (1977), V. H. Winston & Sons, Washington, D.C.: John Wiley & Sons, New York-Toronto, Ont.-London · Zbl 0354.65028
[42] Wahba, Grace, Spline models for observational data, CBMS-NSF Regional Conference Series in Applied Mathematics 59, xii+169 pp. (1990), Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA · Zbl 0813.62001
[43] Xu, Huan; Mannor, Shie, Robustness and generalization, Mach. Learn., 86, 3, 391-423 (2012) · Zbl 1242.68259
[44] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals, Understanding deep learning requires rethinking generalization, CoRR abs/1611.03530 (2016).
[45] Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz, mixup: Beyond empirical risk minimization, CoRR abs/1710.09412 (2017).
[46] Zhang, Tong, Statistical behavior and consistency of classification methods based on convex risk minimization, Ann. Statist., 32, 1, 56-85 (2004) · Zbl 1105.62323
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.