
Deep neural networks motivated by partial differential equations. (English) Zbl 1434.68522

Summary: Partial differential equations (PDEs) are indispensable for modeling many physical phenomena and are also commonly used for solving image processing tasks. In the latter area, PDE-based approaches interpret image data as discretizations of multivariate functions and the output of image processing algorithms as solutions to certain PDEs. Posing image processing problems in the infinite-dimensional setting provides powerful tools for their analysis and solution. Over the last few decades, the reinterpretation of classical image processing problems through the PDE lens has produced several celebrated approaches that benefit a wide range of tasks, including image segmentation, denoising, registration, and reconstruction. In this paper, we establish a new PDE interpretation of a class of deep convolutional neural networks (CNNs) that are commonly used to learn from speech, image, and video data. Our interpretation includes convolutional residual neural networks (ResNets), which are among the most promising approaches for tasks such as image classification, having advanced the state of the art in prestigious benchmark challenges. Despite their recent successes, deep ResNets still face critical challenges related to their design, their immense computational costs and memory requirements, and the limited understanding of their reasoning. Guided by well-established PDE theory, we derive three new ResNet architectures that fall into two new classes: parabolic and hyperbolic CNNs. We show how PDE theory can provide new insights and algorithms for deep learning, and we demonstrate the competitiveness of the three new CNN architectures in numerical experiments.
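The central observation of the paper can be made concrete with a small sketch: a residual block y_{k+1} = y_k + h f(y_k, θ_k) reads as one forward Euler step of the initial value problem dy/dt = f(y(t), θ(t)), and choosing f with the symmetric structure f = -K^T σ(K y + b) yields a layer that behaves like a discretized nonlinear diffusion (heat) equation, i.e., a "parabolic" CNN. The Python sketch below is a minimal illustration of this reading, not the authors' implementation: the dense matrices K stand in for the paper's convolution operators, and the step size h, width, and layer count are hypothetical choices.

import numpy as np

rng = np.random.default_rng(0)

def sigma(x):
    """Pointwise activation; tanh keeps the dynamics smooth."""
    return np.tanh(x)

def parabolic_block(y, K, b, h):
    """One forward Euler step of dy/dt = -K^T sigma(K y + b).
    The -K^T ... K structure makes the layer diffusion-like."""
    return y - h * K.T @ sigma(K @ y + b)

width, n_layers, h = 8, 10, 0.1   # hypothetical sizes, not from the paper
y = rng.standard_normal(width)    # input feature vector
for _ in range(n_layers):         # each layer is one time step
    K = 0.1 * rng.standard_normal((width, width))  # per-layer weights theta_k
    b = np.zeros(width)
    y = parabolic_block(y, K, b, h)
print(y)

A practical payoff of this viewpoint, emphasized in the paper, is that the stability of forward propagation can be analyzed with standard ODE/PDE theory, for instance via step size restrictions on h; the values above are illustrative only.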

MSC:

68T07 Artificial neural networks and deep learning
35Q68 PDEs in connection with computer science
68U10 Computing methodologies for image processing
