zbMATH — the first resource for mathematics

Mean field analysis of neural networks: a law of large numbers. (English) Zbl 1440.60008

60-08 Computational methods for problems pertaining to probability theory
60F15 Strong limit theorems
62M45 Neural nets and related approaches to inference from stochastic processes
DeepFace; GNMT
[1] B. Alipanahi, A. Delong, M. Weirauch, and B. Frey, Predicting the sequence specificities of DNA-and RNA-binding proteins by deep learning, Nature Biotechnology, 33 (2015), pp. 831-838.
[2] L. Ambrosio, N. Gigli, and G. Savaré, Gradient Flows: In Metric Spaces and in the Space of Probability Measures, Springer, New York, 2008. · Zbl 1145.35001
[3] S. Arik, M. Chrzanowski, A. Coates, G. Diamos, A. Gibiansky, Y. Kang, X. Li, J. Miller, A. Ng, J. Raiman, and S. Sengupta, Deep Voice: Real-Time Neural Text-to-Speech, https://arxiv.org/abs/1702.07825, 2017.
[4] A. Barron, Approximation and estimation bounds for artificial neural networks, Mach. Learn., 14 (1994), pp. 115-133. · Zbl 0818.68127
[5] P. Bartlett, D. Foster, and M. Telgarsky, Spectrally-normalized margin bounds for neural networks, Adv. Neural Inf. Process. Syst., 30 (2017), pp. 6241-6250.
[6] L. Bo and A. Capponi, Systemic risk in interbanking networks, SIAM J. Financial Math., 6 (2015), pp. 386-424. · Zbl 1315.91065
[7] F. Bolley, Separability and completeness for the Wasserstein distance, in Séminaire de Probabilités XLI, C. Donati-Martin, M. Émery, A. Rouault, and C. Stricker, eds., Lecture Notes in Math. 1934, Springer, Berlin, 2008. · Zbl 1154.60004
[8] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. Jackel, M. Monfort, U. Muller, J. Zhang, and X. Zhang, End to End Learning for Self-Driving Cars, https://arxiv.org/abs/1604.07316, 2016.
[9] Z. Cao, W. Li, S. Li, and F. Wei, Improving multi-document summarization via text classification, in Proceedings of the AAAI Conference on Artificial Intelligence, 2017, pp. 3053-3059.
[10] J. A. Carrillo, R. J. McCann, and C. Villani, Kinetic equilibration rates for granular media and related equations: Entropy dissipation and mass transportation estimates, Rev. Mat. Iberoam., 19 (2003), pp. 971-1018. · Zbl 1073.35127
[11] L. Chizat and F. Bach, On the global convergence of gradient descent for over-parameterized models using optimal transport, Adv. Neural Inf. Process. Syst., 31 (2018), pp. 3040-3050.
[12] P. Dai Pra, W. Runggaldier, E. Sartori, and M. Tolotti, Large portfolio losses: A dynamic contagion model, Ann. Appl. Probab., 19 (2009), pp. 347-394. · Zbl 1159.60353
[13] P. Dai Pra and F. den Hollander, McKean-Vlasov limit for interacting random processes in random media, J. Stat. Phys., 84 (1996), pp. 735-772. · Zbl 1081.60554
[14] P. Dai Pra and M. Tolotti, Heterogeneous credit portfolios and the dynamics of the aggregate losses, Stochastic Process. Appl., 119 (2009), pp. 2913-2944. · Zbl 1187.91203
[15] F. Delarue, J. Inglis, S. Rubenthaler, and E. Tanre, Particle systems with a singular mean-field self-excitation. Application to neuronal networks, Stochastic Process. Appl., 125 (2015), pp. 2451-2492. · Zbl 1328.60134
[16] S. Ethier and T. Kurtz, Markov Processes: Characterization and Convergence, Wiley, New York, 1986. · Zbl 0592.60049
[17] A. Esteva, B. Kuprel, R. Novoa, J. Ko, S. Swetter, H. Blau, and S. Thrun, Dermatologist-level classification of skin cancer with deep neural networks, Nature, 542 (2017), pp. 115-118.
[18] K. Giesecke, K. Spiliopoulos, and R. Sowers, Default clustering in large portfolios: Typical events, Ann. Appl. Probab., 23 (2013), pp. 348-385. · Zbl 1262.91141
[19] K. Giesecke, K. Spiliopoulos, R. Sowers, and J. Sirignano, Large portfolio asymptotics for loss from default, Math. Finance, 25 (2015), pp. 77-114. · Zbl 1314.91228
[20] A. D. Gottlieb, Markov Transitions and the Propagation of Chaos, Ph.D. thesis, University of California, Berkeley, 1998.
[21] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, 2016. · Zbl 1373.68009
[22] S. Gu, E. Holly, T. Lillicrap, and S. Levine, Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates, in Proceedings of the IEEE Conference on Robotics and Automation, 2017, pp. 3389-3396.
[23] B. Hambly and S. Ledger, A stochastic McKean-Vlasov equation for absorbing diffusions on the half-line, Ann. Appl. Probab., 27 (2017), pp. 2698-2752. · Zbl 1379.60068
[24] K. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neural Networks, 2 (1989), pp. 359-366. · Zbl 1383.92015
[25] K. Hornik, Approximation capabilities of multilayer feedforward networks, Neural Networks, 4 (1991), pp. 251-257.
[26] K. Hu, Z. Ren, D. Šiška, and L. Szpruch, Mean-Field Langevin Dynamics and Energy Landscape of Neural Networks, https://arxiv.org/abs/1905.07769, 2019.
[27] J. Inglis and D. Talay, Mean-field limit of a stochastic particle system smoothly interacting through threshold hitting-times and applications to neural networks with dendritic component, SIAM J. Math. Anal., 47 (2015), pp. 3884-3916. · Zbl 1325.60158
[28] R. Jordan, D. Kinderlehrer, and F. Otto, The variational formulation of the Fokker-Planck equation, SIAM J. Math. Anal., 29 (1998), pp. 1-17. · Zbl 0915.35120
[29] V. N. Kolokoltsov, Nonlinear Markov Processes and Kinetic Equations, Cambridge Tracts in Math. 182, Cambridge University Press, Cambridge, UK, 2010. · Zbl 1222.60003
[30] C. Kuan and K. Hornik, Convergence of learning algorithms with constant learning rates, IEEE Trans. Neural Networks, 2 (1991), pp. 484-489.
[31] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, 521 (2015), pp. 436-444.
[32] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE, 86 (1998), pp. 2278-2324.
[33] Y. Leviathan and Y. Matias, Google Duplex: An AI System for Accomplishing Real-World Tasks over the Phone, Google AI Blog, ai.googleblog.com/2018/05/duplex-ai-system-for-natural-conversation.html, 2018.
[34] J. Ling, A. Kurzawski, and J. Templeton, Reynolds averaged turbulence modelling using deep neural networks with embedded invariance, J. Fluid Mech., 807 (2016), pp. 155-166. · Zbl 1383.76175
[35] J. Ling, R. Jones, and J. Templeton, Machine learning strategies for systems with invariance properties, J. Comput. Phys., 318 (2016), pp. 22-35. · Zbl 1349.76124
[36] S. Mallat, Understanding deep convolutional neural networks, Philos. Trans. A, 374 (2016), 20150203.
[37] O. Moynot and M. Samuelides, Large deviations and mean-field theory for asymmetric random recurrent neural networks, Probab. Theory Related Fields, 123 (2002), pp. 41-75. · Zbl 1004.60023
[38] S. Mei, A. Montanari, and P. Nguyen, A mean field view of the landscape of two-layer neural networks, Proc. Natl. Acad. Sci. USA, 115 (2018), pp. 7665-7671.
[39] R. Nallapati, B. Zhou, C. Gulcehre, and B. Xiang, Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond, https://arxiv.org/abs/1602.06023, 2016.
[40] H. Pierson and M. Gashler, Deep learning in robotics: A review of recent research, Adv. Robotics, 31 (2017), pp. 821-835.
[41] G. M. Rotskoff and E. Vanden-Eijnden, Neural Networks as Interacting Particle Systems: Asymptotic Convexity of the Loss Landscape and Universal Scaling of the Approximation Error, https://arxiv.org/abs/1805.00915, 2018.
[42] D. Silver et al., Mastering the game of Go with deep neural networks and tree search, Nature, 529 (2016), pp. 484-489.
[43] J. Sirignano, A. Sadhwani, and K. Giesecke, Deep Learning for Mortgage Risk, https://arxiv.org/abs/1607.02470, 2016.
[44] J. Sirignano and R. Cont, Universal Features of Price Formation in Financial Markets: Perspectives from Deep Learning, https://arxiv.org/abs/1803.06917, 2018. · Zbl 1420.91433
[45] J. Sirignano and K. Spiliopoulos, DGM: A deep learning algorithm for solving partial differential equations, J. Comput. Phys., 375 (2018), pp. 1339-1364. · Zbl 1416.65394
[46] J. Sirignano and K. Spiliopoulos, Mean field analysis of neural networks: A central limit theorem, Stochastic Process. Appl., to appear. · Zbl 1441.60022
[47] J. Sirignano and K. Spiliopoulos, Mean Field Analysis of Deep Neural Networks, https://arxiv.org/abs/1903.04440, 2019. · Zbl 1441.60022
[48] H. Sompolinsky, A. Crisanti, and H. Sommers, Chaos in random neural networks, Phys. Rev. Lett., 61 (1988), 259.
[49] A.-S. Sznitman, Topics in propagation of chaos, in École d'Été de Probabilités de Saint-Flour XIX 1989, P.-L. Hennequin, ed., Lecture Notes in Math. 1464, Springer, Berlin, 1991, pp. 165-251.
[50] I. Sutskever, O. Vinyals, and Q. Le, Sequence to sequence learning with neural networks, Adv. Neural Inf. Process. Syst., 27 (2014), pp. 3104-3112.
[51] N. Sünderhauf, O. Brock, W. Scheirer, R. Hadsell, D. Fox, J. Leitner, B. Upcroft, P. Abbeel, W. Burgard, M. Milford, and P. Corke, The limits and potentials of deep learning for robotics, Internat. J. Robotics Res., 37 (2018), pp. 405-420.
[52] M. Telgarsky, Benefits of Depth in Neural Networks, https://arxiv.org/abs/1602.04485, 2016.
[53] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, Deepface: Closing the gap to human-level performance in face verification, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1701-1708.
[54] C. Wang, J. Mattingly, and Y. Lu, Scaling Limit: Exact and Tractable Analysis of Online Learning Algorithms with Applications to Regularized Regression and PCA, https://arxiv.org/abs/1712.04332, 2017.
[55] Y. Wu, M. Schuster, Z. Chen, Q. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, and J. Klingner, Google’s Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation, https://arxiv.org/abs/1609.08144, 2016.
[56] Y. Zhang, W. Chan, and N. Jaitly, Very deep convolutional networks for end-to-end speech recognition, in IEEE International Conference on Acoustics, Speech, and Signal Processing, 2017, pp. 4845-4849.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.