Error bounds for approximations with deep ReLU networks. (English) Zbl 1429.68260

Summary: We study expressive power of shallow and deep neural networks with piece-wise linear activation functions. We establish new rigorous upper and lower bounds for the network complexity in the setting of approximations in Sobolev spaces. In particular, we prove that deep ReLU networks more efficiently approximate smooth functions than shallow networks. In the case of approximations of 1D Lipschitz functions we describe adaptive depth-6 network architectures more efficient than the standard shallow architecture.


68T07 Artificial neural networks and deep learning
41A25 Rate of convergence, degree of approximation
Full Text: DOI arXiv


[1] Anthony, Martin; Bartlett, Peter L., Neural network learning: Theoretical foundations (2009), Cambridge University Press · Zbl 1170.68552
[2] Bartlett, Peter L.; Maiorov, Vitaly; Meir, Ron, Almost linear VC-dimension bounds for piecewise polynomial networks, Neural Computation, 10, 8, 2159-2173 (1998)
[3] Bianchini, Monica; Scarselli, Franco, On the complexity of neural network classifiers: a comparison between shallow and deep architectures, IEEE Transactions on Neural Networks and Learning Systems, 25, 8, 1553-1565 (2014)
[5] Dawson, C. M.; Nielsen, M. A., The Solovay-Kitaev algorithm, Quantum Information & Computation, 6, 1, 81-95 (2006) · Zbl 1152.81706
[6] Delalleau, Olivier; Bengio, Yoshua, Shallow vs. deep sum-product networks, Advances in Neural Information Processing Systems, 666-674 (2011) · Zbl 1348.68183
[7] DeVore, Ronald A.; Howard, Ralph; Micchelli, Charles, Optimal nonlinear approximation, Manuscripta Mathematica, 63, 4, 469-478 (1989) · Zbl 0682.41033
[8] Goldberg, Paul W.; Jerrum, Mark R., Bounding the Vapnik-Chervonenkis dimension of concept classes parameterized by real numbers, Machine Learning, 18, 2-3, 131-148 (1995) · Zbl 0831.68087
[9] Kainen, Paul C.; Kůrková, Věra; Vogt, Andrew, Approximation by neural networks is not continuous, Neurocomputing, 29, 1, 47-56 (1999)
[10] Kearns, Michael J.; Schapire, Robert E., Efficient distribution-free learning of probabilistic concepts, (Foundations of computer science, 1990. Proceedings., 31st annual symposium on (1990), IEEE), 382-391
[11] Kitaev, A. Yu.; Shen, A. H.; Vyalyi, M. N., Classical and quantum computation (2002), American Mathematical Society: American Mathematical Society Boston, MA, USA · Zbl 1022.68001
[12] LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey, Deep learning, Nature, 521, 7553, 436-444 (2015)
[14] Maiorov, V. E.; Meir, Ron, On the near optimality of the stochastic approximation of smooth functions by neural networks, Advances in Computational Mathematics, 13, 1, 79-103 (2000) · Zbl 0939.41013
[15] Mhaskar, H. N., Neural networks for optimal approximation of smooth and analytic functions, Neural Computation, 8, 1, 164-177 (1996)
[16] Mhaskar, Hrushikesh Narhar, Approximation properties of a multilayered feedforward artificial neural network, Advances in Computational Mathematics, 1, 1, 61-80 (1993) · Zbl 0824.41011
[19] Montufar, Guido F.; Pascanu, Razvan; Cho, Kyunghyun; Bengio, Yoshua, On the number of linear regions of deep neural networks, (Advances in Neural Information Processing Systems (2014)), 2924-2932
[20] Pinkus, Allan, Approximation theory of the MLP model in neural networks, Acta Numerica, 8, 143-195 (1999) · Zbl 0959.68109
[22] Sakurai, Akito, Tight Bounds for the VC-Dimension of Piecewise Polynomial Networks, (Advances in Neural Information Processing Systems (1999)), 323-329
[23] Schmidhuber, Jürgen, Deep learning in neural networks: an overview, Neural Networks, 61, 85-117 (2015)
[24] Shannon, Claude, The synthesis of two-terminal switching circuits, Bell Labs Technical Journal, 28, 1, 59-98 (1949)
[27] Vapnik, Vladimir N.; Chervonenkis, A. Ya, On the uniform convergence of relative frequencies of events to their probabilities, (Measures of complexity (2015), Springer), 11-30 · Zbl 1337.60009
[28] Warren, Hugh E., Lower bounds for approximation by nonlinear manifolds, Transactions of the American Mathematical Society, 133, 1, 167-178 (1968) · Zbl 0174.35403
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.