##
**Error bounds for approximations with deep ReLU networks.**
*(English)*
Zbl 1429.68260

Summary: We study expressive power of shallow and deep neural networks with piece-wise linear activation functions. We establish new rigorous upper and lower bounds for the network complexity in the setting of approximations in Sobolev spaces. In particular, we prove that deep ReLU networks more efficiently approximate smooth functions than shallow networks. In the case of approximations of 1D Lipschitz functions we describe adaptive depth-6 network architectures more efficient than the standard shallow architecture.

### MSC:

68T07 | Artificial neural networks and deep learning |

41A25 | Rate of convergence, degree of approximation |

### References:

[1] | Anthony, Martin; Bartlett, Peter L., Neural network learning: Theoretical foundations (2009), Cambridge University Press |

[2] | Bartlett, Peter L.; Maiorov, Vitaly; Meir, Ron, Almost linear VC-dimension bounds for piecewise polynomial networks, Neural Computation, 10, 8, 2159-2173 (1998) |

[3] | Bianchini, Monica; Scarselli, Franco, On the complexity of neural network classifiers: a comparison between shallow and deep architectures, IEEE Transactions on Neural Networks and Learning Systems, 25, 8, 1553-1565 (2014) |

[5] | Dawson, C. M.; Nielsen, M. A., The Solovay-Kitaev algorithm, Quantum Information & Computation, 6, 1, 81-95 (2006) |

[6] | Delalleau, Olivier; Bengio, Yoshua, Shallow vs. deep sum-product networks, Advances in Neural Information Processing Systems, 666-674 (2011) |

[7] | DeVore, Ronald A.; Howard, Ralph; Micchelli, Charles, Optimal nonlinear approximation, Manuscripta Mathematica, 63, 4, 469-478 (1989) |

[8] | Goldberg, Paul W.; Jerrum, Mark R., Bounding the Vapnik-Chervonenkis dimension of concept classes parameterized by real numbers, Machine Learning, 18, 2-3, 131-148 (1995) |

[9] | Kainen, Paul C.; Kůrková, Věra; Vogt, Andrew, Approximation by neural networks is not continuous, Neurocomputing, 29, 1, 47-56 (1999) |

[10] | Kearns, Michael J.; Schapire, Robert E., Efficient distribution-free learning of probabilistic concepts, (Foundations of computer science, 1990. Proceedings., 31st annual symposium on (1990), IEEE), 382-391 |

[11] | Kitaev, A. Yu.; Shen, A. H.; Vyalyi, M. N., Classical and quantum computation (2002), American Mathematical Society: American Mathematical Society Boston, MA, USA |

[12] | LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey, Deep learning, Nature, 521, 7553, 436-444 (2015) |

[14] | Maiorov, V. E.; Meir, Ron, On the near optimality of the stochastic approximation of smooth functions by neural networks, Advances in Computational Mathematics, 13, 1, 79-103 (2000) |

[15] | Mhaskar, H. N., Neural networks for optimal approximation of smooth and analytic functions, Neural Computation, 8, 1, 164-177 (1996) |

[16] | Mhaskar, Hrushikesh Narhar, Approximation properties of a multilayered feedforward artificial neural network, Advances in Computational Mathematics, 1, 1, 61-80 (1993) |

[19] | Montufar, Guido F.; Pascanu, Razvan; Cho, Kyunghyun; Bengio, Yoshua, On the number of linear regions of deep neural networks, (Advances in Neural Information Processing Systems (2014)), 2924-2932 |

[20] | Pinkus, Allan, Approximation theory of the MLP model in neural networks, Acta Numerica, 8, 143-195 (1999) |

[22] | Sakurai, Akito, Tight Bounds for the VC-Dimension of Piecewise Polynomial Networks, (Advances in Neural Information Processing Systems (1999)), 323-329 |

[23] | Schmidhuber, Jürgen, Deep learning in neural networks: an overview, Neural Networks, 61, 85-117 (2015) |

[24] | Shannon, Claude, The synthesis of two-terminal switching circuits, Bell Labs Technical Journal, 28, 1, 59-98 (1949) |

[27] | Vapnik, Vladimir N.; Chervonenkis, A. Ya, On the uniform convergence of relative frequencies of events to their probabilities, (Measures of complexity (2015), Springer), 11-30 |

[28] | Warren, Hugh E., Lower bounds for approximation by nonlinear manifolds, Transactions of the American Mathematical Society, 133, 1, 167-178 (1968) |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.