On deep learning as a remedy for the curse of dimensionality in nonparametric regression. (English) Zbl 1421.62036

This paper generalizes prior results on the rate of convergence of suitable multilayer neural network regression estimates when the regression function satisfies a \((p,C)\)-smooth generalized hierarchical interaction model of given order \(d^{*}\) and given level \(l\). The results are stronger and more general than those previously known in the literature: the same convergence rate is obtained under much less rigid assumptions on the function class to which the regression function belongs.
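The key structural assumption can be illustrated with a toy example. The following minimal sketch (not taken from the paper; the projection directions and outer function are invented for the demonstration) constructs a regression function in \(d = 5\) dimensions that satisfies a level-1 hierarchical interaction model of order \(d^{*} = 2\): the function depends on its input only through two one-dimensional projections, so a neural network estimate can exploit an effective dimension of 2 rather than 5.

```python
import math

def dot(a, x):
    """Inner product of two equal-length sequences."""
    return sum(ai * xi for ai, xi in zip(a, x))

# Projection directions (chosen arbitrarily for this demo).
a1 = [1.0, 0.5, 0.0, 0.0, 0.0]
a2 = [0.0, 0.0, 1.0, -1.0, 2.0]

def g(u, v):
    """A smooth bivariate 'outer' function, standing in for a
    (p, C)-smooth component of the model."""
    return math.sin(u) + v * v

def m(x):
    """Regression function m(x) = g(a1.x, a2.x): although x lives in
    R^5, m depends on x only through two linear projections, so the
    effective dimension governing the convergence rate is d* = 2."""
    return g(dot(a1, x), dot(a2, x))

# Changing x in directions orthogonal to both a1 and a2 leaves m unchanged.
print(m([1.0, 2.0, 0.5, 0.5, 0.0]))
```

Higher levels \(l\) of the model are obtained by composing such low-dimensional stages, which matches the layered structure of deep networks.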


62G08 Nonparametric regression and quantile regression
62G20 Asymptotic properties of nonparametric inference
65C60 Computational problems in statistics (MSC2010)


[1] Anthony, M. and Bartlett, P. L. (1999). Neural Network Learning: Theoretical Foundations. Cambridge Univ. Press, Cambridge. · Zbl 0968.68126
[2] Bagirov, A. M., Clausen, C. and Kohler, M. (2009). Estimation of a regression function by maxima of minima of linear functions. IEEE Trans. Inform. Theory 55 833-845. · Zbl 1367.62085 · doi:10.1109/TIT.2008.2009835
[3] Barron, A. R. (1991). Complexity regularization with application to artificial neural networks. In Nonparametric Functional Estimation and Related Topics (Spetses, 1990). NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci. 335 561-576. Kluwer Academic, Dordrecht. · Zbl 0739.62001
[4] Barron, A. R. (1993). Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inform. Theory 39 930-945. · Zbl 0818.68126 · doi:10.1109/18.256500
[5] Barron, A. R. (1994). Approximation and estimation bounds for artificial neural networks. Mach. Learn. 14 115-133. · Zbl 0818.68127
[6] Bauer, B. and Kohler, M. (2019). Supplement to “On deep learning as a remedy for the curse of dimensionality in nonparametric regression.” DOI:10.1214/18-AOS1747SUPPA, DOI:10.1214/18-AOS1747SUPPB. · Zbl 1421.62036
[7] Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Applications of Mathematics (New York) 31. Springer, New York. · Zbl 0853.68150
[8] Friedman, J. H. and Stuetzle, W. (1981). Projection pursuit regression. J. Amer. Statist. Assoc. 76 817-823.
[9] Györfi, L., Kohler, M., Krzyżak, A. and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer, New York. · Zbl 1021.62024
[10] Härdle, W., Hall, P. and Ichimura, H. (1993). Optimal smoothing in single-index models. Ann. Statist. 21 157-178. · Zbl 0770.62049 · doi:10.1214/aos/1176349020
[11] Härdle, W. and Stoker, T. M. (1989). Investigating smooth multiple regression by the method of average derivatives. J. Amer. Statist. Assoc. 84 986-995. · Zbl 0703.62052
[12] Haykin, S. O. (2008). Neural Networks and Learning Machines, 3rd ed. Prentice Hall, New York.
[13] Hertz, J., Krogh, A. and Palmer, R. G. (1991). Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City, CA.
[14] Horowitz, J. L. and Mammen, E. (2007). Rate-optimal estimation for a general class of nonparametric regression models with unknown link functions. Ann. Statist. 35 2589-2619. · Zbl 1129.62034 · doi:10.1214/009053607000000415
[15] Kohler, M. and Krzyżak, A. (2005). Adaptive regression estimation with multilayer feedforward neural networks. J. Nonparametr. Stat. 17 891-913. · Zbl 1121.62043 · doi:10.1080/10485250500309608
[16] Kohler, M. and Krzyżak, A. (2017). Nonparametric regression based on hierarchical interaction models. IEEE Trans. Inform. Theory 63 1620-1630. · Zbl 1366.62082 · doi:10.1109/TIT.2016.2634401
[17] Kong, E. and Xia, Y. (2007). Variable selection for the single-index model. Biometrika 94 217-229. · Zbl 1142.62353 · doi:10.1093/biomet/asm008
[18] Lazzaro, D. and Montefusco, L. B. (2002). Radial basis functions for the multivariate interpolation of large scattered data sets. J. Comput. Appl. Math. 140 521-536. · Zbl 1025.65015 · doi:10.1016/S0377-0427(01)00485-X
[19] Lugosi, G. and Zeger, K. (1995). Nonparametric estimation via empirical risk minimization. IEEE Trans. Inform. Theory 41 677-687. · Zbl 0818.62041 · doi:10.1109/18.382014
[20] McCaffrey, D. F. and Gallant, A. R. (1994). Convergence rates for single hidden layer feedforward networks. Neural Netw. 7 147-158.
[21] Mhaskar, H. N. and Poggio, T. (2016). Deep vs. shallow networks: An approximation theory perspective. Anal. Appl. (Singap.) 14 829-848. · Zbl 1355.68233 · doi:10.1142/S0219530516400042
[22] Mielniczuk, J. and Tyrcha, J. (1993). Consistency of multilayer perceptron regression estimators. Neural Netw. 6 1019-1022.
[23] Ripley, B. D. (2008). Pattern Recognition and Neural Networks. Cambridge Univ. Press, Cambridge. Reprint of the 1996 original. · Zbl 1163.62047
[24] Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Netw. 61 85-117.
[25] Schmidt-Hieber, J. (2017). Nonparametric regression using deep neural networks with ReLU activation function. Available at arXiv:1708.06633v2. · Zbl 1459.62059
[26] Stone, C. J. (1982). Optimal global rates of convergence for nonparametric regression. Ann. Statist. 10 1040-1053. · Zbl 0511.62048 · doi:10.1214/aos/1176345969
[27] Stone, C. J. (1985). Additive regression and other nonparametric models. Ann. Statist. 13 689-705. · Zbl 0605.62065 · doi:10.1214/aos/1176349548
[28] Stone, C. J. (1994). The use of polynomial splines and their tensor products in multivariate function estimation. Ann. Statist. 22 118-184. · Zbl 0827.62038 · doi:10.1214/aos/1176325361
[29] Yu, Y. and Ruppert, D. (2002). Penalized spline estimation for partially linear single-index models. J. Amer. Statist. Assoc. 97 1042-1054. · Zbl 1045.62035 · doi:10.1198/016214502388618861