
Approximation and learning by greedy algorithms. (English) Zbl 1138.62019

Summary: We consider the problem of approximating a given element \(f\) from a Hilbert space \(\mathcal H\) by means of greedy algorithms and the application of such procedures to the regression problem in statistical learning theory. We improve on the existing theory of convergence rates for both the orthogonal greedy algorithm and the relaxed greedy algorithm, as well as for the forward stepwise projection algorithm. For all these algorithms, we prove convergence results for a variety of function classes and not simply those that are related to the convex hull of the dictionary. We then show how these bounds for convergence rates lead to a new theory for the performance of greedy algorithms in learning.
In particular, we build upon the results of W. S. Lee et al. [IEEE Trans. Inf. Theory 42, No. 6, 2118–2132 (1996; Zbl 0874.68253)] to construct learning algorithms based on greedy approximations which are universally consistent and provide provable convergence rates for large classes of functions. The use of greedy algorithms in the context of learning is very appealing since it greatly reduces the computational burden when compared with standard model selection using general dictionaries.
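The entry contains no pseudocode; purely for orientation, the following is a minimal NumPy sketch of the orthogonal greedy algorithm over a finite dictionary in \(\mathbb{R}^n\): each step selects the dictionary element most correlated with the current residual, then recomputes the approximation as the orthogonal projection of \(f\) onto the span of all elements selected so far. The function name, the finite unit-norm dictionary, and the fixed iteration count are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def orthogonal_greedy(f, dictionary, steps):
    """Sketch of the orthogonal greedy algorithm (OGA) in R^n.

    f          : target vector, shape (n,)
    dictionary : unit-norm columns g_1, ..., g_N, shape (n, N)
    steps      : number of greedy iterations m

    Classical theory (cf. [9], [14]) gives an O(m^{-1/2}) error when
    f lies in the convex hull class of the dictionary; the paper
    extends such rates to interpolation spaces between this class
    and the ambient Hilbert space.
    """
    approx = np.zeros_like(f)
    residual = f - approx
    selected = []
    for _ in range(steps):
        # Greedy step: pick the element with the largest inner
        # product against the current residual.
        k = int(np.argmax(np.abs(dictionary.T @ residual)))
        if k not in selected:
            selected.append(k)
        # Orthogonal step: project f onto the span of everything
        # selected so far (least squares) and update the residual.
        G = dictionary[:, selected]
        coeffs, *_ = np.linalg.lstsq(G, f, rcond=None)
        approx = G @ coeffs
        residual = f - approx
    return approx, selected

# Illustrative usage with a random normalized dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((50, 200))
D /= np.linalg.norm(D, axis=0)            # unit-norm columns
f = D[:, :5] @ rng.standard_normal(5)     # target in the span of 5 atoms
f_m, idx = orthogonal_greedy(f, D, steps=10)
print(np.linalg.norm(f - f_m))            # residual norm decreases in m
```

The relaxed greedy algorithm studied in the paper replaces the full least-squares projection with a convex update of the form \(f_m = (1-\alpha_m) f_{m-1} + \beta_m g\), which is cheaper per iteration while achieving the same benchmark rate on the convex hull class.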

MSC:

62G08 Nonparametric regression and quantile regression
68T05 Learning and adaptive systems in artificial intelligence
41A46 Approximation by arbitrary nonlinear expressions; widths and entropy
41A63 Multidimensional problems
46N30 Applications of functional analysis in probability theory and statistics
65C60 Computational problems in statistics (MSC2010)

Citations:

Zbl 0874.68253

References:

[1] Avellaneda, M., Davis, G. and Mallat, S. (1997). Adaptive greedy approximations. Constr. Approx. 13 57-98. · Zbl 0885.41006
[2] Barron, A. R. (1990). Complexity regularization with application to artificial neural networks. In Nonparametric Functional Estimation and Related Topics (G. Roussas, ed.) 561-576. Kluwer Academic Publishers, Dordrecht. · Zbl 0739.62001
[3] Barron, A. R. (1992). Neural net approximation. Proc. 7th Yale Workshop on Adaptive and Learning Systems (K. S. Narendra, ed.) 1 69-72. New Haven, CT.
[4] Barron, A. R. (1993). Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inform. Theory 39 930-945. · Zbl 0818.68126 · doi:10.1109/18.256500
[5] Barron, A. and Cheang, G. H. L. (2001). Penalized least squares, model selection, convex hull classes, and neural nets. In Proceedings of the 9th ESANN , Brugge , Belgium (M. Verleysen, ed.) 371-376. De-Facto Press.
[6] Bennett, C. and Sharpley, R. (1988). Interpolation of Operators . Academic Press, Boston. · Zbl 0647.46057
[7] Bergh, J. and Löfström, J. (1976). Interpolation Spaces . Springer, Berlin. · Zbl 0344.46071
[8] DeVore, R. (1998). Nonlinear approximation. In Acta Numerica 7 51-150. Cambridge Univ. Press. · Zbl 0931.65007
[9] DeVore, R. and Temlyakov, V. (1996). Some remarks on greedy algorithms. Adv. Comput. Math. 5 173-187. · Zbl 0857.65016 · doi:10.1007/BF02124742
[10] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression (with discussion). Ann. Statist. 32 407-499. · Zbl 1091.62054 · doi:10.1214/009053604000000067
[11] Györfi, L., Kohler, M., Krzyzak, A. and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer, New York.
[12] Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning . Springer, New York. · Zbl 0973.62007
[13] Huang, C., Cheang, G. H. L. and Barron, A. R. Risk of penalized least squares, greedy term selection, and \(L_1\)-penalized estimators from flexible function libraries. Yale Department of Statistics Report.
[14] Jones, L. K. (1992). A simple lemma on greedy approximation in Hilbert spaces and convergence rates for projection pursuit regression and neural network training. Ann. Statist. 20 608-613. · Zbl 0746.62060 · doi:10.1214/aos/1176348546
[15] Konyagin, S. V. and Temlyakov, V. N. (1999). Rate of convergence of pure greedy algorithm. East J. Approx. 5 493-499. · Zbl 1101.41309
[16] Kurkova, V. and Sanguineti, M. (2001). Bounds on rates of variable-basis and neural-network approximation. IEEE Trans. Inform. Theory 47 2659-2665. · Zbl 1008.41012 · doi:10.1109/18.945285
[17] Kurkova, V. and Sanguineti, M. (2002). Comparison of worst case errors in linear and neural network approximation. IEEE Trans. Inform. Theory 48 264-275. · Zbl 1059.62589 · doi:10.1109/18.971754
[18] Lee, W. S., Bartlett, P. and Williamson, R. C. (1996). Efficient agnostic learning of neural networks with bounded fan-in. IEEE Trans. Inform. Theory 42 2118-2132. · Zbl 0874.68253 · doi:10.1109/18.556601
[19] Livshitz, E. D. and Temlyakov, V. N. (2003). Two lower estimates in greedy approximation. Constr. Approx. 19 509-524. · Zbl 1044.41010 · doi:10.1007/s00365-003-0533-6
[20] Petrushev, P. P. (1998). Approximation by ridge functions and neural networks. SIAM J. Math. Anal. 30 155-189. · Zbl 0927.41006 · doi:10.1137/S0036141097322959
[21] Temlyakov, V. (2003). Nonlinear methods of approximation. Found. Comput. Math. 3 33-107. · Zbl 1039.41012 · doi:10.1007/s102080010029
[22] Temlyakov, V. (2005). Greedy algorithms with restricted depth search. Proc. Steklov Inst. Math. 248 255-267. · Zbl 1148.41034
[23] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538