
Recursive aggregation of estimators by the mirror descent algorithm with averaging. (English. Russian original) Zbl 1123.62044

Probl. Inf. Transm. 41, No. 4, 368-384 (2005); translation from Probl. Peredachi Inf. 2005, No. 4, 78-96 (2005).
Summary: We consider a recursive algorithm that constructs an aggregated estimator from a finite number of base decision rules in the classification problem. The estimator approximately minimizes a convex risk functional under an \(\ell_1\)-constraint. It is defined by a stochastic version of the mirror descent algorithm, which performs gradient-type descent in the dual space with additional averaging. The main result of the paper is an upper bound on the expected accuracy of the proposed estimator. The bound is of order \(C\sqrt{(\log M)/t}\) with an explicit and small constant factor \(C\), where \(M\) is the dimension of the problem and \(t\) is the sample size. A similar bound is proved in a more general setting, which covers, in particular, the regression model with squared loss.
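The summary describes the algorithm only at a high level; the following is a minimal Python sketch of a stochastic mirror descent step with averaging of this general kind. The entropic proxy function (softmax mirror map) on the simplex, the logistic surrogate loss, and the step-size and temperature choices below are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def md_aggregate(base_preds, labels, gamma=None, beta=None):
    """Sketch of stochastic mirror descent with averaging on the simplex.

    base_preds : (t, M) array, base_preds[i, j] = h_j(X_i), base rule outputs
    labels     : (t,) array of labels in {-1, +1}
    Returns an averaged weight vector lambda_bar (nonnegative, sums to 1).
    """
    t, M = base_preds.shape
    if beta is None:                          # "temperature" of the entropic proxy
        beta = np.sqrt(np.log(M) / t)         # scaling motivated by the log(M)/t rate
    if gamma is None:
        gamma = np.full(t, 1.0 / np.sqrt(t))  # one simple step-size choice

    zeta = np.zeros(M)        # dual variable; starts at 0, i.e. uniform weights
    lam_bar = np.zeros(M)     # running weighted average of primal iterates
    weight_sum = 0.0
    for i in range(t):
        # Primal iterate: entropic mirror map (numerically stabilized softmax).
        s = -zeta / beta
        w = np.exp(s - np.max(s))
        lam = w / w.sum()
        # Stochastic subgradient of the logistic surrogate loss
        # phi(z) = log(1 + exp(-z)) at the margin z = y * f_lambda(x).
        margin = labels[i] * (base_preds[i] @ lam)
        g = -labels[i] * base_preds[i] / (1.0 + np.exp(margin))
        zeta += gamma[i] * g          # gradient-type descent step in the dual space
        lam_bar += gamma[i] * lam     # averaging of the primal iterates
        weight_sum += gamma[i]
    return lam_bar / weight_sum
```

The entropic proxy is what ties the bound to \(\log M\) rather than to the dimension \(M\) itself: its strong convexity with respect to the \(\ell_1\)-norm degrades only logarithmically with the number of base rules.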

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G20 Asymptotic properties of nonparametric inference
65C60 Computational problems in statistics (MSC2010)
62G08 Nonparametric regression and quantile regression
62G99 Nonparametric inference
