Learning by mirror averaging. (English) Zbl 1274.62288

Summary: Given a finite collection of estimators or classifiers, we study the problem of model selection type aggregation, that is, we construct a new estimator or classifier, called aggregate, which is nearly as good as the best among them with respect to a given risk criterion. We define our aggregate by a simple recursive procedure which solves an auxiliary stochastic linear programming problem related to the original nonlinear one and constitutes a special case of the mirror averaging algorithm. We show that the aggregate satisfies sharp oracle inequalities under some general assumptions. The results are applied to several problems including regression, classification and density estimation.


62G08 Nonparametric regression and quantile regression
62C20 Minimax procedures in statistical decision theory
62G05 Nonparametric estimation
62G20 Asymptotic properties of nonparametric inference
Full Text: DOI arXiv


[1] Bartlett, P. L., Boucheron, S. and Lugosi, G. (2002). Model selection and error estimation. Machine Learning 48 85-113. · Zbl 0998.68117
[2] Belomestny, D. and Spokoiny, V. (2007). Spatial aggregation of local likelihood estimates with applications to classification. Ann. Statist. 35 2287-2311. · Zbl 1126.62021
[3] Ben-Tal, A. and Nemirovski, A. S. (1999). The conjugate barrier mirror descent method for non-smooth convex optimization. MINERVA Optimization Center Report, Faculty of Industrial Engineering and Management, Technion-Israel Institute of Technology, Haifa. Available at http://iew3.technion.ac.il/Labs/Opt/opt/Pap/CP_MD.pdf.
[4] Boucheron, S., Bousquet, O. and Lugosi, G. (2005). Theory of classification: Some recent advances. ESAIM Probab. Statist. 9 323-375. · Zbl 1136.62355
[5] Bunea, F. and Nobel, A. (2005). Sequential procedures for aggregating arbitrary estimators of a conditional mean. Manuscript. Available at http://www.stat.fsu.edu/ flori. · Zbl 1329.62359
[6] Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2007). Aggregation for Gaussian regression. Ann. Statist. 35 1674-1697. · Zbl 1209.62065
[7] Catoni, O. (1997). A mixture approach to universal model selection. Preprint LMENS-97-30, Ecole Normale Supérieure, Paris. · Zbl 0928.62033
[8] Catoni, O. (1999). “Universal” aggregation rules with exact bias bounds. Preprint 510, Laboratoire de Probabilités et Modèles Aléatoires, Univ. Paris 6 and Paris 7. Available at http://www.proba.jussieu.fr/mathdoc/preprints/index.html#1999.
[9] Catoni, O. (2004). Statistical Learning Theory and Stochastic Optimization. Ecole d’Eté de Probabilités de Saint-Flour XXXI-2001. Lecture Notes in Math. 1851 . Springer, New York. · Zbl 1076.93002
[10] Cesa-Bianchi, N. and Lugosi, G. (2006) Prediction , Learning , and Games . Cambridge Univ. Press. · Zbl 1114.91001
[11] Haussler, D., Kivinen, J. and Warmuth, M. K. (1998). Sequential prediction if individual sequences under general loss functions. IEEE Trans. Inform. Theory 44 1906-1925. · Zbl 1026.68579
[12] Juditsky, A., Nazin, A., Tsybakov, A. and Vayatis, N. (2005). Recursive aggregation of estimators via the mirror descent algorithm with averaging. Problems Inform. Transmission 41 368-384. · Zbl 1123.62044
[13] Kivinen, J. and Warmuth, M. K. (1999). Averaging expert predictions. In Proc. Fourth. European Conf. on Computational Learning Theory (H. U. Simon and P. Fischer, eds.) 153-167. Springer, Berlin.
[14] Lee, W. S., Bartlett, P. L. and Williamson, R.C. (1996). Efficient agnostic learning with bounded fan-in. IEEE Trans. Inform. Theory 42 2118-2132. · Zbl 0874.68253
[15] Leung, G. and Barron, A. R. (2006). Information theory and mixing least-squares regressions. IEEE Trans. Inform. Theory 52 3396-3410. · Zbl 1309.94051
[16] Lugosi, G. and Wegkamp, M. (2004). Complexity regularization via localized random penalties. Ann. Statist. 32 1679-1697. · Zbl 1045.62060
[17] Massart, P. (2007). Concentration Inequalities and Model Selection. Ecole d’Eté de Probabilités de Saint-Flour XXXIII-2003. Lecture Notes in Math. 1896 . Springer, New York. · Zbl 1170.60006
[18] Nemirovski, A. (2000). Topics in non-parametric statistics. Ecole d’Eté de Probabilités de Saint-Flour XXVIII-1998. Lecture Notes in Math. 1738 85-277. Springer, New York. · Zbl 0998.62033
[19] Nemirovski, A. S. and Yudin, D. B. (1983). Problem Complexity and Method Efficiency in Optimization , Wiley, Chichester. · Zbl 0501.90062
[20] Nesterov, Y. (2007). Primal-dual subgradient methods for convex problems. Mathematical Programming . Published online DOI: 10.1007/s10107-007-0149-x. Also available as CORE Discussion Paper n. 2005/67, Center for Operation Research and Econometrics, Louvain-la-Neuve, Belgium, 2005.
[21] Petrov, V. V. (1995) Limit Theorems of Probability Theory . Clarendon Press, Oxford. · Zbl 0826.60001
[22] Samarov, A. and Tsybakov, A. (2007). Aggregation of density estimators and dimension reduction. In Advances in Statistical Modeling and Inference. Essays in Honor of Kjell A. Doksum (V. Nair, ed.) 233-251. World Scientific, Singapore.
[23] Singer, A. and Feder, M. (1999). Universal linear prediction by model order weighting. IEEE Trans. Signal Process. 47 2685-2699. · Zbl 1010.94005
[24] Tsybakov, A. (2003). Optimal rates of aggregation. In Computational Learning Theory and Kernel Machines , COLT-2003 (B. Schölkopf and M. Warmuth, eds.) 303-313. Springer, Heidelberg. · Zbl 1208.62073
[25] Tsybakov, A. (2004). Introduction à l’estimation non paramétrique . Springer, Berlin. · Zbl 1029.62034
[26] Vovk, V. (1990). Aggregating strategies. In Proc. 3rd Annual Workshop on Computational Learning Theory 372-383. Morgan Kaufman, San Mateo.
[27] Vovk, V. (2001). Competitive on-line statistics. Internat. Statist. Rev. 69 213-248. · Zbl 1211.62200
[28] Wegkamp, M. (2003). Model selection in nonparametric regression. Ann. Statist. 31 252-273. · Zbl 1019.62037
[29] Yang, Y. (2000). Mixing strategies for density estimation. Ann. Statist. 28 75-87. · Zbl 1106.62322
[30] Zhang, T. (2006). From epsilon-entropy to KL-complexity: Analysis of minimum information complexity density estimation. Ann. Statist. 34 2180-2210. · Zbl 1106.62005
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.