
Aggregation for Gaussian regression. (English) Zbl 1209.62065
Summary: This paper studies statistical aggregation procedures in the regression setting. A motivating factor is the existence of many different methods of estimation, leading to possibly competing estimators. We consider here three different types of aggregation: model selection (MS) aggregation, convex (C) aggregation and linear (L) aggregation. The objective of (MS) is to select the optimal single estimator from the list; that of (C) is to select the optimal convex combination of the given estimators; and that of (L) is to select the optimal linear combination of the given estimators. We are interested in evaluating the rates of convergence of the excess risks of the estimators obtained by these procedures. Our approach is motivated by recently published minimax results [A. Nemirovski, Lect. Notes Math. 1738, 85–277 (2000; Zbl 0998.62033); A. B. Tsybakov, Lect. Notes Comput. Sci. 2777, 303–313 (2003; Zbl 1208.62073)]. There exist competing aggregation procedures achieving optimal convergence rates for each of the (MS), (C) and (L) cases separately. Since these procedures are not directly comparable with each other, we suggest an alternative solution. We prove that all three optimal rates, as well as those for the newly introduced (S) aggregation (subset selection), are nearly achieved via a single “universal” aggregation procedure. The procedure consists of mixing the initial estimators with weights obtained by penalized least squares. Two different penalties are considered: one of them is of the BIC type, the second one is a data-dependent \(\ell_1\)-type penalty.
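To make the three aggregation problems concrete, here is a minimal numerical sketch in Python, not the authors' exact procedure. The base estimators (polynomial fits), the fixed penalty level `lam`, and all variable names are illustrative assumptions; the paper's \(\ell_1\) penalty is data-dependent, and its theory treats the preliminary estimators as built from an independent sample.

```python
# Sketch of (MS), (C), (L) and ell_1-penalized aggregation on simulated
# Gaussian regression data. All choices below (base estimators, fixed
# penalty level `lam`) are illustrative assumptions, not the paper's.
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Simulated data: y_i = f(x_i) + Gaussian noise.
n = 200
x = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)

# A list of M preliminary estimators f_1, ..., f_M; here, polynomial fits
# of increasing degree, stacked as columns of the n x M matrix F.
degrees = list(range(1, 8))
F = np.column_stack([np.polyval(np.polyfit(x, y, d), x) for d in degrees])
M = F.shape[1]

# (MS) aggregation: select the single estimator with smallest empirical risk.
ms_index = int(np.argmin(((F - y[:, None]) ** 2).mean(axis=0)))

# (C) aggregation: least squares over convex weights (the simplex).
res = minimize(
    lambda t: ((F @ t - y) ** 2).mean(),
    x0=np.full(M, 1.0 / M),
    bounds=[(0.0, 1.0)] * M,
    constraints={"type": "eq", "fun": lambda t: t.sum() - 1.0},
)
theta_c = res.x

# (L) aggregation: unrestricted least squares over all linear combinations.
theta_l, *_ = np.linalg.lstsq(F, y, rcond=None)

# ell_1-penalized least squares over the weights (lasso-type aggregation);
# `lam` is fixed here for illustration, unlike the data-dependent penalty
# of the paper.
lam = 0.01
theta_l1 = Lasso(alpha=lam, fit_intercept=False).fit(F, y).coef_

print("(MS) selects degree", degrees[ms_index])
print("(C) weights:", np.round(theta_c, 3))
print("(L) weights:", np.round(theta_l, 3))
print("l1 weights:", np.round(theta_l1, 3))
```

Note how the sparse \(\ell_1\) weights interpolate between the extremes: if a single weight survives, the result mimics (MS) aggregation, while a dense weight vector approaches (L); this is the sense in which one penalized procedure can cover all the regimes, including (S) subset selection, at once.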

MSC:
62G08 Nonparametric regression and quantile regression
62C20 Minimax procedures in statistical decision theory
62G05 Nonparametric estimation
62G20 Asymptotic properties of nonparametric inference
Software:
PDCO
References:
[1] Akaike, H. (1974). A new look at the statistical model identification. IEEE Trans. Automat. Control 19 716–723. · Zbl 0314.62039 · doi:10.1109/TAC.1974.1100705
[2] Antoniadis, A. and Fan, J. (2001). Regularization of wavelet approximations (with discussion). J. Amer. Statist. Assoc. 96 939–967. · Zbl 1072.62561 · doi:10.1198/016214501753208942
[3] Audibert, J.-Y. (2004). Aggregated estimators and empirical complexity for least square regression. Ann. Inst. H. Poincaré Probab. Statist. 40 685–736. · Zbl 1052.62037 · doi:10.1016/j.anihpb.2003.11.006 · numdam:AIHPB_2004__40_6_685_0 · eudml:77830
[4] Baraud, Y. (2000). Model selection for regression on a fixed design. Probab. Theory Related Fields 117 467–493. · Zbl 0997.62027 · doi:10.1007/s004400000058
[5] Baraud, Y. (2002). Model selection for regression on a random design. ESAIM Probab. Statist. 6 127–146. · Zbl 1059.62038 · doi:10.1051/ps:2002007 · numdam:PS_2002__6__127_0 · eudml:244664
[6] Barron, A. R. (1993). Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inform. Theory 39 930–945. · Zbl 0818.68126 · doi:10.1109/18.256500
[7] Barron, A., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 301–413. · Zbl 0946.62036 · doi:10.1007/s004400050210
[8] Bartlett, P. L., Boucheron, S. and Lugosi, G. (2000). Model selection and error estimation. In Proc. 13th Annual Conference on Computational Learning Theory 286–297. Morgan Kaufmann, San Francisco.
[9] Birgé, L. (2006). Model selection via testing: An alternative to (penalized) maximum likelihood estimators. Ann. Inst. H. Poincaré Probab. Statist. 42 273–325. · Zbl 1333.62094 · doi:10.1016/j.anihpb.2005.04.004
[10] Birgé, L. and Massart, P. (2001). Gaussian model selection. J. Eur. Math. Soc. 3 203–268. · Zbl 1037.62001 · doi:10.1007/s100970100031
[11] Birgé, L. and Massart, P. (2001). A generalized \(C_p\) criterion for Gaussian model selection. Prépublication 647, Laboratoire de Probabilités et Modèles Aléatoires, Univ. Paris 6 and Paris 7. Available at www.proba.jussieu.fr/mathdoc/preprints/index.html#2001.
[12] Bunea, F. (2004). Consistent covariate selection and postmodel selection inference in semiparametric regression. Ann. Statist. 32 898–927. · Zbl 1092.62045 · doi:10.1214/009053604000000247
[13] Bunea, F. and Nobel, A. B. (2005). Sequential procedures for aggregating arbitrary estimators of a conditional mean. Technical Report M984, Dept. Statistics, Florida State Univ. · Zbl 1329.62359
[14] Bunea, F., Tsybakov, A. and Wegkamp, M. H. (2004). Aggregation for regression learning. Available at www.arxiv.org/abs/math/0410214. Prépublication 948, Laboratoire de Probabilités et Modèles Aléatoires, Univ. Paris 6 and Paris 7. Available at hal.ccsd.cnrs.fr/ccsd-00003205.
[15] Catoni, O. (2004). Statistical Learning Theory and Stochastic Optimization. École d’Été de Probabilités de Saint-Flour 2001. Lecture Notes in Math. 1851. Springer, Berlin. · Zbl 1076.93002 · doi:10.1007/b99352
[16] Cavalier, L., Golubev, G. K., Picard, D. and Tsybakov, A. B. (2002). Oracle inequalities for inverse problems. Ann. Statist. 30 843–874. · Zbl 1029.62032 · doi:10.1214/aos/1028674843 · euclid:aos/1028674843
[17] Chen, S., Donoho, D. and Saunders, M. (2001). Atomic decomposition by basis pursuit. SIAM Rev. 43 129–159. · Zbl 0979.94010 · doi:10.1137/S003614450037906X
[18] Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York. · Zbl 0853.68150
[19] Donoho, D. L., Elad, M. and Temlyakov, V. (2006). Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory 52 6–18. · Zbl 1288.94017 · doi:10.1109/TIT.2005.860430
[20] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression (with discussion). Ann. Statist. 32 407–499. · Zbl 1091.62054 · doi:10.1214/009053604000000067
[21] Foster, D. and George, E. (1994). The risk inflation criterion for multiple regression. Ann. Statist. 22 1947–1975. · Zbl 0829.62066 · doi:10.1214/aos/1176325766
[22] Gilbert, E. N. (1952). A comparison of signalling alphabets. Bell System Tech. J. 31 504–522.
[23] Györfi, L., Kohler, M., Krzyżak, A. and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer, New York. · Zbl 1021.62024
[24] Härdle, W., Kerkyacharian, G., Picard, D. and Tsybakov, A. (1998). Wavelets, Approximation and Statistical Applications. Lecture Notes in Statist. 129. Springer, New York. · Zbl 0899.62002
[25] Juditsky, A., Nazin, A., Tsybakov, A. and Vayatis, N. (2005). Recursive aggregation of estimators by the mirror descent method with averaging. Problems Inform. Transmission 41 368–384. · Zbl 1123.62044 · doi:10.1007/s11122-006-0005-2
[26] Juditsky, A. and Nemirovski, A. (2000). Functional aggregation for nonparametric regression. Ann. Statist. 28 681–712. · Zbl 1105.62338 · doi:10.1214/aos/1015951994 · euclid:aos/1015951994
[27] Juditsky, A., Rigollet, P. and Tsybakov, A. (2005). Learning by mirror averaging. Prépublication du Laboratoire de Probabilités et Modèles Aléatoires, Univ. Paris 6 and Paris 7. Available at hal.ccsd.cnrs.fr/ccsd-00014097. · Zbl 1274.62288
[28] Kneip, A. (1994). Ordered linear smoothers. Ann. Statist. 22 835–866. · Zbl 0815.62022 · doi:10.1214/aos/1176325498
[29] Koltchinskii, V. (2006). Local Rademacher complexities and oracle inequalities in risk minimization (with discussion). Ann. Statist. 34 2593–2706. · Zbl 1118.62065 · doi:10.1214/009053606000001019
[30] Leung, G. and Barron, A. R. (2006). Information theory and mixing least-squares regressions. IEEE Trans. Inform. Theory 52 3396–3410. · Zbl 1309.94051 · doi:10.1109/TIT.2006.878172
[31] Loubes, J.-M. and van de Geer, S. A. (2002). Adaptive estimation with soft thresholding penalties. Statist. Neerlandica 56 454–479. · Zbl 1090.62534 · doi:10.1111/1467-9574.00212
[32] Lugosi, G. and Nobel, A. (1999). Adaptive model selection using empirical complexities. Ann. Statist. 27 1830–1864. · Zbl 0962.62034 · doi:10.1214/aos/1017939241
[33] Mallows, C. L. (1973). Some comments on \(C_P\). Technometrics 15 661–675. · Zbl 0269.62061 · doi:10.2307/1267380
[34] Nemirovski, A. (2000). Topics in non-parametric statistics. Lectures on Probability Theory and Statistics (Saint-Flour, 1998). Lecture Notes in Math. 1738 85–277. Springer, Berlin. · Zbl 0998.62033
[35] Osborne, M., Presnell, B. and Turlach, B. (2000). On the LASSO and its dual. J. Comput. Graph. Statist. 9 319–337. · doi:10.2307/1390657
[36] Rao, C. R. and Wu, Y. (2001). On model selection (with discussion). In Model Selection (P. Lahiri, ed.) 1–64. IMS, Beachwood, OH. · doi:10.1214/lnms/1215540960
[37] Schapire, R. E. (1990). The strength of weak learnability. Machine Learning 5 197–227. · Zbl 1142.62372
[38] Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461–464. · Zbl 0379.62005 · doi:10.1214/aos/1176344136
[39] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288. · Zbl 0850.62538
[40] Tsybakov, A. B. (2003). Optimal rates of aggregation. In Learning Theory and Kernel Machines. Lecture Notes in Artificial Intelligence 2777 303–313. Springer, Heidelberg. · Zbl 1208.62073
[41] Tsybakov, A. B. (2004). Introduction à l’estimation non-paramétrique. Springer, Berlin. · Zbl 1029.62034
[42] Wegkamp, M. H. (2003). Model selection in nonparametric regression. Ann. Statist. 31 252–273. · Zbl 1019.62037 · doi:10.1214/aos/1046294464
[43] Yang, Y. (2000). Combining different procedures for adaptive regression. J. Multivariate Anal. 74 135–161. · Zbl 0964.62032 · doi:10.1006/jmva.1999.1884
[44] Yang, Y. (2001). Adaptive regression by mixing. J. Amer. Statist. Assoc. 96 574–588. · Zbl 1018.62033 · doi:10.1198/016214501753168262
[45] Yang, Y. (2004). Aggregating regression procedures to improve performance. Bernoulli 10 25–47. · Zbl 1040.62030 · doi:10.3150/bj/1077544602
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.