zbMATH — the first resource for mathematics

Combining different procedures for adaptive regression. (English) Zbl 0964.62032
Summary: Given any countable collection of regression procedures (e.g., kernel, spline, wavelet, local polynomial, neural nets, etc.), we show that a single adaptive procedure can be constructed to share their advantages to a great extent in terms of global squared \(L_2\) risk. The combined procedure basically pays a price only of order \(1/n\) for adaptation over the collection. An interesting consequence is that for a countable collection of classes of regression functions (possibly of completely different characteristics), a minimax-rate adaptive estimator can be constructed such that it automatically converges at the right rate for each of the classes being considered.
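Read schematically, the "price of order \(1/n\)" statement has the shape of an oracle-type risk bound. The notation below is assumed here for illustration and does not appear in the summary: \(\delta_j\) are the candidate procedures, \(\delta^{*}\) is the combined procedure, \(\pi_j\) are prior weights summing to one, \(R(f; n; \delta)\) is the global squared \(L_2\) risk at sample size \(n\), and \(C\) is a constant. The exact conditions, constants, and sample sizes on the right-hand side are those of the paper and may differ from this sketch.

\[
R(f; n; \delta^{*}) \;\le\; C \inf_{j \ge 1} \left\{ \frac{1}{n} \log \frac{1}{\pi_j} + R(f; n; \delta_j) \right\}
\]

Under a bound of this form, the combined procedure matches the best candidate up to a constant factor and an additive term of order \(1/n\), which is negligible against typical nonparametric rates.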
A demonstration is given for high-dimensional regression, for which case, to overcome the well-known curse of dimensionality in accuracy, it is advantageous to seek different ways of characterizing a high-dimensional function (e.g., using neural nets or additive modelings) to reduce the influence of input dimension in the traditional theory of approximation (e.g., in terms of series expansion). However, in general it is difficult to assess which characterization works well for the unknown regression function. Thus adaptation over different modelings is desired.
For example, we show by combining various regression procedures that a single estimator can be constructed to be minimax-rate adaptive over Besov classes of unknown smoothness and interaction order, to converge at rate \(o(n^{-1/2})\) when the regression function has a neural net representation, and at the same time to be consistent over all bounded regression functions.
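The idea of combining several regression procedures into one estimator can be illustrated with a minimal sketch. It is not the construction from the paper: it assumes a single data split, a known noise variance `sigma2`, and exponential weights based on held-out squared error. The helper names `kernel_fit`, `poly_fit`, and `combine` are hypothetical and introduced only for this illustration.

```python
import numpy as np


def kernel_fit(bandwidth):
    """Return a fitter for a Nadaraya-Watson estimator with a Gaussian kernel."""
    def fit(x_train, y_train):
        def predict(x_new):
            z = (x_new[:, None] - x_train[None, :]) / bandwidth
            w = np.exp(-0.5 * z ** 2)
            # Guard against division by zero far from all training points.
            return (w @ y_train) / np.maximum(w.sum(axis=1), 1e-12)
        return predict
    return fit


def poly_fit(degree):
    """Return a fitter for a least-squares polynomial of the given degree."""
    def fit(x_train, y_train):
        coefs = np.polyfit(x_train, y_train, degree)
        return lambda x_new: np.polyval(coefs, x_new)
    return fit


def combine(x, y, fitters, sigma2, prior=None):
    """Fit every candidate on the first half of the data, then weight the
    candidates by exp(-SSE / (2 * sigma2)) computed on the second half.
    Assumption: sigma2 (noise variance) is known; a single split is used."""
    half = len(x) // 2
    predictors = [fit(x[:half], y[:half]) for fit in fitters]
    sse = np.array([np.sum((y[half:] - p(x[half:])) ** 2) for p in predictors])
    if prior is None:
        prior = np.full(len(fitters), 1.0 / len(fitters))
    log_w = np.log(prior) - sse / (2.0 * sigma2)
    log_w -= log_w.max()  # stabilise before exponentiating
    weights = np.exp(log_w)
    weights /= weights.sum()

    def combined(x_new):
        preds = np.stack([p(x_new) for p in predictors])
        return weights @ preds  # convex combination of candidate predictions

    return combined, weights


# Small synthetic example (all values here are illustrative assumptions).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=200)

fitters = [kernel_fit(0.05), kernel_fit(0.5), poly_fit(1), poly_fit(5)]
combined, weights = combine(x, y, fitters, sigma2=0.09)
print("weights:", np.round(weights, 4))
```

Candidates that predict the held-out half poorly receive exponentially small weight, so the mixture tracks the better candidates without the user having to choose among them in advance.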

MSC:
62G08 Nonparametric regression and quantile regression
62B10 Statistical aspects of information-theoretic topics
Software:
KernSmooth