×

Set structured global empirical risk minimizers are rate optimal in general dimensions. (English) Zbl 1478.62081

Summary: Entropy integrals are widely used as a powerful empirical process tool to obtain upper bounds for the rates of convergence of global empirical risk minimizers (ERMs), in standard settings such as density estimation and regression. The upper bound for the convergence rates thus obtained typically matches the minimax lower bound when the entropy integral converges, but admits a strict gap compared to the lower bound when it diverges. L. Birgé and P. Massart [Probab. Theory Relat. Fields 97, No. 1–2, 113–150 (1993; Zbl 0805.62037)] provided a striking example showing that such a gap is real with the entropy structure alone: for a variant of the natural Hölder class with low regularity, the global ERM actually converges at the rate predicted by the entropy integral that substantially deviates from the lower bound. The counter-example has spawned a long-standing negative position on the use of global ERMs in the regime where the entropy integral diverges, as they are heuristically believed to converge at a suboptimal rate in a variety of models.
The present paper demonstrates that this gap can be closed if the models admit certain degree of “set structures” in addition to the entropy structure. In other words, the global ERMs in such set structured models will indeed be rate-optimal, matching the lower bound even when the entropy integral diverges. The models with set structures we investigate include (i) image and edge estimation, (ii) binary classification, (iii) multiple isotonic regression, (iv) \(s\)-concave density estimation, all in general dimensions when the entropy integral diverges. Here, set structures are interpreted broadly in the sense that the complexity of the underlying models can be essentially captured by the size of the empirical process over certain class of measurable sets, for which matching upper and lower bounds are obtained to facilitate the derivation of sharp convergence rates for the associated global ERMs.

MSC:

62G05 Nonparametric estimation
62G07 Density estimation
62G08 Nonparametric regression and quantile regression
62G20 Asymptotic properties of nonparametric inference

Citations:

Zbl 0805.62037
PDFBibTeX XMLCite
Full Text: DOI arXiv Link

References:

[1] Baraud, Y. (2016). Bounding the expectation of the supremum of an empirical process over a (weak) VC-major class. Electron. J. Stat. 10 1709-1728. · Zbl 1385.60038 · doi:10.1214/15-EJS1055
[2] Barron, A., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 301-413. · Zbl 0946.62036 · doi:10.1007/s004400050210
[3] Birgé, L. (1983). Approximation dans les espaces métriques et théorie de l’estimation. Z. Wahrsch. Verw. Gebiete 65 181-237. · Zbl 0506.62026 · doi:10.1007/BF00532480
[4] Birgé, L. and Massart, P. (1993). Rates of convergence for minimum contrast estimators. Probab. Theory Related Fields 97 113-150. · Zbl 0805.62037 · doi:10.1007/BF01199316
[5] Brunel, V.-E. (2013). Adaptive estimation of convex polytopes and convex sets from noisy data. Electron. J. Stat. 7 1301-1327. · Zbl 1336.62127 · doi:10.1214/13-EJS804
[6] Carpenter, T., Diakonikolas, I., Sidiropoulos, A. and Stewart, A. (2018). Near-optimal sample complexity bounds for maximum likelihood estimation of multivariate log-concave densities. In Conference on Learning Theory 1-29.
[7] Chatterjee, S. (2014). A new perspective on least squares under convex constraint. Ann. Statist. 42 2340-2381. · Zbl 1302.62053 · doi:10.1214/14-AOS1254
[8] Chatterjee, S., Guntuboyina, A. and Sen, B. (2018). On matrix estimation under monotonicity constraints. Bernoulli 24 1072-1100. · Zbl 1419.62135 · doi:10.3150/16-BEJ865
[9] Dagan, Y. and Kur, G. (2019). The log-concave maximum likelihood estimator is optimal in high dimensions. Preprint. Available at arXiv:1903.05315.
[10] Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Applications of Mathematics (New York) 31. Springer, New York. · Zbl 0853.68150 · doi:10.1007/978-1-4612-0711-5
[11] Doss, C. R. and Wellner, J. A. (2016). Global rates of convergence of the MLEs of log-concave and \(s\)-concave densities. Ann. Statist. 44 954-981. · Zbl 1338.62101 · doi:10.1214/15-AOS1394
[12] Dudley, R. M. (1982). Empirical and Poisson processes on classes of sets or functions too large for central limit theorems. Z. Wahrsch. Verw. Gebiete 61 355-368. · Zbl 0479.60029 · doi:10.1007/BF00539835
[13] Dudley, R. M. (2014). Uniform Central Limit Theorems, 2nd ed. Cambridge Studies in Advanced Mathematics 142. Cambridge Univ. Press, New York.
[14] Gao, F. and Wellner, J. A. (2007). Entropy estimate for high-dimensional monotonic functions. J. Multivariate Anal. 98 1751-1764. · Zbl 1221.62008 · doi:10.1016/j.jmva.2006.09.003
[15] Giné, E. and Koltchinskii, V. (2006). Concentration inequalities and asymptotic results for ratio type empirical processes. Ann. Probab. 34 1143-1216. · Zbl 1152.60021 · doi:10.1214/009117906000000070
[16] Giné, E. and Nickl, R. (2016). Mathematical Foundations of Infinite-Dimensional Statistical Models. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge Univ. Press, New York. · Zbl 1358.62014 · doi:10.1017/CBO9781107337862
[17] Grenander, U. (1981). Abstract Inference. Wiley, New York. · Zbl 0505.62069
[18] Guntuboyina, A. (2012). Optimal rates of convergence for convex set estimation from support functions. Ann. Statist. 40 385-411. · Zbl 1246.62085 · doi:10.1214/11-AOS959
[19] Han, Q. (2021). Supplement to “Set structured global empirical risk minimizers are rate optimal in general dimensions.” https://doi.org/10.1214/21-AOS2049SUPP.
[20] Han, Q., Wang, T., Chatterjee, S. and Samworth, R. J. (2019). Isotonic regression in general dimensions. Ann. Statist. 47 2440-2471. · Zbl 1437.62124 · doi:10.1214/18-AOS1753
[21] Han, Q. and Wellner, J. A. (2016). Approximation and estimation of \(s\)-concave densities via Rényi divergences. Ann. Statist. 44 1332-1359. · Zbl 1338.62105 · doi:10.1214/15-AOS1408
[22] Han, Q. and Wellner, J. A. (2019). Convergence rates of least squares regression estimators with heavy-tailed errors. Ann. Statist. 47 2286-2319. · Zbl 1466.60033 · doi:10.1214/18-AOS1748
[23] Kim, A. K. H. and Samworth, R. J. (2016). Global rates of convergence in log-concave density estimation. Ann. Statist. 44 2756-2779. · Zbl 1360.62157 · doi:10.1214/16-AOS1480
[24] Koltchinskii, V. (2006). Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Statist. 34 2593-2656. · Zbl 1118.62065 · doi:10.1214/009053606000001019
[25] Korostelëv, A. P. and Tsybakov, A. B. (1992). Asymptotically minimax image reconstruction problems. In Topics in Nonparametric Estimation. Adv. Soviet Math. 12 45-86. Amer. Math. Soc., Providence, RI. · Zbl 0751.60045 · doi:10.1137/1136015
[26] Korostelëv, A. P. and Tsybakov, A. B. (1993). Minimax Theory of Image Reconstruction. Lecture Notes in Statistics 82. Springer, New York. · Zbl 0833.62039 · doi:10.1007/978-1-4612-2712-0
[27] Kur, G., Gao, F., Guntuboyina, A. and Sen, B. (2020). Convex regression in multidimensions: Suboptimality of least squares estimators. Preprint. Available at arXiv:2006.02044.
[28] LeCam, L. (1973). Convergence of estimates under dimensionality restrictions. Ann. Statist. 1 38-53. · Zbl 0255.62006
[29] Lecué, G. (2007). Simultaneous adaptation to the margin and to complexity in classification. Ann. Statist. 35 1698-1721. · Zbl 1209.62146 · doi:10.1214/009053607000000055
[30] Mammen, E. and Tsybakov, A. B. (1995). Asymptotical minimax recovery of sets with smooth boundaries. Ann. Statist. 23 502-524. · Zbl 0834.62038 · doi:10.1214/aos/1176324533
[31] Mammen, E. and Tsybakov, A. B. (1999). Smooth discrimination analysis. Ann. Statist. 27 1808-1829. · Zbl 0961.62058 · doi:10.1214/aos/1017939240
[32] Massart, P. and Nédélec, É. (2006). Risk bounds for statistical learning. Ann. Statist. 34 2326-2366. · Zbl 1108.62007 · doi:10.1214/009053606000000786
[33] Ossiander, M. (1987). A central limit theorem under metric entropy with \[{L_2}\] bracketing. Ann. Probab. 15 897-919. · Zbl 0665.60036
[34] Pollard, D. (2002). Maximal inequalities via bracketing with adaptive truncation Ann. Inst. Henri Poincaré Probab. Stat. 38 1039-1052. · Zbl 1019.60015 · doi:10.1016/S0246-0203(02)01138-X
[35] Robertson, T., Wright, F. T. and Dykstra, R. L. (1988). Order Restricted Statistical Inference. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. Wiley, Chichester. · Zbl 0645.62028
[36] Saumard, A. (2012). Optimal upper and lower bounds for the true and empirical excess risks in heteroscedastic least-squares regression. Electron. J. Stat. 6 579-655. · Zbl 1334.62068 · doi:10.1214/12-EJS679
[37] Seregin, A. and Wellner, J. A. (2010). Nonparametric estimation of multivariate convex-transformed densities. Ann. Statist. 38 3751-3781. · Zbl 1204.62058 · doi:10.1214/10-AOS840
[38] Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135-166. · Zbl 1105.62353 · doi:10.1214/aos/1079120131
[39] van de Geer, S. (1987). A new approach to least-squares estimation, with applications. Ann. Statist. 15 587-602. · Zbl 0625.62046 · doi:10.1214/aos/1176350362
[40] van de Geer, S. (1990). Estimating a regression function. Ann. Statist. 18 907-924. · Zbl 0709.62040 · doi:10.1214/aos/1176347632
[41] van de Geer, S. (1993). Hellinger-consistency of certain nonparametric maximum likelihood estimators. Ann. Statist. 21 14-44. · Zbl 0779.62033 · doi:10.1214/aos/1176349013
[42] van de Geer, S. (1995). The method of sieves and minimum contrast estimators. Math. Methods Statist. 4 20-38. · Zbl 0831.62029
[43] van de Geer, S. and Wainwright, M. J. (2017). On concentration for (regularized) empirical risk minimization. Sankhya A 79 159-200. · Zbl 1380.62085 · doi:10.1007/s13171-017-0111-9
[44] van de Geer, S. A. (2000). Applications of Empirical Process Theory. Cambridge Series in Statistical and Probabilistic Mathematics 6. Cambridge Univ. Press, Cambridge. · Zbl 0953.62049
[45] Van der Vaart, A. (1996). Efficient maximum likelihood estimation in semiparametric mixture models. Ann. Statist. 24 862-878. · Zbl 0860.62029 · doi:10.1214/aos/1032894470
[46] van der Vaart, A. (1996). New Donsker classes. Ann. Probab. 24 2128-2140. · Zbl 0872.60023 · doi:10.1214/aop/1041903221
[47] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer Series in Statistics. Springer, New York. · Zbl 0862.60002 · doi:10.1007/978-1-4757-2545-2
[48] Wong, W. H. and Shen, X. (1995). Probability inequalities for likelihood ratios and convergence rates of sieve MLEs. Ann. Statist. 23 339-362. · Zbl 0829.62002 · doi:10.1214/aos/1176324524
[49] Yang, Y. and Barron, A. (1999). Information-theoretic determination of minimax rates of convergence. Ann. Statist. 27 1564-1599 · Zbl 0978.62008 · doi:10.1214/aos/1017939142
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.