Square root penalty: Adaptation to the margin in classification and in edge estimation. (English) Zbl 1080.62047

From the paper: Consider observations \((X_1,Y_1),\dots,(X_n,Y_n)\), where \(Y_i\) is a bounded response random variable and \(X_i\in{\mathcal X}\) is the corresponding instance. We regard \(\{(X_i,Y_i)\}^n_{i=1}\) as i.i.d. copies of a population version \((X,Y)\). The goal is to predict the response \(Y\) given the value of the instance \(X\). We consider two statistical problems: binary classification and boundary estimation in binary images (edge estimation). In the classification setup \(Y_i\in \{0,1\}\) is a label (e.g., {ill, healthy}, {white, black}, etc.), while in edge estimation \(Y_i\) can be either a label or a general bounded random variable. Most of the paper is concerned with the model of binary classification. The results for edge estimation are quite analogous and are stated as corollaries.
We consider the problem of adaptation to the margin in binary classification. We suggest a penalized empirical risk minimization classifier that adaptively attains, up to a logarithmic factor, fast optimal rates of convergence for the excess risk, that is, rates that can be faster than \(n^{-1/2}\), where \(n\) is the sample size. We show that our method also gives adaptive estimators for the problem of edge estimation.
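The penalized empirical risk minimization idea underlying the paper can be sketched on a toy problem. The sketch below is illustrative only and is not the authors' estimator: it fits a linear combination of a small, hypothetical basis by squared loss with a plain \(\ell_1\) penalty (solved by proximal gradient descent), whereas the paper uses a specifically calibrated square root penalty to achieve margin adaptation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data: eta(x) = P(Y=1 | X=x) crosses 1/2 at x = 0.5,
# so the Bayes classifier is 1{x > 0.5}.
n = 200
X = rng.uniform(0, 1, n)
eta = 1.0 / (1.0 + np.exp(-8.0 * (X - 0.5)))
Y = (rng.uniform(0, 1, n) < eta).astype(float)

def design(x):
    # Small illustrative dictionary of base functions psi_j; the candidate
    # rules are plug-in classifiers 1{f_theta(x) >= 1/2} with
    # f_theta = sum_j theta_j * psi_j.
    return np.column_stack([np.ones_like(x), x, x**2, np.sin(2 * np.pi * x)])

Psi = design(X)

def fit(lam, iters=2000, lr=0.1):
    # Proximal gradient (ISTA) for: empirical squared risk + lam * ||theta||_1.
    theta = np.zeros(Psi.shape[1])
    for _ in range(iters):
        grad = -2.0 * Psi.T @ (Y - Psi @ theta) / n
        z = theta - lr * grad
        # Soft-thresholding = proximal operator of the l1 penalty.
        theta = np.sign(z) * np.maximum(np.abs(z) - lr * lam, 0.0)
    return theta

theta_hat = fit(lam=0.05)

# Compare the fitted plug-in rule with the Bayes rule on fresh points.
x_test = rng.uniform(0, 1, 1000)
pred = (design(x_test) @ theta_hat >= 0.5).astype(float)
bayes = (x_test > 0.5).astype(float)
disagreement = np.mean(pred != bayes)
```

The margin condition studied in the paper governs how such a disagreement (and hence the excess risk) shrinks with \(n\); the role of the penalty is to make the choice of the dictionary's complexity adaptive, without knowing the margin parameter in advance.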


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G07 Density estimation
62G08 Nonparametric regression and quantile regression
68T10 Pattern recognition, speech recognition

