# Local Rademacher complexities. (English) Zbl 1083.62034
From the paper: Estimating the performance of statistical procedures is useful for providing a better understanding of the factors that influence their behavior, as well as for suggesting ways to improve them. Although asymptotic analysis is a crucial first step toward understanding the behavior, finite-sample error bounds are of more value, as they allow the design of model selection (or parameter tuning) procedures. These error bounds typically have the following form: with high probability, the error of the estimator (typically a function in a certain class) is bounded by an empirical estimate of error plus a penalty term depending on the complexity of the class of functions that can be chosen by the algorithm.

The differences between the true and empirical errors of functions in that class can be viewed as an empirical process. Many tools have been developed for understanding the behavior of such objects, and especially for evaluating their suprema, which can be thought of as a measure of how hard it is to estimate functions in the class at hand. The goal is thus to obtain the sharpest possible estimates on the complexity of function classes.

A problem arises since the notion of complexity might depend on the (unknown) underlying probability measure according to which the data are produced. Distribution-free notions of complexity, such as the Vapnik-Chervonenkis dimension or the metric entropy, typically give conservative estimates. Distribution-dependent estimates, based for example on entropy numbers in the $$L_2(P)$$ distance, where $$P$$ is the underlying distribution, are not useful when $$P$$ is unknown. Thus, it is desirable to obtain data-dependent estimates which can readily be computed from the sample.
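In symbols, the generic bound takes the shape below; this is a schematic rendering, with notation supplied here rather than taken from the excerpt ($$F$$ is the function class, $$P$$ the true and $$P_n$$ the empirical distribution, and $$\sigma_1,\dots,\sigma_n$$ independent uniform $$\pm 1$$ random signs). With probability at least $$1-\delta$$,

$$Pf \le P_n f + \mathrm{pen}_n(F,\delta) \quad \text{for all } f \in F,$$

and a canonical data-dependent choice of complexity is the (global) empirical Rademacher average

$$\widehat{R}_n F = \mathbb{E}_\sigma \Big[ \sup_{f \in F} \frac{1}{n} \sum_{i=1}^n \sigma_i f(X_i) \Big],$$

which can be computed from the sample once the expectation over the signs is approximated, for example by Monte Carlo.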
We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical version of Rademacher averages, in the sense that they are computed from the data, over a subset of functions with small empirical error. We present some applications to classification and prediction with convex function classes, and with kernel classes in particular.
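The localized complexity underlying these bounds restricts the supremum to the functions whose empirical error is at most a radius $$r$$; in the same supplied notation, $$\widehat{R}_n \{ f \in F : P_n f \le r \} = \mathbb{E}_\sigma \big[ \sup_{f \in F,\, P_n f \le r} \frac{1}{n} \sum_{i=1}^n \sigma_i f(X_i) \big]$$. As a minimal sketch of how such a quantity can be computed from a sample, the following Python function estimates it by Monte Carlo for a finite dictionary of candidate functions; the function name, the array layout, and the restriction to a finite class are illustrative assumptions, not the paper's construction:

```python
import numpy as np

def local_rademacher_average(F_values, emp_errors, r, n_draws=1000, seed=None):
    """Monte Carlo estimate of an empirical local Rademacher average
    over a finite dictionary of candidate functions (an illustrative
    stand-in for the function classes treated in the paper).

    F_values   : (m, n) array; row j holds f_j(X_1), ..., f_j(X_n),
                 the j-th candidate function evaluated on the sample.
    emp_errors : (m,) array of empirical errors P_n f_j.
    r          : localization radius; only rows with P_n f_j <= r are kept.
    """
    rng = np.random.default_rng(seed)
    local = F_values[emp_errors <= r]  # the small-empirical-error subset
    if local.shape[0] == 0:
        return 0.0
    n = local.shape[1]
    # Independent uniform {-1, +1} Rademacher signs, one vector per draw.
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))
    # For each sign vector, take the sup over the (finite) class of
    # (1/n) * sum_i sigma_i f(X_i); then average over the draws to
    # approximate the expectation over the signs.
    sups = (sigma @ local.T / n).max(axis=1)
    return float(sups.mean())
```

Sweeping $$r$$ downward shows the complexity estimate shrinking as the class is localized; it is the fixed point of (a sub-root upper bound on) this quantity, viewed as a function of $$r$$, that yields the optimal rates mentioned in the abstract.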

##### MSC:
- 62G08 Nonparametric regression and quantile regression
- 68Q32 Computational learning theory
- 68Q25 Analysis of algorithms and problem complexity
- 65Y20 Complexity and performance of numerical algorithms
##### References:
- Bartlett, P. L., Boucheron, S. and Lugosi, G. (2002). Model selection and error estimation. Machine Learning 48 85–113. · Zbl 0998.68117
- Bartlett, P. L., Jordan, M. I. and McAuliffe, J. D. (2005). Convexity, classification, and risk bounds. J. Amer. Statist. Assoc. · Zbl 1118.62330
- Bartlett, P. L. and Mendelson, S. (2002). Rademacher and Gaussian complexities: Risk bounds and structural results. J. Mach. Learn. Res. 3 463–482. · Zbl 1084.68549
- Bartlett, P. L. and Mendelson, S. (2003). Empirical minimization. Probab. Theory Related Fields. To appear. Available at www.stat.berkeley.edu/~bartlett/papers/bm-em-03.pdf. · Zbl 1142.62348
- Boucheron, S., Lugosi, G. and Massart, P. (2000). A sharp concentration inequality with applications. Random Structures Algorithms 16 277–292. · Zbl 0954.60008
- Boucheron, S., Lugosi, G. and Massart, P. (2003). Concentration inequalities using the entropy method. Ann. Probab. 31 1583–1614. · Zbl 1051.60020
- Bousquet, O. (2002). A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Math. Acad. Sci. Paris 334 495–500. · Zbl 1001.60021
- Bousquet, O. (2003). Concentration inequalities for sub-additive functions using the entropy method. In Stochastic Inequalities and Applications (E. Giné, C. Houdré and D. Nualart, eds.) 213–247. Birkhäuser, Boston. · Zbl 1037.60015
- Bousquet, O., Koltchinskii, V. and Panchenko, D. (2002). Some local measures of complexity of convex hulls and generalization bounds. Computational Learning Theory. Lecture Notes in Artificial Intelligence 2375 59–73. Springer, Berlin. · Zbl 1050.68055
- Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York. · Zbl 0853.68150
- Dudley, R. M. (1999). Uniform Central Limit Theorems. Cambridge Univ. Press. · Zbl 0951.60033
- Györfi, L., Kohler, M., Krzyżak, A. and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer, New York. · Zbl 1021.62024
- Haussler, D. (1992). Decision theoretic generalizations of the PAC model for neural net and other learning applications. Inform. and Comput. 100 78–150. · Zbl 0762.68050
- Haussler, D. (1995). Sphere packing numbers for subsets of the Boolean $$n$$-cube with bounded Vapnik–Chervonenkis dimension. J. Combin. Theory Ser. A 69 217–232. · Zbl 0818.60005
- Koltchinskii, V. (2001). Rademacher penalties and structural risk minimization. IEEE Trans. Inform. Theory 47 1902–1914. · Zbl 1008.62614
- Koltchinskii, V. and Panchenko, D. (2000). Rademacher processes and bounding the risk of function learning. In High Dimensional Probability II (E. Giné, D. M. Mason and J. A. Wellner, eds.) 443–459. Birkhäuser, Boston. · Zbl 1106.68385
- Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry and Processes. Springer, New York. · Zbl 0748.60004
- Lee, W. S., Bartlett, P. L. and Williamson, R. C. (1998). The importance of convexity in learning with squared loss. IEEE Trans. Inform. Theory 44 1974–1980. · Zbl 0935.68091
- Lugosi, G. and Wegkamp, M. (2004). Complexity regularization via localized random penalties. Ann. Statist. 32 1679–1697. · Zbl 1045.62060
- Mammen, E. and Tsybakov, A. B. (1999). Smooth discrimination analysis. Ann. Statist. 27 1808–1829. · Zbl 0961.62058
- Massart, P. (2000). About the constants in Talagrand's concentration inequalities for empirical processes. Ann. Probab. 28 863–884. · Zbl 1140.60310
- Massart, P. (2000). Some applications of concentration inequalities to statistics. Probability theory. Ann. Fac. Sci. Toulouse Math. (6) 9 245–303. · Zbl 0986.62002
- McDiarmid, C. (1998). Concentration. In Probabilistic Methods for Algorithmic Discrete Mathematics (M. Habib, C. McDiarmid, J. Ramirez-Alfonsin and B. Reed, eds.) 195–248. Springer, New York. · Zbl 0927.60027
- Mendelson, S. (2002). Geometric parameters of kernel machines. Computational Learning Theory. Lecture Notes in Artificial Intelligence 2375 29–43. Springer, Berlin. · Zbl 1050.68070
- Mendelson, S. (2002). Rademacher averages and phase transitions in Glivenko–Cantelli classes. IEEE Trans. Inform. Theory 48 251–263. · Zbl 1059.60027
- Mendelson, S. (2002). Improving the sample complexity using global data. IEEE Trans. Inform. Theory 48 1977–1991. · Zbl 1061.68128
- Mendelson, S. (2003). A few notes on statistical learning theory. Advanced Lectures on Machine Learning. Lecture Notes in Comput. Sci. 2600 1–40. Springer, New York. · Zbl 1015.00030
- Pollard, D. (1984). Convergence of Stochastic Processes. Springer, Berlin. · Zbl 0544.60045
- Rio, E. (2001). Une inégalité de Bennett pour les maxima de processus empiriques [A Bennett inequality for maxima of empirical processes]. Ann. Inst. H. Poincaré Probab. Statist. 38 1053–1057. · Zbl 1014.60011
- Talagrand, M. (1994). Sharper bounds for Gaussian and empirical processes. Ann. Probab. 22 28–76. · Zbl 0798.60051
- van de Geer, S. (1987). A new approach to least-squares estimation, with applications. Ann. Statist. 15 587–602. · Zbl 0625.62046
- van de Geer, S. (2000). Empirical Processes in M-Estimation. Cambridge Univ. Press. · Zbl 1179.62073
- van der Vaart, A. (1998). Asymptotic Statistics. Cambridge Univ. Press. · Zbl 0910.62001
- van der Vaart, A. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. With Applications to Statistics. Springer, New York. · Zbl 0862.60002
- Vapnik, V. N. and Chervonenkis, A. Y. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16 264–280. · Zbl 0247.60005