Rademacher processes and bounding the risk of function learning. (English) Zbl 1106.68385

Giné, Evarist (ed.) et al., High dimensional probability II. 2nd international conference, Univ. of Washington, DC, USA, August 1–6, 1999. Boston, MA: Birkhäuser (ISBN 0-8176-4160-2). Prog. Probab. 47, 443-457 (2000).
Let \((S,A)\) be a measurable space and let \(F\) be a class of \(A\)-measurable functions from \(S\) into \([0,1]\). Denote by \(P(S)\) the set of all probability measures on \((S,A)\) and let \(f_0\in F\) be an unknown target function. Given a measure \(P\in P(S)\) (also unknown) let \((X_1,\dots,X_n)\) be an i.i.d. sample in \((S,A)\) with common distribution \(P\). In computer learning theory, the problem of estimating \(f_0\), based on the labeled sample \((X_1,Y_1),\dots,(X_n,Y_n)\), where \(Y_j=f_0(X_j)\), \(j=1,2,\dots,n\), is referred to as a function learning problem. The goal of function learning is to find an estimate \(\hat f_n=\hat f_n((X_1,Y_1),\dots,(X_n,Y_n))\) of the unknown target function such that the \(L_1\)-distance between \(\hat f_n\) and \(f_n\) becomes small with high probability as soon as the sample size \(n\) becomes large enough. The \(L_1\)-distance \(P|\hat f_n-f_0|\) is often called the risk of the estimate \(\hat f_n\). The authors construct data dependent upper bounds on the risk in function learning problems. These bounds are based on local norms of the Rademacher process indexed by the underlying function class, and they do not require prior knowledge about the distribution of training samples or any specific properties of the function class. By using concentration inequalities of Talagrand’s type for empirical and Rademacher processes it is shown that the bounds hold with a high probability that decreases exponentially fast when the sample size \(n\) grows. The authors consider a couple of important examples in which these bounds give nearly an optimal rate of convergence of the risk to 0 as \(n\to\infty\).
For the entire collection see [Zbl 0948.00040].


68T05 Learning and adaptive systems in artificial intelligence
62G05 Nonparametric estimation
68Q32 Computational learning theory
Full Text: arXiv