Consistency of logistic classifier in abstract Hilbert spaces. (English) Zbl 1418.62075

Let \(E\) be an infinite-dimensional separable Hilbert space. Let \(X\) denote an \(E\)-valued random variable and \(Y\) a random variable taking values in \(\{-1, +1\}\), where \(X\) and \(Y\) are defined on the same probability space \((\Omega, \mathcal{F}, \mathbb{P})\). Assume a logistic model of the form \[ \mathbb{P}(Y = +1 | X = x) = p_{\theta_0}(x) = \frac{1}{1 + e^{-\langle \theta_0, x \rangle}}, \] where \(\langle \cdot, \cdot \rangle\) denotes the inner product in \(E\) and \(\theta_0\) is an unknown element of \(E\). Furthermore, let \((X_1, Y_1), \ldots, (X_n, Y_n)\) be a sample of independent \(E \times \{-1, +1\}\)-valued observations, such that \((X_1, Y_1)\) is distributed as \((X, Y)\).
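Since \(Y\) takes only the two values \(\pm 1\), the model above also determines the conditional probability of the label \(-1\). The following short computation is not part of the review itself; it merely rewrites the stated model in a single expression valid for both labels: \[ \mathbb{P}(Y = -1 | X = x) = 1 - p_{\theta_0}(x) = \frac{1}{1 + e^{\langle \theta_0, x \rangle}}, \qquad \text{hence} \qquad \mathbb{P}(Y = y | X = x) = \frac{1}{1 + e^{-y \langle \theta_0, x \rangle}}, \quad y \in \{-1, +1\}. \] Under this model the negative log-likelihood of the sample at \(\theta \in E\) is therefore \(\sum_{i=1}^{n} \log\bigl(1 + e^{-Y_i \langle \theta, X_i \rangle}\bigr)\).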
The authors estimate \(\theta_0\) by the quasi-maximum likelihood method along a fixed sequence \((E_k)_{k}\) of linear subspaces of \(E\) with \(\dim E_k = k\), where \(k = k_n\) depends on the sample size \(n\). Two sets of conditions are derived under which the resulting estimator \(\hat{\theta}_{k_n, n}\) is consistent as \(n\) tends to infinity: the first concerns the distribution of \(X\), the second the growth rate of \(k_n\) as a function of \(n\). Finally, a simulation study is presented to illustrate the necessity of the conditions.
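To make the estimation scheme concrete, here is a minimal Python sketch of a likelihood fit restricted to a \(k\)-dimensional subspace. It is an illustration under stated assumptions, not the authors' procedure or their simulation design: \(E\) is replaced by its first `dim` coordinates in a hypothetical orthonormal basis, \(E_k\) is assumed to be the span of the first \(k\) basis vectors, the coordinates of \(X\) are assumed to be independent Gaussians with standard deviation \(1/j\), and the fit is a plain maximum likelihood fit computed by a damped Newton iteration. The names `simulate`, `neg_log_lik` and `fit_sieve` are made up for this example.

```python
import numpy as np


def simulate(n, theta0, rng):
    """Draw n pairs (X_i, Y_i) from the logistic model, in basis coordinates."""
    dim = theta0.shape[0]
    # Assumption: independent Gaussian coordinates with standard deviation 1/j.
    scales = 1.0 / np.arange(1, dim + 1)
    X = rng.standard_normal((n, dim)) * scales
    p = 1.0 / (1.0 + np.exp(-X @ theta0))  # P(Y = +1 | X = x)
    Y = np.where(rng.random(n) < p, 1.0, -1.0)
    return X, Y


def neg_log_lik(theta, X, Y):
    """Mean of log(1 + exp(-Y <theta, X>)), evaluated in a numerically stable way."""
    return np.mean(np.logaddexp(0.0, -Y * (X @ theta)))


def fit_sieve(X, Y, k, steps=50, tol=1e-10, ridge=1e-10):
    """Minimize the negative log-likelihood over the first k coordinates."""
    n = X.shape[0]
    Xk = X[:, :k]
    theta = np.zeros(k)
    for _ in range(steps):
        margins = Y * (Xk @ theta)
        s = np.exp(-np.logaddexp(0.0, margins))  # 1 / (1 + exp(margins))
        grad = -(Xk * (Y * s)[:, None]).mean(axis=0)
        if np.linalg.norm(grad) < tol:
            break
        # Hessian of the mean loss; the tiny ridge only guards against singularity.
        hess = (Xk * (s * (1.0 - s))[:, None]).T @ Xk / n + ridge * np.eye(k)
        step = np.linalg.solve(hess, grad)
        # Backtracking: halve the Newton step until the loss does not increase.
        t, base = 1.0, neg_log_lik(theta, Xk, Y)
        while t > 1e-8 and neg_log_lik(theta - t * step, Xk, Y) > base:
            t *= 0.5
        theta = theta - t * step
    # Embed the k fitted coordinates back into the ambient coordinate space.
    theta_hat = np.zeros(X.shape[1])
    theta_hat[:k] = theta
    return theta_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    j = np.arange(1, 21)
    theta0 = (-1.0) ** (j + 1) / j  # hypothetical true parameter, truncated to 20 coordinates
    X, Y = simulate(5000, theta0, rng)
    for k in (2, 4, 8):
        theta_hat = fit_sieve(X, Y, k)
        print(k, np.linalg.norm(theta_hat - theta0))
```

Running the script prints the Euclidean estimation error for a few choices of \(k\) at a fixed sample size, which gives a rough feel for the trade-off between approximation error (small \(k\)) and estimation variance (large \(k\)) that a growth condition on \(k_n\) has to balance.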


62F12 Asymptotic properties of parametric estimators
62H30 Classification and discrimination; cluster analysis (statistical aspects)
60B12 Limit theorems for vector-valued random variables (infinite-dimensional case)
62J12 Generalized linear models (logistic models)


fda (R)
