Sub-Gaussian estimators of the mean of a random vector. (English) Zbl 1417.62192
The mean $$\mu$$ of a random vector $$X$$ distributed in $$\mathbb{R}^d$$ is estimated based on an i.i.d. sample of $$N$$ points. If $$X$$ has a multivariate normal distribution, then the sample mean $$\overline{\mu}_N$$ is also multivariate normal, hence with probability at least $$1-\delta,$$ $\|\overline{\mu}_N - \mu\|\le \sqrt{\frac{\mathrm{Tr}(\Sigma)}{N}}+\sqrt{\frac{2\lambda_{\max}\log(\frac{1}{\delta})}{N}},$ where $$\Sigma$$ is the covariance matrix and $$\lambda_{\max}$$ stands for its largest eigenvalue; see [D. L. Hanson and F. T. Wright, Ann. Math. Stat. 42, 1079–1083 (1971; Zbl 0216.22203)]. Similar bounds hold true if $$X$$ has a sub-Gaussian distribution.
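As a numerical illustration of the Gaussian bound above, the following minimal sketch evaluates its right-hand side and estimates by simulation how often the sample mean stays within it. The function names, the diagonal covariance, and the choice $$\mu = 0$$ are assumptions made only for this illustration; they do not come from the paper.

```python
import math
import random


def gaussian_mean_bound(trace_sigma, lambda_max, n, delta):
    """Right-hand side of the Gaussian deviation bound for the sample mean."""
    return (math.sqrt(trace_sigma / n)
            + math.sqrt(2.0 * lambda_max * math.log(1.0 / delta) / n))


def empirical_coverage(variances, n, delta, trials=2000, seed=0):
    """Fraction of trials in which ||sample mean - mu|| is below the bound.

    Assumes X ~ N(0, diag(variances)), so mu = 0 (illustrative choice).
    """
    rng = random.Random(seed)
    bound = gaussian_mean_bound(sum(variances), max(variances), n, delta)
    hits = 0
    for _ in range(trials):
        # The mean of n Gaussian draws is N(0, Sigma / n), so sample it directly.
        err_sq = sum(rng.gauss(0.0, math.sqrt(v / n)) ** 2 for v in variances)
        if math.sqrt(err_sq) <= bound:
            hits += 1
    return hits / trials
```

For example, `empirical_coverage([1.0, 0.5, 0.25], n=50, delta=0.1)` should return a value of at least 0.9, since the bound holds with probability at least $$1-\delta.$$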
In the paper under review, a new mean estimator $$\hat{\mu}_N^{(\delta)}$$ is constructed that achieves purely sub-Gaussian performance under the minimal condition that the second moments of $$X$$ are finite. Namely, for all $$N,$$ the bound $\|\hat{\mu}_N^{(\delta)} - \mu\| \le C\left(\sqrt{\frac{\mathrm{Tr}(\Sigma)}{N}}+\sqrt{\frac{\lambda_{\max}\log(\frac{2}{\delta})}{N}}\right)$ holds with probability at least $$1-\delta$$ for an explicit universal constant $$C.$$ Since the bound does not depend explicitly on the dimension $$d,$$ the same estimator may be defined for Hilbert-space-valued random vectors, and the bound remains valid as long as the strong second moment of $$X$$ is finite.
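To indicate how an estimator can depend on the confidence level $$\delta$$ and tolerate heavy tails, here is a minimal sketch of the classical coordinate-wise median-of-means estimator. This is a simpler, swapped-in technique and not the estimator constructed in the paper, and it is not claimed to attain the dimension-free bound above. The function name and the block count $$\lceil 8\log(1/\delta)\rceil$$ are assumptions chosen for illustration.

```python
import math
import statistics


def median_of_means(points, delta):
    """Coordinate-wise median of block means (illustrative sketch only).

    points: list of equal-length lists of floats; delta: confidence parameter.
    """
    n = len(points)
    d = len(points[0])
    # Assumed rule of thumb for the number of blocks, capped at the sample size.
    k = min(n, max(1, math.ceil(8.0 * math.log(1.0 / delta))))
    block_means = []
    for j in range(k):
        # Consecutive blocks of near-equal size; each is non-empty because k <= n.
        block = points[j * n // k:(j + 1) * n // k]
        block_means.append(
            [sum(p[i] for p in block) / len(block) for i in range(d)]
        )
    # Take the median of the block means separately in each coordinate.
    return [statistics.median(m[i] for m in block_means) for i in range(d)]
```

A single gross outlier can shift the sample mean arbitrarily far, whereas it affects only one block mean here and so leaves the coordinate-wise median essentially unchanged.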

62J05 Linear regression; mixed models
62G08 Nonparametric regression and quantile regression
62G35 Nonparametric robustness
62H12 Estimation in multivariate analysis
