Effect of high dimension: by an example of a two sample problem. (English) Zbl 0848.62030

Stat. Sin. 6, No. 2, 311-329 (1996); correction ibid. 28, No. 2, 1105 (2018).
Summary: With the rapid development of modern computing techniques, statisticians are dealing with data of much higher dimension. Consequently, because of their loss of accuracy or power, some classical statistical inferences are being challenged by non-exact approaches. The purpose of this paper is to point out and briefly analyze this phenomenon and to encourage statisticians to reexamine classical statistical approaches when dealing with high-dimensional data. As an example, we derive the asymptotic power of the classical Hotelling’s \(T^2\) test and of Dempster’s non-exact test [see A. P. Dempster, Ann. Math. Stat. 29, 995-1010 (1958); Biometrics 16, 41-50 (1960)] for a two-sample problem. An asymptotically normally distributed test statistic is also proposed.
Our results show that both Dempster’s non-exact test and the new test have higher power than Hotelling’s test when the data dimension is proportionally close to the within-sample degrees of freedom. Although the new test has an asymptotic power function similar to Dempster’s, it does not rely on the normality assumption. Simulation results are presented which show that the non-exact tests are more powerful than Hotelling’s test even for moderately large dimensions and sample sizes.
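To make the comparison concrete, the following minimal sketch computes Hotelling’s \(T^2\) next to a normalized statistic of the kind the summary describes. The summary itself gives no formulas, so the form of the second statistic (the squared distance between sample means, centered by the trace of the pooled covariance and scaled by an estimate of \(\operatorname{tr}\Sigma^2\)) is an assumption recalled from the literature on this test and should be checked against the paper. The function name `two_sample_stats` is hypothetical.

```python
import numpy as np

def two_sample_stats(x, y):
    """Return (Hotelling T^2 or None, normalized statistic z).

    x has shape (n1, p) and y has shape (n2, p). The formula for z is an
    assumed form, not taken from the summary above.
    """
    n1, p = x.shape
    n2 = y.shape[0]
    n = n1 + n2 - 2                         # within-sample degrees of freedom
    diff = x.mean(axis=0) - y.mean(axis=0)  # difference of sample means
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    s = (xc.T @ xc + yc.T @ yc) / n         # pooled sample covariance
    scale = n1 * n2 / (n1 + n2)

    # Hotelling's T^2 needs an invertible pooled covariance, so it is
    # only computed when p <= n.
    t2 = scale * diff @ np.linalg.solve(s, diff) if p <= n else None

    # Assumed normalized statistic: no matrix inverse is involved, so it
    # remains defined when p exceeds n.
    tr_s = np.trace(s)
    tr_s2 = np.sum(s * s)                   # tr(S^2) for symmetric S
    b2 = n**2 / ((n + 2) * (n - 1)) * (tr_s2 - tr_s**2 / n)
    z = (scale * diff @ diff - tr_s) / np.sqrt(2 * (n + 1) / n * b2)
    return t2, z
```

Under these assumptions, `z` would be compared with an upper standard normal quantile, while `t2` would be referred to its usual \(F\) distribution after rescaling. Because `z` avoids inverting the pooled covariance, it can still be evaluated when the dimension approaches or exceeds the within-sample degrees of freedom, which is the regime the summary highlights.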


62H15 Hypothesis testing in multivariate analysis
62F05 Asymptotic properties of parametric tests