Distribution and correlation-free two-sample test of high-dimensional means. (English) Zbl 1454.62157

The authors propose a two-sample test for comparing high-dimensional means that requires neither distributional nor correlational assumptions, apart from some weak conditions on the moments and tail properties of the elements of the random vectors. The test, called the “distribution and correlation-free (DCF) two-sample mean test”, is based on a nontrivial extension of the one-sample central limit theorem. It does not require the independent and identically distributed (i.i.d.) assumption, imposes weaker moment and tail conditions than existing methods, and allows highly unequal sample sizes. It has consistent power behavior under fairly general alternatives. Simulated and real data examples demonstrate good numerical performance compared with existing methods.
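
The review does not state the test statistic, so the following is only a minimal sketch of a generic max-type two-sample mean test calibrated by a Gaussian multiplier bootstrap, the kind of procedure that high-dimensional central limit theorems are typically used to justify. The sup-norm statistic, the function name `max_type_two_sample_test`, and the parameters `n_boot`, `alpha` and `seed` are all assumptions made for illustration and should not be read as the authors' exact DCF procedure.

```python
import numpy as np


def max_type_two_sample_test(x, y, n_boot=500, alpha=0.05, seed=0):
    """Illustrative max-type test of H0: E[X] = E[Y] (hypothetical helper).

    x has shape (n, p) and y has shape (m, p); rows are observations.
    Returns (statistic, bootstrap critical value, reject flag).
    """
    rng = np.random.default_rng(seed)
    n, m = x.shape[0], y.shape[0]
    xbar, ybar = x.mean(axis=0), y.mean(axis=0)

    # Assumed statistic: sqrt(n) times the sup-norm of the mean difference.
    stat = np.sqrt(n) * np.max(np.abs(xbar - ybar))

    # Multiplier bootstrap: reweight the centered observations with
    # independent standard normal multipliers; no covariance is estimated.
    xc, yc = x - xbar, y - ybar
    ex = rng.standard_normal((n_boot, n))
    ey = rng.standard_normal((n_boot, m))
    boot = np.sqrt(n) * np.max(np.abs(ex @ xc / n - ey @ yc / m), axis=1)

    # Reject when the statistic exceeds the bootstrap (1 - alpha) quantile.
    crit = np.quantile(boot, 1 - alpha)
    return stat, crit, bool(stat > crit)
```

Under these assumptions, calling `max_type_two_sample_test(x, y)` on two arrays with the same number of columns returns the statistic, the critical value and the decision. The sample sizes `n` and `m` enter separately, so the two groups need not be balanced.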


62H05 Characterization and structure theory for multivariate probability distributions; copulas
62H15 Hypothesis testing in multivariate analysis
62F05 Asymptotic properties of parametric tests
60F05 Central limit and other weak theorems

