Projected tests for high-dimensional covariance matrices. (English) Zbl 1437.62203

Summary: The classic likelihood ratio test for the equality of two covariance matrices breaks down when the data dimension is larger than the sample size, because the sample covariance matrices are then singular. In this paper, we present a conceptually simple method using random matrices to project the data onto a one-dimensional random subspace so that conventional methods can be applied. Both one-sample and two-sample tests for high-dimensional covariance matrices are considered. A transformation using the precision matrix is used to help maintain the information on the off-diagonal elements of the covariance matrices. Multiple projections are used to improve the performance of the proposed tests. An extremal type theorem is established and used to estimate the significance level. Simulations and an application to the Acute Lymphoblastic Leukemia (ALL) data are given to illustrate our method.
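
The projection idea in the summary can be illustrated with a minimal sketch. This is not the authors' procedure: it uses a single Gaussian random direction, omits the precision-matrix transformation and the extreme-value calibration over multiple projections, and assumes Gaussian data so that the classical F-test applies to the projected samples. The function name `projected_variance_test` and all parameter values are hypothetical.

```python
import numpy as np
from scipy import stats


def projected_variance_test(x, y, rng):
    """Project two samples onto one random unit direction and compare the
    projected variances with a classical two-sided F-test.

    Illustrative only: assumes Gaussian data and a single projection.
    """
    p = x.shape[1]
    # Draw a random direction in R^p and normalize it to unit length.
    direction = rng.standard_normal(p)
    direction /= np.linalg.norm(direction)
    # One-dimensional projections of each sample.
    u = x @ direction
    v = y @ direction
    # Ratio of unbiased sample variances; F-distributed under equal
    # projected variances when the data are Gaussian.
    f_stat = np.var(u, ddof=1) / np.var(v, ddof=1)
    dfn, dfd = len(u) - 1, len(v) - 1
    cdf = stats.f.cdf(f_stat, dfn, dfd)
    p_value = min(1.0, 2.0 * min(cdf, 1.0 - cdf))
    return f_stat, p_value


# Dimension larger than both sample sizes, so the sample covariance
# matrices are singular and the classical likelihood ratio test fails.
rng = np.random.default_rng(0)
n1, n2, p = 30, 40, 200
x = rng.standard_normal((n1, p))          # covariance I
y = 2.0 * rng.standard_normal((n2, p))    # covariance 4 I
f_stat, p_value = projected_variance_test(x, y, rng)
print(f"F = {f_stat:.3f}, p-value = {p_value:.4g}")
```

A single projection can miss differences that lie in directions nearly orthogonal to the one drawn, which is the motivation the summary gives for using multiple projections. Combining the resulting statistics needs a separate calibration step, such as the extremal type theorem the paper establishes, and that step is not implemented here.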

MSC:

62H15 Hypothesis testing in multivariate analysis
62G32 Statistics of extreme values; tail inference
62P10 Applications of statistics to biology and medical sciences; meta analysis

References:

[1] Bai, Z. D., Convergence rate of expected spectral distributions of large random matrices. II. Sample covariance matrices, Ann. Probab., 21, 2, 649-672 (1993), URL https://www.jstor.org/stable/2244670 · Zbl 0779.60025
[2] Bai, Z.; Jiang, D.; Yao, J.-F.; Zheng, S., Corrections to LRT on large-dimensional covariance matrix by RMT, Ann. Statist., 37, 6B, 3822-3840 (2009) · Zbl 1360.62286
[3] Bai, Z.; Saranadasa, H., Effect of high dimension: by an example of a two sample problem, Statist. Sinica, 6, 2, 311-329 (1996) · Zbl 0848.62030
[4] Bai, Z. D.; Yin, Y. Q., Limit of the smallest eigenvalue of a large-dimensional sample covariance matrix, Ann. Probab., 21, 3, 1275-1294 (1993), URL https://www.jstor.org/stable/2244575 · Zbl 0779.60026
[5] Benjamini, Y.; Hochberg, Y., Controlling the false discovery rate: a practical and powerful approach to multiple testing, J. R. Stat. Soc. Ser. B Stat. Methodol., 57, 1, 289-300 (1995), URL https://www.jstor.org/stable/2346101 · Zbl 0809.62014
[6] Cai, T. T.; Liu, W.; Xia, Y., Two-sample test of high dimensional means under dependence, J. R. Stat. Soc. Ser. B Stat. Methodol., 76, 2, 349-372 (2014) · Zbl 07555454
[7] Cai, T. T.; Liu, W.; Zhou, H. H., Estimating sparse precision matrix: optimal rates of convergence and adaptive estimation, Ann. Statist., 44, 2, 455-488 (2016) · Zbl 1341.62115
[8] Cai, T. T.; Ma, Z., Optimal hypothesis testing for high dimensional covariance matrices, Bernoulli, 19, 5B, 2359-2388 (2013) · Zbl 1281.62140
[9] Chang, J.; Zhou, W.; Zhou, W.-X.; Wang, L., Comparing large covariance matrices under weak conditions on the dependence structure and its application to gene clustering, Biometrics, 73, 1, 31-41 (2017) · Zbl 1366.62206
[10] Chen, S. X.; Qin, Y.-L., A two-sample test for high-dimensional data with applications to gene-set testing, Ann. Statist., 38, 2, 808-835 (2010) · Zbl 1183.62095
[11] Chen, S. X.; Zhang, L.-X.; Zhong, P.-S., Tests for high-dimensional covariance matrices, J. Amer. Statist. Assoc., 105, 490, 810-819 (2010) · Zbl 1321.62086
[12] Chiaretti, S.; Li, X.; Gentleman, R.; Vitale, A.; Vignetti, M.; Mandelli, F.; Ritz, J.; Foa, R., Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival, Blood, 103, 7, 2771-2778 (2004)
[13] Coles, S., An Introduction to Statistical Modeling of Extreme Values, Springer Series in Statistics (2001), Springer-Verlag London, Ltd.: London, xiv+208 · Zbl 0980.62043
[14] Embrechts, P.; Mikosch, T.; Klüppelberg, C., Modelling extremal events: For insurance and finance (1997), Springer-Verlag: London, UK · Zbl 0873.62116
[15] Fang, K. T.; Kotz, S.; Ng, K. W., Symmetric Multivariate and Related Distributions, Monographs on Statistics and Applied Probability, vol. 36 (1990), Chapman and Hall, Ltd.: London, x+220 · Zbl 0699.62048
[16] Ferro, C. A.T.; Segers, J., Inference for clusters of extreme values, J. R. Stat. Soc. Ser. B Stat. Methodol., 65, 2, 545-556 (2003) · Zbl 1065.62091
[17] Gentleman, R.; Carey, V. J.; Huber, W.; Irizarry, R. A.; Dudoit, S., Bioinformatics and Computational Biology Solutions Using R and Bioconductor, Statistics for Biology and Health (2005), Springer: New York, xx+473, ISBN: 978-0387-25146-2; 0-387-25146-4 · Zbl 1142.62100
[18] Ishii, A.; Yata, K.; Aoshima, M., Asymptotic properties of the first principal component and equality tests of covariance matrices in high-dimension, low-sample-size context, J. Statist. Plann. Inference, 170, 186-199 (2016) · Zbl 1381.62146
[19] Jiang, D.; Jiang, T.; Yang, F., Likelihood ratio tests for covariance matrices of high-dimensional normal distributions, J. Statist. Plann. Inference, 142, 8, 2241-2256 (2012) · Zbl 1244.62082
[20] Jiang, T.; Yang, F., Central limit theorems for classical likelihood ratio tests for high-dimensional normal distributions, Ann. Statist., 41, 4, 2029-2074 (2013) · Zbl 1277.62149
[21] Johnson, W. B.; Lindenstrauss, J., Extensions of Lipschitz mappings into a Hilbert space, Conference in Modern Analysis and Probability (New Haven, Conn., 1982), Contemp. Math., vol. 26 (1984), Amer. Math. Soc.: Providence, RI, 189-206 · Zbl 0539.46017
[22] Li, J.; Chen, S. X., Two sample tests for high-dimensional covariance matrices, Ann. Statist., 40, 2, 908-940 (2012) · Zbl 1274.62383
[23] Li, P.; Hastie, T. J.; Church, K. W., Very sparse random projections, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’06 (2006), ACM: New York, NY, USA, 287-296, URL http://doi.acm.org/10.1145/1150402.1150436
[24] Li, W.; Qin, Y., Hypothesis testing for high-dimensional covariance matrices, J. Multivariate Anal., 128, 108-119 (2014), URL http://www.sciencedirect.com/science/article/pii/S0047259X14000694 · Zbl 1352.62086
[25] Li, S.; Xie, Y.; Song, L., Data-driven threshold machine: Scan statistics, change-point detection, and extreme bandits (2016), ArXiv e-prints
[26] Li, D.; Xue, L., Joint limiting laws for high-dimensional independence tests (2015), ArXiv e-prints
[27] Liu, Z.; Liu, B.; Zheng, S.; Shi, N.-Z., Simultaneous testing of mean vector and covariance matrix for high-dimensional data, J. Statist. Plann. Inference, 188, 82-93 (2017) · Zbl 1391.62098
[28] Lopes, M.; Jacob, L.; Wainwright, M. J., A more powerful two-sample test in high dimensions using random projection (2012), ArXiv e-prints
[29] Schott, J. R., A test for the equality of covariance matrices when the dimension is large relative to the sample sizes, Comput. Statist. Data Anal., 51, 12, 6535-6542 (2007) · Zbl 1445.62121
[30] Srivastava, R.; Li, P.; Ruppert, D., RAPTT: an exact two-sample test in high dimensions using random projections, J. Comput. Graph. Statist., 25, 3, 954-970 (2016)
[31] Srivastava, M. S.; Yanagihara, H., Testing the equality of several covariance matrices with fewer observations than the dimension, J. Multivariate Anal., 101, 6, 1319-1329 (2010) · Zbl 1186.62078
[32] Süveges, M., Likelihood estimation of the extremal index, Extremes, 10, 1-2, 41-55 (2007) · Zbl 1150.62368