zbMATH — the first resource for mathematics

An adaptable generalization of Hotelling’s \(T^2\) test in high dimension. (English) Zbl 1451.62083
The paper studies the two-sample testing problem for high-dimensional mean vectors, building on Hotelling’s \(T^2\) approach. The authors propose and discuss at length a test statistic based on the regularized Hotelling’s \(T^2\) (RHT), which was introduced by L. S. Chen et al. [J. Am. Stat. Assoc. 106, No. 496, 1345–1360 (2011; Zbl 1234.62082)] for the one-sample case. They provide a Bayesian framework for analyzing the power of the RHT and construct a new composite test by combining the RHT statistics corresponding to a set of optimally chosen regularization parameters. The composite procedure is called “adaptable RHT”, or ARHT for short. Weak convergence of the stochastic process formed by the RHT statistics, indexed by the regularization parameter, to a Gaussian limit is proved. The asymptotic behavior of the test is also investigated when the Gaussian assumption is relaxed to sub-Gaussianity.

The RHT test is introduced in the second section of the article and the ARHT in the third. The fourth and fifth sections treat the calibration of the type I error probability and the extension to a general class of sub-Gaussian distributions. Simulation results appear in the sixth section and an application to a breast cancer data set in the seventh. The eighth section is a short discussion of the results. The ninth section contains the proofs of the main results and is followed by a short appendix with auxiliary results. Additional simulation results and detailed proofs of the main results are given in the supplementary material, doi:10.1214/19-AOS1869SUPP.
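
For orientation, here is a minimal sketch of the kind of statistic the review describes, assuming the two-sample ridge form \(\frac{n_1 n_2}{n_1+n_2}(\bar X_1-\bar X_2)^\top (S+\lambda I_p)^{-1}(\bar X_1-\bar X_2)\) with pooled sample covariance \(S\) and regularization parameter \(\lambda>0\). The function names, the value of \(\lambda\), and the simulated data are illustrative assumptions. The permutation p-value is a generic stand-in for calibration; it is not the asymptotic calibration or the data-driven choice of \(\lambda\) developed in the paper.

```python
import numpy as np


def regularized_t2(x, y, lam):
    """Ridge-regularized two-sample Hotelling T^2 statistic (illustrative)."""
    n1, p = x.shape
    n2 = y.shape[0]
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled sample covariance with n1 + n2 - 2 degrees of freedom.
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    pooled = (xc.T @ xc + yc.T @ yc) / (n1 + n2 - 2)
    # Solve (S + lam * I) z = diff instead of forming the inverse explicitly.
    z = np.linalg.solve(pooled + lam * np.eye(p), diff)
    return (n1 * n2 / (n1 + n2)) * float(diff @ z)


def permutation_pvalue(x, y, lam, n_perm=200, seed=0):
    """Permutation p-value; a stand-in, not the paper's calibration."""
    rng = np.random.default_rng(seed)
    observed = regularized_t2(x, y, lam)
    stacked = np.vstack([x, y])
    n1 = x.shape[0]
    exceed = 0
    for _ in range(n_perm):
        # Randomly reassign the pooled observations to the two groups.
        idx = rng.permutation(stacked.shape[0])
        stat = regularized_t2(stacked[idx[:n1]], stacked[idx[n1:]], lam)
        exceed += stat >= observed
    return (1 + exceed) / (1 + n_perm)


# Simulated example with more variables than observations (p = 50 > n1 + n2 = 45),
# where the unregularized pooled covariance is singular.
rng = np.random.default_rng(1)
x = rng.normal(size=(20, 50))
y = rng.normal(size=(25, 50)) + 0.8  # mean shift in every coordinate
stat_shift = regularized_t2(x, y, lam=1.0)
p_shift = permutation_pvalue(x, y, lam=1.0)
print(stat_shift, p_shift)
```

Because \(\lambda>0\) makes \(S+\lambda I_p\) positive definite, the statistic is well defined even when the dimension exceeds the total sample size, which is the regime where the classical \(T^2\) breaks down.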

62J10 Analysis of variance and covariance (ANOVA)
62H15 Hypothesis testing in multivariate analysis
62H20 Measures of association (correlation, canonical correlation, etc.)
60B20 Random matrices (probabilistic aspects)
60G15 Gaussian processes
15B52 Random matrices (algebraic aspects)
62P10 Applications of statistics to biology and medical sciences; meta analysis
[1] Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis, 2nd ed. Wiley Publications in Statistics. Wiley, New York. · Zbl 0651.62041
[2] Bai, Z., Chen, J. and Yao, J. (2010). On estimation of the population spectral distribution from a high-dimensional sample covariance matrix. Aust. N. Z. J. Stat. 52 423-437. · Zbl 1373.62245
[3] Bai, Z. and Saranadasa, H. (1996). Effect of high dimension: By an example of a two sample problem. Statist. Sinica 6 311-329. · Zbl 0848.62030
[4] Bai, Z. D. and Silverstein, J. W. (1998). No eigenvalues outside the support of the limiting spectral distribution of large-dimensional sample covariance matrices. Ann. Probab. 26 316-345. · Zbl 0937.60017
[5] Bai, Z. and Silverstein, J. W. (2010). Spectral Analysis of Large Dimensional Random Matrices, 2nd ed. Springer Series in Statistics. Springer, New York. · Zbl 1301.60002
[6] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289-300. · Zbl 0809.62014
[7] Bergamaschi, A., Kim, Y. H., Wang, P., Sørlie, T., Hernandez-Boussard, T., Lonning, P. E., Tibshirani, R., Børresen-Dale, A. L. and Pollack, J. R. (2006). Distinct patterns of DNA copy number alteration are associated with different clinicopathological features and gene-expression subtypes of breast cancer. Genes Chromosomes Cancer 45 1033-1040.
[8] Biswas, M. and Ghosh, A. K. (2014). A nonparametric two-sample test applicable to high dimensional data. J. Multivariate Anal. 123 160-171. · Zbl 1278.62059
[9] Cai, T. T., Liu, W. and Xia, Y. (2014). Two-sample test of high dimensional means under dependence. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 349-372.
[10] Chakraborty, A. and Chaudhuri, P. (2017). Tests for high-dimensional data based on means, spatial signs and spatial ranks. Ann. Statist. 45 771-799. · Zbl 1368.62147
[11] Chang, J., Zhou, W. and Zhou, W. X. (2014). Simulation-based hypothesis testing of high dimensional means under covariance heterogeneity—An alternative road to high dimensional tests. Preprint. Available at arXiv:1406.1939.
[12] Chatterjee, S. (2009). Fluctuations of eigenvalues and second order Poincaré inequalities. Probab. Theory Related Fields 143 1-40. · Zbl 1152.60024
[13] Chen, S. X., Li, J. and Zhong, P. (2014). Two-sample tests for high dimensional means with thresholding and data transformation. Preprint. Available at arXiv:1410.2848.
[14] Chen, S. X. and Qin, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Statist. 38 808-835. · Zbl 1183.62095
[15] Chen, L. S., Paul, D., Prentice, R. L. and Wang, P. (2011). A regularized Hotelling’s \(T^2\) test for pathway analysis in proteomic studies. J. Amer. Statist. Assoc. 106 1345-1360. · Zbl 1234.62082
[16] Creighton, C. J. (2012). The molecular profile of luminal B breast cancer. Biologics 6 289-297.
[17] Dong, K., Pang, H., Tong, T. and Genton, M. G. (2016). Shrinkage-based diagonal Hotelling’s tests for high-dimensional small sample size data. J. Multivariate Anal. 143 127-142. · Zbl 1328.62351
[18] El Karoui, N. and Kösters, H. (2011). Geometric sensitivity of random matrix results: Consequences for shrinkage estimators of covariance and related statistical methods. Preprint. Available at arXiv:1105.1404.
[19] Ellis, M. J., Gillette, M., Carr, S. A., Paulovich, A. G., Smith, R. D., Rodland, K. K., Townsend, R. R., Kinsinger, C., Mesri, M. et al. (2013). Connecting genomic alterations to cancer biology with proteomics: The NCI Clinical Proteomic Tumor Analysis Consortium. Cancer Discov. 3 1108-1112.
[20] Gregory, K. B., Carroll, R. J., Baladandayuthapani, V. and Lahiri, S. N. (2015). A two-sample test for equality of means in high dimension. J. Amer. Statist. Assoc. 110 837-849. · Zbl 1373.62274
[21] Gretton, A., Sriperumbudur, B., Sejdinovic, D., Strathmann, H., Balakrishnan, S., Pontil, M. and Fukumizu, K. (2012). Optimal kernel choice for large-scale two-sample tests. Adv. Neural Inf. Process. Syst. 25 1205-1213.
[22] Guo, B. and Chen, S. X. (2016). Tests for high dimensional generalized linear models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 78 1079-1102. · Zbl 1414.62328
[23] Lamy, P.-J., Fina, F., Bascoul-Mollevi, C., Laberenne, A.-C., Martin, P.-M., Ouafik, L. and Jacot, W. (2011). Quantification and clinical relevance of gene amplification at chromosome 17q12-q21 in human epidermal growth factor receptor 2-amplified breast cancers. Breast Cancer Res. 13 R15.
[24] Ledoit, O. and Péché, S. (2011). Eigenvectors of some large sample covariance matrix ensembles. Probab. Theory Related Fields 151 233-264. · Zbl 1229.60009
[25] Li, H., Aue, A., Paul, D., Peng, J. and Wang, P. (2020). Supplement to “An adaptable generalization of Hotelling’s \(T^2\) test in high dimension.” https://doi.org/10.1214/19-AOS1869SUPP.
[26] Liu, H., Aue, A. and Paul, D. (2015). On the Marčenko-Pastur law for linear time series. Ann. Statist. 43 675-712. · Zbl 1312.62080
[27] Lopes, M. E., Jacob, L. and Wainwright, M. J. (2011). A more powerful two-sample test in high dimensions using random projection. Adv. Neural Inf. Process. Syst. 1206-1214.
[28] Mertins, P., Mani, D. R., Ruggles, K. V., Gillette, M. A., Clauser, K. R., Wang, P., Wang, X., Qiao, J. W., Cao, S. et al. (2016). Proteogenomics connects somatic mutations to signalling in breast cancer. Nature 534 55-62.
[29] Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. Wiley Series in Probability and Mathematical Statistics. Wiley, New York.
[30] Cancer Genome Atlas Network (2012). Comprehensive molecular portraits of human breast tumours. Nature 490 61-70.
[31] Pan, G. M. and Zhou, W. (2011). Central limit theorem for Hotelling’s \(T^2\) statistic under large dimension. Ann. Appl. Probab. 21 1860-1910. · Zbl 1250.62030
[32] Paul, D. and Aue, A. (2014). Random matrix theory in statistics: A review. J. Statist. Plann. Inference 150 1-29. · Zbl 1287.62011
[33] Paulovich, A. G., Billheimer, D., Ham, A. J., Vega-Montoto, L., Rudnick, P. A., Tabb, D. L., Wang, P. et al. (2010). Interlaboratory study characterizing a yeast performance standard for benchmarking LC-MS platform performance. Mol. Cell. Proteomics 9 242-254.
[34] Srivastava, M. S. (2009). A test for the mean vector with fewer observations than the dimension under non-normality. J. Multivariate Anal. 100 518-532. · Zbl 1154.62046
[35] Srivastava, M. S. and Du, M. (2008). A test for the mean vector with fewer observations than the dimension. J. Multivariate Anal. 99 386-402. · Zbl 1148.62042
[36] Srivastava, R., Li, P. and Ruppert, D. (2016). RAPTT: An exact two-sample test in high dimensions using random projections. J. Comput. Graph. Statist. 25 954-970.
[37] Tran, B. and Bedard, P. L. (2011). Luminal-B breast cancer and novel therapeutic targets. Breast Cancer Res. 13 221.
[38] Wang, L., Peng, B. and Li, R. (2015). A high-dimensional nonparametric multivariate test for mean vector. J. Amer. Statist. Assoc. 110 1658-1669. · Zbl 1373.62280
[39] Xu, G. · Zbl 07072152
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.