
Powerful test based on conditional effects for genome-wide screening. (English) Zbl 1393.62080

Summary: This paper considers testing procedures for screening large genome-wide data, in which hundreds of thousands of genetic variants, for example single nucleotide polymorphisms (SNPs), are examined for association with a quantitative phenotype. We screen the whole genome by SNP sets and propose a new test that is based on conditional effects from multiple SNPs. The test statistic is developed for weak genetic effects and incorporates correlations among genetic variables, which may be very high due to linkage disequilibrium. The limiting null distribution of the test statistic and the power of the test are derived. Under appropriate conditions, the test is shown to be more powerful than the minimum \(p\)-value method, which is based on marginal SNP effects and is the most commonly used method in genome-wide screening. The proposed test is also compared with other existing methods, including the Higher Criticism (HC) test and the sequence kernel association test (SKAT), through simulations and analysis of a real genome data set. For typical genome-wide data, where effects of individual SNPs are weak and correlations among SNPs are high, the proposed test outperforms the other methods considered.
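
The minimum \(p\)-value baseline mentioned in the summary can be sketched as follows. This is a minimal illustration of that marginal-effects baseline only, not the proposed conditional-effects test, whose statistic is not given in this record. The function name `min_p_value_test`, the simulated genotypes, the Bonferroni adjustment, and the large-sample normal approximation are all assumptions made for illustration.

```python
import math
import numpy as np

def min_p_value_test(X, y):
    """Bonferroni-adjusted minimum p-value for one SNP set (hypothetical helper).

    X: (n, p) genotype matrix with no constant columns; y: (n,) phenotype.
    Each SNP is tested marginally, ignoring the other SNPs in the set.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Marginal Pearson correlation of each SNP with the phenotype.
    r = Xc.T @ yc / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    # Large-sample approximation: sqrt(n) * r is roughly N(0, 1) under the null.
    z = np.sqrt(n) * r
    # Two-sided normal p-values: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])
    # Bonferroni adjustment of the smallest marginal p-value.
    return min(1.0, p * pvals.min()), pvals

# Simulated example (assumed setup): 500 subjects, 10 independent SNPs,
# with only the first SNP affecting the phenotype.
rng = np.random.default_rng(0)
X = rng.binomial(2, 0.3, size=(500, 10)).astype(float)
y = 0.5 * X[:, 0] + rng.normal(size=500)
p_adj, pvals = min_p_value_test(X, y)
```

Because each SNP is tested on its own, this baseline does not use the correlation structure among SNPs, which is the feature the summary says the proposed test incorporates.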

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62H15 Hypothesis testing in multivariate analysis

Software:

covTest; Eigenstrat

References:

[1] Arias-Castro, E., Candès, E. J. and Plan, Y. (2011). Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism. Ann. Statist. 39 2533-2556. · Zbl 1231.62136
[2] Ballard, D. H., Cho, J. and Zhao, H. (2010). Comparisons of multi-marker association methods to detect association between a candidate region and disease. Genet. Epidemiol. 34 201-212.
[3] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289-300. · Zbl 0809.62014
[4] Cai, T., Liu, W. and Xia, Y. (2014). Two-sample test of high dimensional means under dependence. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 349-372. · Zbl 07555454
[5] Chen, B. E., Sakoda, L. C., Hsing, A. W. and Rosenberg, P. S. (2006). Resampling-based multiple hypothesis testing procedures for genetic case-control association studies. Genet. Epidemiol. 30 495-507.
[6] Cui, J., Stahl, E. A., Saevarsdottir, S., Miceli, C., Diogo, D., Trynka, G., Raj, T., Mirkov, M. U., Canhao, H., Ikari, K. et al. (2013). Genome-wide association study and gene expression analysis identifies CD84 as a predictor of response to etanercept therapy in rheumatoid arthritis. PLoS Genet. 9 e1003394.
[7] Dolcino, M., Ottria, A., Barbieri, A., Patuzzo, G., Tinazzi, E., Argentino, G., Beri, R., Lunardi, C. and Puccetti, A. (2015). Gene expression profiling in peripheral blood cells and synovial membranes of patients with psoriatic arthritis. PLoS ONE 10 e0128262.
[8] Donoho, D. and Jin, J. (2015). Higher criticism for large-scale inference, especially for rare and weak effects. Statist. Sci. 30 1-25. · Zbl 1332.62019
[9] Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B. Stat. Methodol. 70 849-911. · Zbl 1411.62187
[10] Goeman, J. J., Van De Geer, S. A. and Van Houwelingen, H. C. (2006). Testing against a high dimensional alternative. J. R. Stat. Soc. Ser. B. Stat. Methodol. 68 477-493. · Zbl 1110.62002 · doi:10.1111/j.1467-9868.2006.00551.x
[11] Hall, P. and Jin, J. (2010). Innovated higher criticism for detecting sparse signals in correlated noise. Ann. Statist. 38 1686-1732. · Zbl 1189.62080
[12] Ingster, Y. I., Tsybakov, A. B., Verzelen, N. et al. (2010). Detection boundary in sparse regression. Electron. J. Stat. 4 1476-1526. · Zbl 1329.62314
[13] Lee, S., Abecasis, G. R., Boehnke, M. and Lin, X. (2014). Rare-variant association analysis: Study designs and statistical tests. Am. J. Hum. Genet. 95 5-23.
[14] Leisch, F., Weingessel, A. and Hornik, K. (1998). On the generation of correlated artificial binary data.
[15] Li, J. and Zhong, P. (2017). A rate optimal procedure for recovering sparse differences between high-dimensional means under dependence. Ann. Statist. 45 557-590. · Zbl 1368.62152
[16] Liu, Y. and Xie, J. (2018). Supplement to “Powerful test based on conditional effects for genome-wide screening.” DOI:10.1214/17-AOAS1103SUPP. · Zbl 1393.62080
[17] Lockhart, R., Taylor, J., Tibshirani, R. J. and Tibshirani, R. (2014). A significance test for the lasso. Ann. Statist. 42 413-468. · Zbl 1305.62254
[18] Price, A. L., Patterson, N. J., Plenge, R. M., Weinblatt, M. E., Shadick, N. A. and Reich, D. (2006). Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38 904-909.
[19] Sham, P. C. and Purcell, S. M. (2014). Statistical power and significance testing in large-scale genetic studies. Nat. Rev. Genet. 15 335-346.
[20] Taylor, J., Loftus, J. and Tibshirani, R. (2013). Tests in adaptive regression via the Kac-Rice formula. Preprint. Available at arXiv:1308.3020. · Zbl 1337.62304
[21] Wu, M. C., Kraft, P., Epstein, M. P., Taylor, D. M., Chanock, S. J., Hunter, D. J. and Lin, X. (2010). Powerful SNP-set analysis for case-control genome-wide association studies. Am. J. Hum. Genet. 86 929-942.
[22] Wu, M. C., Lee, S., Cai, T., Li, Y., Boehnke, M. and Lin, X. (2011). Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89 82-93.
[23] Wu, Z., Sun, Y., He, S., Cho, J., Zhao, H. and Jin, J. (2014). Detection boundary and higher criticism approach for rare and weak genetic effects. Ann. Appl. Stat. 8 824-851. · Zbl 1454.62420