
An adaptive decorrelation procedure for signal detection. (English) Zbl 1510.62205

Summary: In global testing, where a large number of pointwise test statistics are aggregated to test a collection of null hypotheses simultaneously, the handling of dependence is a crucial issue. In various fields, particularly genetic epidemiology and functional data analysis, many testing methods have been proposed for detecting an association signal between a response and explanatory variables. Some aggregation procedures ignore dependence across pointwise test statistics whereas others introduce a model for decorrelation, and their relative performance remains unclear. Indeed, the benefit that can be expected from decorrelation depends strongly on the interplay between the structure of dependence across pointwise test statistics and the pattern of the association signal. Within a large class of test statistics covering a continuum of decorrelation approaches, an optimal procedure is introduced. This procedure maximizes an ad hoc distance, based on cumulant generating functions, between the null and nonnull distributions of a global test statistic, so as to adapt the aggregation of the pointwise statistics to the pattern of the association signal. A comparative study including simulations and applications to genetic association studies demonstrates that the ability of this test to detect a signal is more robust to the dependence structure than that of existing methods.
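
The idea of a continuum of decorrelation approaches can be illustrated with a minimal sketch. It is an assumption made for illustration only and not the class of statistics defined in the paper: the pointwise statistics z are aggregated through the quadratic form z' Sigma^(-gamma) z, where gamma = 0 ignores dependence (plain sum of squares) and gamma = 1 decorrelates fully (Mahalanobis-type statistic). The function names, the AR(1) correlation matrix and the vector z are all hypothetical.

```python
import numpy as np

def ar1_correlation(p, rho):
    """Hypothetical dependence structure: AR(1) correlation matrix of size p."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def aggregated_statistic(z, sigma, gamma):
    """Quadratic-form global statistic z' Sigma^(-gamma) z (illustrative only).

    gamma = 0 gives the sum of squares, which ignores dependence;
    gamma = 1 gives the fully decorrelated Mahalanobis-type statistic.
    """
    # Spectral decomposition of the symmetric positive definite matrix sigma.
    eigvals, eigvecs = np.linalg.eigh(sigma)
    # Coordinates of z in the eigenvector basis.
    scores = eigvecs.T @ z
    # Each squared score is downweighted by its eigenvalue raised to gamma.
    return float(np.sum(scores**2 / eigvals**gamma))

sigma = ar1_correlation(5, 0.6)
z = np.array([2.0, 1.5, 0.0, -0.5, 1.0])  # made-up pointwise statistics
for gamma in (0.0, 0.5, 1.0):
    print(gamma, aggregated_statistic(z, sigma, gamma))
```

In this sketch gamma is fixed by hand. The procedure described in the summary instead selects the amount of decorrelation by maximizing a distance between the null and nonnull distributions of the global statistic, which the sketch does not attempt to reproduce.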

MSC:

62G10 Nonparametric hypothesis testing
62G20 Asymptotic properties of nonparametric inference
62P10 Applications of statistics to biology and medical sciences; meta analysis
62R10 Functional data analysis

Software:

SKAT; GenOrd; Matrix
