Robust methods to detect disease-genotype association in genetic association studies: calculate \(p\)-values using exact conditional enumeration instead of simulated permutations or asymptotic approximations. (English) Zbl 1302.92074

Summary: In genetic association studies, detecting disease-genotype association is a primary goal. We study seven robust test statistics for such association when the underlying genetic model is unknown, for data on disease status (case or control) and genotype (three genotypes of a biallelic genetic marker). In such studies, \(p\)-values have predominantly been calculated by asymptotic approximations or by simulated permutations. We consider an exact method, conditional enumeration. When the number of simulated permutations tends to infinity, the permutation \(p\)-value approaches the conditional enumeration \(p\)-value, but calculating the latter is much more efficient than performing simulated permutations. We have studied case-control sample sizes with 500–5000 cases and 500–15,000 controls, and significance levels from \(5\times 10^{-8}\) to 0.05; our results are therefore applicable to genetic association studies with only a few genetic markers under study, intermediate follow-up studies, and genome-wide association studies. Our main findings are: (i) If all monotone genetic models are of interest, the best performance in the situations under study is achieved for the robust test statistics based on the maximum over a range of Cochran-Armitage trend tests with different scores and for the constrained likelihood ratio test. (ii) For significance levels below 0.05, for the test statistics under study, asymptotic approximations may give a test size up to 20 times the nominal level, and should therefore be used with caution. (iii) Calculating \(p\)-values based on exact conditional enumeration is a powerful, valid and computationally feasible approach, and we advocate its use in genetic association studies.
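To make the idea of conditional enumeration concrete, the following is a minimal sketch, not the authors' implementation. It assumes a 2×3 case-control table, conditions on both margins (so the case genotype counts follow a multivariate hypergeometric distribution under the null), and sums the probabilities of all tables whose statistic is at least as large as the observed one. The function names `trend_stat`, `max3_stat` and `exact_conditional_pvalue` are hypothetical, and the statistic shown (the maximum of squared trend statistics over the scores 0, 0.5 and 1 for the heterozygote) is only one illustrative choice of a maximum over Cochran-Armitage trend tests. The brute-force double loop is for illustration; it makes no attempt at the efficiency needed for the sample sizes reported in the summary.

```python
from math import comb


def trend_stat(r, n, w):
    """Squared standardized trend statistic for case counts r, column totals n, scores w."""
    N, R = sum(n), sum(r)
    if N < 2:
        return 0.0
    t = sum(wi * ri for wi, ri in zip(w, r))
    mean_w = sum(wi * ni for wi, ni in zip(w, n)) / N
    var_w = sum(wi * wi * ni for wi, ni in zip(w, n)) / N - mean_w ** 2
    # Variance of t under sampling R cases without replacement (margins fixed).
    var_t = R * (N - R) / (N - 1) * var_w
    if var_t <= 1e-12:  # degenerate margins: the statistic carries no information
        return 0.0
    return (t - R * mean_w) ** 2 / var_t


def max3_stat(r, n):
    """Maximum trend statistic over recessive, additive and dominant scores (assumed choice)."""
    return max(trend_stat(r, n, (0.0, x, 1.0)) for x in (0.0, 0.5, 1.0))


def exact_conditional_pvalue(cases, controls, stat=max3_stat):
    """Exact p-value by enumerating every 2x3 table with the observed margins."""
    n = [a + b for a, b in zip(cases, controls)]
    N, R = sum(n), sum(cases)
    observed = stat(cases, n)
    total = comb(N, R)
    p = 0.0
    for r0 in range(max(0, R - n[1] - n[2]), min(n[0], R) + 1):
        for r1 in range(max(0, R - r0 - n[2]), min(n[1], R - r0) + 1):
            r2 = R - r0 - r1
            # Small tolerance so ties with the observed table are counted.
            if stat((r0, r1, r2), n) >= observed - 1e-9:
                p += comb(n[0], r0) * comb(n[1], r1) * comb(n[2], r2) / total
    return min(p, 1.0)


if __name__ == "__main__":
    # Hypothetical genotype counts, chosen only to exercise the function.
    print(exact_conditional_pvalue((10, 20, 15), (25, 18, 6)))
```

A simulated permutation test with the same statistic would estimate the same quantity by randomly reassigning case and control labels; the enumeration replaces that Monte Carlo estimate with the exact sum.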

MSC:

92D10 Genetics and epigenetics
92B15 General biostatistics
62P10 Applications of statistics to biology and medical sciences; meta analysis
