zbMATH — the first resource for mathematics

TFisher: a powerful truncation and weighting procedure for combining \(p\)-values. (English) Zbl 1439.62259
Summary: The \(p\)-value combination approach is an important statistical strategy for testing global hypotheses, with broad applications in signal detection, meta-analysis, data integration, etc. In this paper we extend the classic Fisher’s combination method to a unified family of statistics, called TFisher, which allows a general truncation-and-weighting scheme of input \(p\)-values. TFisher can significantly improve statistical power over the Fisher and related truncation-only methods for detecting both rare and dense “signals.” To support a wide range of applications, analytical calculations of TFisher’s size and power are derived for any two continuous distributions under the null and the alternative hypotheses. The corresponding omnibus test (oTFisher) and its size calculation are also provided for data-adaptive analysis. We study the asymptotically optimal parameters of truncation and weighting based on Bahadur efficiency (BE). A new asymptotic measure, called the asymptotic power efficiency (APE), is also proposed to better reflect the statistics’ performance in real data analysis. Interestingly, under the Gaussian mixture model in the signal detection problem, both BE and APE indicate that the soft-thresholding scheme is the best: the truncation and weighting parameters should be equal. By simulations of various signal patterns, we systematically compare the power of statistics within the TFisher family as well as some rare-signal-optimal tests. We illustrate the use of TFisher in an exome-sequencing analysis for detecting novel genes of amyotrophic lateral sclerosis. Relevant computation has been implemented in the R package TFisher, published on the Comprehensive R Archive Network, to support applications.
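Illustration: the summary does not state the formula of the statistic. The following is a minimal sketch, assuming the truncation-and-weighting form \(W(\tau_1,\tau_2)=\sum_i -2\log(P_i/\tau_2)\,\mathbf{1}\{P_i\le\tau_1\}\), where \(\tau_1\) is the truncation parameter and \(\tau_2\) the weighting parameter. Under that assumption, \(\tau_1=\tau_2=1\) gives Fisher’s statistic, \(\tau_2=1\) gives a truncation-only statistic, and \(\tau_1=\tau_2\) gives the soft-thresholding scheme mentioned above. The function name `tfisher_stat` and the example \(p\)-values are hypothetical and are not the interface of the R package.

```python
import math


def tfisher_stat(pvalues, tau1=1.0, tau2=1.0):
    """Hypothetical helper (assumed form, not taken from the summary).

    Sums -2*log(p / tau2) over the p-values that satisfy p <= tau1.
    tau1 is the truncation parameter, tau2 the weighting parameter.
    """
    if not 0.0 < tau1 <= 1.0:
        raise ValueError("tau1 must lie in (0, 1]")
    if tau2 <= 0.0:
        raise ValueError("tau2 must be positive")
    total = 0.0
    for p in pvalues:
        if not 0.0 < p <= 1.0:
            raise ValueError("each p-value must lie in (0, 1]")
        if p <= tau1:  # truncation: p-values above tau1 contribute nothing
            total += -2.0 * math.log(p / tau2)  # weighting through tau2
    return total


# Made-up input p-values, for illustration only.
pvals = [0.001, 0.02, 0.3, 0.8]

fisher = tfisher_stat(pvals)                       # tau1 = tau2 = 1
truncated = tfisher_stat(pvals, tau1=0.05)         # truncation only
soft = tfisher_stat(pvals, tau1=0.05, tau2=0.05)   # soft thresholding

print(fisher, truncated, soft)
```

With \(\tau_1=\tau_2\), every retained term \(-2\log(P_i/\tau_2)\) is nonnegative, so each \(p\)-value contributes only the amount by which it exceeds the threshold on the \(-2\log\) scale.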
62R07 Statistical aspects of big data and data science
62G10 Nonparametric hypothesis testing
62P30 Applications of statistics in engineering and industry; control charts
60G35 Signal detection and filtering (aspects of stochastic processes)
94A12 Signal theory (characterization, reconstruction, filtering, etc.)
[1] Abramovich, F., Benjamini, Y., Donoho, D. L. and Johnstone, I. M. (2006). Adapting to unknown sparsity by controlling the false discovery rate. Ann. Statist. 34 584-653. · Zbl 1092.62005
[2] Abu-Dayyeh, W. A., Al-Momani, M. A. and Muttlak, H. A. (2003). Exact Bahadur slope for combining independent tests for normal and logistic distributions. Appl. Math. Comput. 135 345-360. · Zbl 1016.62049
[3] Andrés-Benito, P., Moreno, J., Aso, E., Povedano, M. and Ferrer, I. (2017). Amyotrophic lateral sclerosis, gene deregulation in the anterior horn of the spinal cord and frontal cortex area 8: Implications in frontotemporal lobar degeneration. Aging 9 823-851.
[4] Arias-Castro, E., Candès, E. J. and Plan, Y. (2011). Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism. Ann. Statist. 39 2533-2556. · Zbl 1231.62136
[5] Ayers, K. L., Mirshahi, U. L., Wardeh, A. H., Murray, M. F., Hao, K., Glicksberg, B. S., Li, S., Carey, D. J. and Chen, R. (2016). A loss of function variant in CASP7 protects against Alzheimer’s disease in homozygous APOE \(\varepsilon 4\) allele carriers. BMC Genomics 17 445.
[6] Azzalini, A. (1985). A class of distributions which includes the normal ones. Scand. J. Stat. 12 171-178. · Zbl 0581.62014
[7] Bahadur, R. R. (1960). Stochastic comparison of tests. Ann. Math. Stat. 31 276-295. · Zbl 0201.52203
[8] Barnett, I. J. and Lin, X. (2014). Analytical \(p\)-value calculation for the higher criticism test in finite-\(d\) problems. Biometrika 101 964-970. · Zbl 1306.62219
[9] Biernacka, J. M., Jenkins, G. D., Wang, L., Moyer, A. M. and Fridley, B. L. (2012). Use of the gamma method for self-contained gene-set analysis of SNP data. Eur. J. Hum. Genet. 20 565-571.
[10] Bonifati, V. (2006). Parkinson’s disease: The LRRK2-G2019S mutation: Opening a novel era in Parkinson’s disease genetics. Eur. J. Hum. Genet. 14 1061-1062.
[11] Bruce, A. G. and Gao, H.-Y. (1996). Understanding WaveShrink: Variance and bias estimation. Biometrika 83 727-745. · Zbl 0883.62038
[12] Cai, T. T. and Wu, Y. (2014). Optimal detection of sparse mixtures against a given null distribution. IEEE Trans. Inform. Theory 60 2217-2232. · Zbl 1360.94108
[13] Carter, B. J., Anklesaria, P., Choi, S. and Engelhardt, J. F. (2009). Redox modifier genes and pathways in amyotrophic lateral sclerosis. Antioxid. Redox Signal. 11 1569-1586.
[14] Casella, G. and Berger, R. L. (2002). Statistical Inference, 2nd ed. Duxbury, Pacific Grove, CA. · Zbl 0699.62001
[15] Cevikbas, F., Wang, X., Akiyama, T., Kempkes, C., Savinko, T., Antal, A., Kukova, G., Buhl, T., Ikoma, A. et al. (2014). A sensory neuron-expressed IL-31 receptor mediates T helper cell-dependent itch: Involvement of TRPV1 and TRPA1. J. Allergy Clin. Immunol. 133 448-460.
[16] Chapman, D. L. and Papaioannou, V. E. (1998). Three neural tubes in mouse embryos with mutations in the T-box gene Tbx6. Nature 391 695-697.
[17] Chen, C.-W. and Yang, H.-C. (2017). OPATs: Omnibus \(P\)-value association tests. Brief. Bioinform. 20 1-14.
[18] Cox, L. E., Ferraiuolo, L., Goodall, E. F., Heath, P. R., Higginbottom, A., Mortiboys, H., Hollinger, H. C., Hartley, J. A., Brockington, A. et al. (2010). Mutations in CHMP2B in lower motor neuron predominant amyotrophic lateral sclerosis (ALS). PLoS ONE 5 e9872.
[19] Dai, H., Leeder, J. S. and Cui, Y. (2014). A modified generalized Fisher method for combining probabilities from dependent tests. Front. Genet. 5 32.
[20] Daniels, H. E. (1954). Saddlepoint approximations in statistics. Ann. Math. Stat. 25 631-650. · Zbl 0058.35404
[21] DasGupta, A. (2008). Asymptotic Theory of Statistics and Probability. Springer Texts in Statistics. Springer, New York. · Zbl 1154.62001
[22] de Oliveira, G. P., Maximino, J. R., Maschietto, M., Zanoteli, E., Puga, R. D., Lima, L., Carraro, D. M. and Chadi, G. (2014). Early gene expression changes in skeletal muscle from SOD1G93A amyotrophic lateral sclerosis animal model. Cell. Mol. Neurobiol. 34 451-462.
[23] Donoho, D. L. (1995). De-noising by soft-thresholding. IEEE Trans. Inform. Theory 41 613-627. · Zbl 0820.62002
[24] Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962-994. · Zbl 1092.62051
[25] Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81 425-455. · Zbl 0815.62019
[26] Dudbridge, F. and Koeleman, B. P. C. (2003). Rank truncated product of P-values, with application to genomewide association scans. Genet. Epidemiol. 25 360-366.
[27] Duerr, R. H., Taylor, K. D., Brant, S. R., Rioux, J. D., Silverberg, M. S., Daly, M. J., Steinhart, A. H., Abraham, C., Regueiro, M. et al. (2006). A genome-wide association study identifies IL23R as an inflammatory bowel disease gene. Science 314 1461-1463.
[28] Fanning, S., Xu, W., Beaurepaire, C., Suhan, J. P., Nantel, A. and Mitchell, A. P. (2012). Functional control of the Candida albicans cell wall by catalytic protein kinase A subunit Tpk1. Mol. Microbiol. 86 284-302.
[29] Fisher, R. A. (1932). Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh. · JFM 58.1161.04
[30] Genz, A. (1992). Numerical computation of multivariate normal probabilities. J. Comput. Graph. Statist. 1 141-149.
[31] Good, I. J. (1955). On the weighted combination of significance tests. J. Roy. Statist. Soc. Ser. B 17 264-265. · Zbl 0067.11802
[32] Guo, S., Li, Z.-Z., Gong, J., Xiang, M., Zhang, P., Zhao, G.-N., Li, M., Zheng, A., Zhu, X. et al. (2015). Oncostatin M confers neuroprotection against ischemic stroke. J. Neurosci. 35 12047-12062.
[33] Hoh, J., Wille, A. and Ott, J. (2001). Trimming, weighting, and grouping SNPs in human case-control association studies. Genome Res. 11 2115-2119.
[34] Ingster, Y. I. (2002). Adaptive detection of a signal of growing dimension. II. Math. Methods Statist. 11 37-68. · Zbl 1005.62052
[35] Ingster, Y. I., Tsybakov, A. B. and Verzelen, N. (2010). Detection boundary in sparse regression. Electron. J. Stat. 4 1476-1526. · Zbl 1329.62314
[36] Kuo, C.-L. and Zaykin, D. V. (2011). Novel rank-based approaches for discovery and replication in genome-wide association studies. Genetics 189 329-340.
[37] Lee, S., Emond, M. J., Bamshad, M. J., Barnes, K. C., Rieder, M. J., Nickerson, D. A., Christiani, D. C., Wurfel, M. M. and Lin, X. (2012). Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am. J. Hum. Genet. 91 224-237.
[38] Li, J. and Tseng, G. C. (2011). An adaptively weighted statistic for detecting differential gene expression when combining multiple transcriptomic studies. Ann. Appl. Stat. 5 994-1019. · Zbl 05961700
[39] Lin, X., Lee, S., Wu, M. C., Wang, C., Chen, H., Li, Z. and Lin, X. (2016). Test for rare variants by environment interactions in sequencing association studies. Biometrics 72 156-164. · Zbl 1393.62124
[40] Littell, R. C. and Folks, J. L. (1971). Asymptotic optimality of Fisher’s method of combining independent tests. J. Amer. Statist. Assoc. 66 802-806. · Zbl 0229.62011
[41] Littell, R. C. and Folks, J. L. (1973). Asymptotic optimality of Fisher’s method of combining independent tests. II. J. Amer. Statist. Assoc. 68 193-194. · Zbl 0259.62022
[42] Lugannani, R. and Rice, S. (1980). Saddle point approximation for the distribution of the sum of independent random variables. Adv. in Appl. Probab. 12 475-490. · Zbl 0425.60042
[43] McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. Monographs on Statistics and Applied Probability. CRC Press, London. · Zbl 0744.62098
[44] Morahan, J. M., Yu, B., Trent, R. J. and Pamphlett, R. (2009). A genome-wide analysis of brain DNA methylation identifies new candidate genes for sporadic amyotrophic lateral sclerosis. Amyotroph. Lateral Scler. 10 418-429.
[45] Nadarajah, S. (2005). A generalized normal distribution. J. Appl. Stat. 32 685-694. · Zbl 1121.62447
[46] Nikitin, Y. (1995). Asymptotic Efficiency of Nonparametric Tests. Cambridge Univ. Press, Cambridge. · Zbl 0879.62045
[47] Schaid, D. J., Rowland, C. M., Tines, D. E., Jacobson, R. M. and Poland, G. A. (2002). Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am. J. Hum. Genet. 70 425-434.
[48] Smith, B. N., Ticozzi, N., Fallini, C., Gkazi, A. S., Topp, S., Kenna, K. P., Scotter, E. L., Kost, J., Keagle, P. et al. (2014). Exome-wide rare variant analysis identifies TUBA4A mutations associated with familial ALS. Neuron 84 324-331.
[49] Song, C. and Tseng, G. C. (2014). Hypothesis setting and order statistic for robust genomic meta-analysis. Ann. Appl. Stat. 8 777-800. · Zbl 06333776
[50] Stouffer, S. A., Suchman, E. A., DeVinney, L. C., Star, S. A. and Williams, R. M. (1949). The American Soldier: Adjustment During Army Life I. Princeton Univ. Press, Princeton, NJ.
[51] Su, Y.-C., Gauderman, W. J., Berhane, K. and Lewinger, J. P. (2016). Adaptive set-based methods for association testing. Genet. Epidemiol. 40 113-122.
[52] Szklarczyk, D., Franceschini, A., Wyder, S., Forslund, K., Heller, D., Huerta-Cepas, J., Simonovic, M., Roth, A., Santos, A. et al. (2014). STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43 D447-D452.
[53] Pandya, S., Mao, L. L., Zhou, E. W., Bowser, R., Zhu, Z., Zhu, Y. and Wang, X. (2012). Neuroprotection for amyotrophic lateral sclerosis: Role of stem cells, growth factors, and gene therapy. Cent. Nerv. Syst. Agents. Med. Chem. 12 15-27.
[54] Varanasi, M. K. and Aazhang, B. (1989). Parametric generalized Gaussian density estimation. J. Acoust. Soc. Am. 86 1404-1415.
[55] Whitlock, M. C. (2005). Combining probability from independent tests: The weighted Z-method is superior to Fisher’s approach. J. Evol. Biol. 18 1368-1373.
[56] Wu, M. C., Lee, S., Cai, T., Li, Y., Boehnke, M. and Lin, X. (2011). Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89 82-93.
[57] Wu, Z., Sun, Y., He, S., Cho, J., Zhao, H. and Jin, J. (2014). Detection boundary and higher criticism approach for rare and weak genetic effects. Ann. Appl. Stat. 8 824-851. · Zbl 06333778
[58] Yu, K., Li, Q., Bergen, A. W., Pfeiffer, R. M., Rosenberg, P. S., Caporaso, N., Kraft, P. and Chatterjee, N. (2009). Pathway analysis by adaptive combination of \(P\)-values. Genet. Epidemiol. 33 700-709.
[59] Zaykin, D. V., Zhivotovsky, L. A., Westfall, P. H. and Weir, B. S. (2002). Truncated product method for combining P-values. Genet. Epidemiol. 22 170-185.
[60] Zaykin, D. V., Zhivotovsky, L. A., Czika, W., Shao, S. and Wolfinger, R. D. (2007). Combining p-values in large-scale genomics experiments. Pharm. Stat. 6 217-226.
[61] Zhang, J. and Huang, E. J. (2006). Dynamic expression of neurotrophic factor receptors in postnatal spinal motoneurons and in mouse model of ALS. J. Neurobiol. 66 882-895.
[62] Zhang, H., Tong, T., Landers, J. E. and Wu, Z. (2020). Supplement to “TFisher: A truncation and weighting procedure for combining \(p\)-values.” https://doi.org/10.1214/19-AOAS1302SUPP.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.