
A semiparametric mixture method for local false discovery rate estimation from multiple studies. (English) Zbl 1470.62158

Summary: Antineutrophil cytoplasmic antibody-associated vasculitis (AAV) is extremely heterogeneous in clinical presentation and involves multiple organ systems. Although the clinical presentation of AAV is diverse, we hypothesized that all forms of AAV share common pathways, and we tested this hypothesis using three different microarray studies, covering peripheral leukocytes, sinus, and orbital inflammatory disease. For the hypothesis testing, we developed a two-component semiparametric mixture model to estimate the local false discovery rates from the \(p\)-values of the three studies. The two pillars of the proposed approach are Efron's empirical null principle and log-concave density estimation for the alternative distribution. Our method outperforms existing methods, particularly when the proportion of null hypotheses is not high. It is robust to misspecification of the alternative distribution. A unique feature of our method is that it can be extended to compute the local false discovery rates by combining multiple lists of \(p\)-values.
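To make the quantity in the summary concrete, the following is a minimal sketch of a local false discovery rate under a generic two-groups model, fdr(z) = pi0 * f0(z) / (pi0 * f0(z) + (1 - pi0) * f1(z)), evaluated on probit-transformed \(p\)-values. It is not the authors' estimator: the function name `local_fdr`, the fixed null proportion `pi0`, the theoretical N(0, 1) null, and the Gaussian alternative (one example of a log-concave density) are all assumptions chosen for illustration, whereas the paper estimates the null empirically and the alternative nonparametrically under log-concavity.

```python
from statistics import NormalDist


def local_fdr(p_values, pi0=0.8, alt_mean=-2.5, alt_sd=1.0):
    """Local fdr for each p-value under an assumed two-groups model.

    All parameters are illustrative assumptions, not estimates:
    pi0 is the null proportion, and the alternative density f1 is
    taken to be Gaussian, which is one log-concave choice.
    """
    null = NormalDist(0.0, 1.0)          # theoretical null density f0
    alt = NormalDist(alt_mean, alt_sd)   # assumed alternative density f1
    result = []
    for p in p_values:
        # Probit transform; requires 0 < p < 1. Small p gives negative z.
        z = null.inv_cdf(p)
        f0 = null.pdf(z)
        f1 = alt.pdf(z)
        mixture = pi0 * f0 + (1.0 - pi0) * f1
        result.append(pi0 * f0 / mixture)
    return result


# Hypothetical p-values, ordered from strongest to weakest evidence.
fdr = local_fdr([0.0001, 0.01, 0.5])
```

Under these assumed parameters, a smaller \(p\)-value yields a smaller local false discovery rate, so the values in `fdr` increase along the list. The sketch treats a single list of \(p\)-values only; it does not attempt the combination of multiple lists described in the summary.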

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)
