A phylogenetic scan test on a Dirichlet-tree multinomial model for microbiome data. (English) Zbl 1393.62102

Summary: In this paper, we introduce the phylogenetic scan test (PhyloScan) for investigating cross-group differences in microbiome compositions using the Dirichlet-tree multinomial (DTM) model. The DTM models microbiome data through a cascade of independent local Dirichlet-multinomial (DM) models on the internal nodes of the phylogenetic tree. Each local DM captures the count distribution of a set of operational taxonomic units (OTUs) at a given resolution. Since distributional differences tend to occur in clusters along evolutionary lineages, we design a scan statistic over the phylogenetic tree that allows nodes to borrow signal strength from their parents and children. We also derive a formula to bound the tail probability of the scan statistic, and verify its accuracy through simulations. The PhyloScan procedure is applied to the American Gut dataset to identify taxa associated with dietary habits. Empirical studies performed on this dataset show that PhyloScan achieves higher testing power in most cases.
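
The cascade structure described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tree, the node names (`root`, `A`, `otu1`, ...), the Dirichlet parameters, and the helper functions `dm_logpmf`, `subtree_total`, and `dtm_loglik` are all assumptions made for illustration. The sketch only shows that, under a Dirichlet-tree multinomial model, the log-likelihood of one sample is the sum of local Dirichlet-multinomial log-likelihoods, one per internal node, each conditioning on the total count reaching that node.

```python
import math

def dm_logpmf(counts, alpha):
    """Log pmf of a Dirichlet-multinomial for `counts`, given their total."""
    n = sum(counts)
    a0 = sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a0) - math.lgamma(n + a0)
    for x, a in zip(counts, alpha):
        out += math.lgamma(x + a) - math.lgamma(a) - math.lgamma(x + 1)
    return out

def subtree_total(node, children, leaf_counts):
    """Total count of all leaves (OTUs) below `node`."""
    if node not in children:  # a leaf, i.e. an OTU
        return leaf_counts[node]
    return sum(subtree_total(c, children, leaf_counts) for c in children[node])

def dtm_loglik(children, alpha, leaf_counts):
    """Sum of the local DM log-likelihoods over all internal nodes."""
    total = 0.0
    for node, kids in children.items():
        kid_counts = [subtree_total(k, children, leaf_counts) for k in kids]
        total += dm_logpmf(kid_counts, alpha[node])
    return total

# Hypothetical toy tree: root -> (A, otu3), A -> (otu1, otu2).
children = {"root": ["A", "otu3"], "A": ["otu1", "otu2"]}
# Hypothetical Dirichlet parameters, one vector per internal node.
alpha = {"root": [1.0, 1.0], "A": [1.0, 1.0]}
leaf_counts = {"otu1": 1, "otu2": 2, "otu3": 3}

loglik = dtm_loglik(children, alpha, leaf_counts)
print(loglik)
```

With all parameters equal to 1, each binary local DM is uniform over the possible splits of its total, so the toy likelihood is (1/7)(1/4) = 1/28. This gives a quick sanity check on the decomposition.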


62P10 Applications of statistics to biology and medical sciences; meta analysis
62G10 Nonparametric hypothesis testing


[1] Caporaso, J. G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F. D., Costello, E. K., Fierer, N., Pena, A. G., Goodrich, J. K., Gordon, J. I. et al. (2010). QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7 335-336.
[2] Chen, Y. and Hanson, T. E. (2014). Bayesian nonparametric \(k\)-sample tests for censored and uncensored data. Comput. Statist. Data Anal. 71 335-346. · Zbl 1471.62041
[3] Chen, J. and Li, H. (2013). Variable selection for sparse Dirichlet-multinomial regression with an application to microbiome data analysis. Ann. Appl. Stat. 7 418-442. · Zbl 1454.62317
[4] David, L. A., Maurice, C. F., Carmody, R. C., Gootenberg, D. B., Button, J. E., Wolfe, B. E., Ling, A. V., Devlin, A. S., Varma, Y., Fischbach, M. A. et al. (2014). Diet rapidly and reproducibly alters the human gut microbiome. Nature 505 559-563.
[5] Dennis, S. Y. III (1991). On the hyper-Dirichlet type \(1\) and hyper-Liouville distributions. Comm. Statist. Theory Methods 20 4069-4081. · Zbl 0800.62059
[6] Dohmen, K. (2000). Improved Bonferroni inequalities via union-closed set systems. J. Combin. Theory Ser. A 92 61-67. · Zbl 0958.05007
[7] Dohmen, K. (2002). Improved inclusion-exclusion identities and Bonferroni inequalities with reliability applications. SIAM J. Discrete Math. 16 156-171. · Zbl 1029.05008
[8] Dohmen, K. and Tittmann, P. (2004). Bonferroni-Galambos inequalities for partition lattices. Electron. J. Combin. 11 Article ID 85. · Zbl 1062.05015
[9] Efron, B. (1997). The length heuristic for simultaneous hypothesis tests. Biometrika 84 143-157. · Zbl 0892.62048
[10] Glaz, J., Naus, J. and Wallenstein, S. (2001). Scan Statistics. Springer, New York. · Zbl 0983.62075
[11] Hahn, T. (2005). Cuba - A library for multidimensional numerical integration. Comput. Phys. Commun. 168 78-95. · Zbl 1196.65052
[12] Holmes, I., Harris, K. and Quince, C. (2012). Dirichlet multinomial mixtures: Generative models for microbial metagenomics. PLoS ONE 7 Article ID e30126.
[13] Holmes, C. C., Caron, F., Griffin, J. E. and Stephens, D. A. (2015). Two-sample Bayesian nonparametric hypothesis testing. Bayesian Anal. 10 297-320. · Zbl 1334.62082
[14] Human Microbiome Project Consortium (2012). A framework for human microbiome research. Nature 486 215-221.
[15] Hunter, D. (1976). An upper bound for the probability of a union. J. Appl. Probab. 13 597-603. · Zbl 0349.60007
[16] Lavine, M. (1992). Some aspects of Pólya tree distributions for statistical modelling. Ann. Statist. 20 1222-1235. · Zbl 0765.62005
[17] La Rosa, P. S., Brooks, J. P., Deych, E., Boone, E. L., Edwards, D. J., Wang, Q., Sodergren, E., Weinstock, G. and Shannon, W. D. (2012). Hypothesis testing and power calculations for taxonomic-based human microbiome data. PLoS ONE 7 Article ID e52078. DOI:10.1371/journal.pone.0052078.
[18] Ma, L. and Wong, W. H. (2011). Coupling optional Pólya trees and the two sample problem. J. Amer. Statist. Assoc. 106 1553-1565. · Zbl 1233.62104
[19] McDonald, D., Birmingham, A. and Knight, R. (2015a). Context and the human microbiome. Microbiome 3 1-8.
[20] McDonald, D., Hornig, M., Lozupone, C., Debelius, J., Gilbert, J. and Knight, R. (2015b). Towards large-cohort comparative studies to define the factors influencing the gut microbial community structure of ASD patients. Microb. Ecol. Health Dis. 26 26555.
[21] Mosimann, J. E. (1962). On the compound multinomial distribution, the multivariate \(\beta\)-distribution, and correlations among proportions. Biometrika 49 65-82. · Zbl 0105.12502
[22] Naiman, D. Q. and Wynn, H. P. (1992). Inclusion-exclusion-Bonferroni identities and inequalities for discrete tube-like problems via Euler characteristics. Ann. Statist. 20 43-76. · Zbl 0752.62028
[23] Naiman, D. Q. and Wynn, H. P. (1997). Abstract tubes, improved inclusion-exclusion identities and inequalities and importance sampling. Ann. Statist. 25 1954-1983. · Zbl 0902.60017
[24] Neuman, H., Debelius, J. W., Knight, R. and Koren, O. (2015). Microbial endocrinology: The interplay between the microbiota and the endocrine system. FEMS Microbiol. Rev. 39 509-521.
[25] Silverman, J. D., Washburne, A., Mukherjee, S. and David, L. A. (2017). A phylogenetic transform enhances analysis of compositional microbiota data. eLife 6 Article ID e21887.
[26] Soriano, J. and Ma, L. (2017). Probabilistic multi-resolution scanning for two-sample differences. J. R. Stat. Soc. Ser. B. Stat. Methodol. 79 547-572. · Zbl 1414.62149
[27] Tang, Y., Ma, L. and Nicolae, D. L. (2018). Supplement to “A phylogenetic scan test on a Dirichlet-tree multinomial model for microbiome data.” DOI:10.1214/17-AOAS1086SUPP. · Zbl 1393.62102
[28] Tang, Z., Chen, G., Alekseyenko, A. V. and Li, H. (2017). A general framework for association analysis of microbial communities on a taxonomic tree. Bioinformatics 33 1278-1285. DOI:10.1093/bioinformatics/btw804.
[29] Taylor, J. E., Worsley, K. J. and Gosselin, F. (2007). Maxima of discretely sampled random fields, with an application to ‘bubbles’. Biometrika 94 1-18. · Zbl 1143.62059
[30] Turnbaugh, P. J., Ridaura, V. K., Faith, J. J., Rey, F. E., Knight, R. and Gordon, J. I. (2014). The effect of diet on the human gut microbiome: A metagenomic analysis in humanized gnotobiotic mice. Sci. Transl. Med. 1 Article ID 6ra14. DOI:10.1126/scitranslmed.3000322.
[31] Wang, T. and Zhao, H. (2017). A Dirichlet-tree multinomial regression model for associating dietary nutrients with gut microorganisms. Biometrics 73 792-801.
[32] Weir, B. S. and Hill, W. G. (2002). Estimating F-statistics. Annu. Rev. Genet.36 721-750.
[33] Worsley, K. J. (1982). An improved Bonferroni inequality and applications. Biometrika 69 297-302. · Zbl 0497.62027