zbMATH — the first resource for mathematics

Using Bayesian latent Gaussian graphical models to infer symptom associations in verbal autopsies. (English) Zbl 1459.62096
Summary: Learning dependence relationships among variables of mixed types provides insights in a variety of scientific settings and is a well-studied problem in statistics. Existing methods, however, typically rely on copious, high-quality data to accurately learn associations. In this paper, we develop a method for scientific settings where learning dependence structure is essential, but data are sparse and have a high fraction of missing values. Specifically, our work is motivated by survey-based cause-of-death assessments known as verbal autopsies (VAs). We propose a Bayesian approach to characterize dependence relationships using a latent Gaussian graphical model that incorporates informative priors on the marginal distributions of the variables. We demonstrate that such information can improve estimation of the dependence structure, especially in settings with little training data. We show that our method can be integrated into existing probabilistic cause-of-death assignment algorithms and improves model performance while recovering dependence patterns between symptoms that can inform efficient questionnaire design in future data collection.
Reviewer: Reviewer (Berlin)
MSC:
62H22 Probabilistic graphical models
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62N05 Reliability and life testing
62P10 Applications of statistics to biology and medical sciences; meta analysis
92D25 Population dynamics (general)
Software: BDgraph; bfa; EMVS; GHS; HdBCS; openVA; R
Full Text: DOI Euclid