Bayesian factor models for probabilistic cause of death assessment with verbal autopsies. (English) Zbl 1439.62220

Summary: The distribution of deaths by cause provides crucial information for public health planning, response and evaluation. About 60% of deaths globally are neither registered nor assigned a cause, which limits our ability to understand disease epidemiology. Verbal autopsy (VA) surveys are increasingly used in such settings to collect information on the signs, symptoms and medical history of people who have recently died. This article develops a novel Bayesian method for estimating population distributions of deaths by cause from verbal autopsy data. The proposed approach is based on a multivariate probit model in which associations among questionnaire items are flexibly induced by latent factors. We assess the performance of the proposed method using labeled data from the Population Health Metrics Research Consortium, which include both VA responses and medically certified causes of death. We also estimate which questionnaire items are most strongly associated with causes of death. This framework provides insights that will simplify future data collection.
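The summary describes a multivariate probit model in which latent factors induce associations among binary questionnaire items. The Python sketch below illustrates that general idea only; it is not the authors' implementation. It assumes, hypothetically, that each cause has its own item intercepts `mu` and loading matrix, that the factors are standard normal, and that the cause fractions are already known. The probability of a response vector given a cause is approximated by Monte Carlo integration over the factors, and Bayes' rule then gives cause probabilities for a single death. All function and parameter names are illustrative.

```python
import math
import random

# Hypothetical parameterization for illustration; not the paper's exact model.


def norm_cdf(x):
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear_predictor(mu, loadings, eta):
    """Return mu_j + lambda_j . eta for every questionnaire item j."""
    return [m + sum(l * e for l, e in zip(lam, eta)) for m, lam in zip(mu, loadings)]


def simulate_responses(mu, loadings, n, rng):
    """Draw n binary response vectors from a factor probit model.

    z_ij = mu_j + lambda_j . eta_i + eps_ij, with eta_i ~ N(0, I) and
    eps_ij ~ N(0, 1); y_ij = 1 if z_ij > 0. The shared eta_i makes items
    answered by the same person correlated.
    """
    k = len(loadings[0])
    data = []
    for _ in range(n):
        eta = [rng.gauss(0.0, 1.0) for _ in range(k)]
        z = [a + rng.gauss(0.0, 1.0) for a in linear_predictor(mu, loadings, eta)]
        data.append([1 if zj > 0 else 0 for zj in z])
    return data


def mc_likelihood(y, mu, loadings, draws, rng):
    """Monte Carlo estimate of P(y | cause), integrating over the factors.

    Given eta, items are independent with P(y_j = 1 | eta) = Phi(mu_j + lambda_j . eta).
    """
    k = len(loadings[0])
    total = 0.0
    for _ in range(draws):
        eta = [rng.gauss(0.0, 1.0) for _ in range(k)]
        lik = 1.0
        for yj, a in zip(y, linear_predictor(mu, loadings, eta)):
            p = norm_cdf(a)
            lik *= p if yj == 1 else 1.0 - p
        total += lik
    return total / draws


def cause_posterior(y, fractions, params, draws=2000, seed=0):
    """P(cause = c | y), proportional to fractions[c] * P(y | cause = c)."""
    rng = random.Random(seed)
    weights = [
        pi * mc_likelihood(y, mu, lam, draws, rng)
        for pi, (mu, lam) in zip(fractions, params)
    ]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Two hypothetical causes with one shared factor and three items.
    params = [
        ([1.0, 1.0, 1.0], [[0.8], [0.8], [0.8]]),
        ([-1.0, -1.0, -1.0], [[0.8], [0.8], [0.8]]),
    ]
    print(cause_posterior([1, 1, 1], [0.5, 0.5], params))
```

Because the items share a latent factor, the joint probability of several positive responses differs from the product of the item marginals; this is how a factor structure can capture symptom co-occurrence. Estimating the parameters and cause fractions from labeled data, as the summary describes, would require a full posterior sampler, which this sketch omits.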


MSC:
62P10 Applications of statistics to biology and medical sciences; meta analysis
92C60 Medical epidemiology
92D30 Epidemiology
62N05 Reliability and life testing
62H25 Factor analysis and principal components; correspondence analysis


Software: BayesDA; R; Ox


[1] AbouZahr, C., Cleland, J., Coullare, F., Macfarlane, S. B., Notzon, F. C., Setel, P., Szreter, S., Anderson, R. N., Bawah, A. A. et al. (2007). The way forward. Lancet 370 1791-1799.
[2] Arminger, G. and Muthén, B. O. (1998). A Bayesian approach to nonlinear latent variable models using the Gibbs sampler and the Metropolis-Hastings algorithm. Psychometrika 63 271-300. · Zbl 1291.62191
[3] Bhattacharya, A. and Dunson, D. B. (2011). Sparse Bayesian infinite factor models. Biometrika 98 291-306. · Zbl 1215.62025
[4] Bloomberg, M. R. and Bishop, J. (2015). Understanding death, extending life. Lancet 386 e18-e19.
[5] Byass, P., Chandramohan, D., Clark, S. J., D’Ambruoso, L., Fottrell, E., Graham, W. J., Herbst, A. J., Hodgson, A., Hounton, S. et al. (2012). Strengthening standardised interpretation of verbal autopsy data: The new InterVA-4 tool. Global Health Action 5 19281.
[6] Cramér, H. (1999). Mathematical Methods of Statistics. Princeton Landmarks in Mathematics. Princeton Univ. Press, Princeton, NJ. · Zbl 0985.62001
[7] de Savigny, D., Riley, I., Chandramohan, D., Odhiambo, F., Nichols, E., Notzon, S., AbouZahr, C., Mitra, R., Cobos Muñoz, D. et al. (2017). Integrating community-based verbal autopsy into civil registration and vital statistics (CRVS): System-level considerations. Global Health Action 10 1272882.
[8] Doornik, J. A. (2007). Object-oriented matrix programming using Ox, 3rd ed. Timberlake Consultants, London, and www.doornik.com, Oxford.
[9] Dunson, D. B. and Xing, C. (2009). Nonparametric Bayes modeling of multivariate categorical data. J. Amer. Statist. Assoc. 104 1042-1051. · Zbl 1388.62151
[10] Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A. and Rubin, D. B. (2014). Bayesian Data Analysis, 3rd ed. Texts in Statistical Science Series. CRC Press, Boca Raton, FL.
[11] Hoff, P. D. (2009). A First Course in Bayesian Statistical Methods. Springer Texts in Statistics. Springer, New York.
[12] Horton, R. (2007). Counting for health. Lancet 370 1526.
[13] James, S. L., Flaxman, A. D. and Murray, C. J. (2011). Performance of the Tariff Method: Validation of a simple additive algorithm for analysis of verbal autopsies. Population Health Metrics 9 31.
[14] Jha, P. (2014). Reliable direct measurement of causes of death in low- and middle-income countries. BMC Medicine 12 19.
[15] King, G. and Lu, Y. (2008). Verbal autopsy methods with multiple causes of death. Statist. Sci. 23 78-91. · Zbl 1327.62506
[16] King, G., Lu, Y. and Shibuya, K. (2010). Designing verbal autopsy studies. Population Health Metrics 8 19.
[17] Kunihama, T., Li, Z. R., Clark, S. J. and McCormick, T. H. (2020). Supplement to “Bayesian factor models for probabilistic cause of death assessment with verbal autopsies.” https://doi.org/10.1214/19-AOAS1253SUPPA, https://doi.org/10.1214/19-AOAS1253SUPPB. · Zbl 1439.62220
[18] Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data, 2nd ed. Wiley Series in Probability and Statistics. Wiley, Hoboken, NJ. · Zbl 1011.62004
[19] Lopes, H. F. and West, M. (2004). Bayesian model assessment in factor analysis. Statist. Sinica 14 41-67. · Zbl 1035.62060
[20] Lopez, A. D. (1998). Counting the dead in China. BMJ 317 1399-1400.
[21] Lozano, R., Lopez, A. D., Atkinson, C., Naghavi, M., Flaxman, A. D. and Murray, C. J. (2011). Performance of physician-certified verbal autopsies: Multisite validation study using clinical diagnostic gold standards. Population Health Metrics 9 32.
[22] Maher, D., Biraro, S., Hosegood, V., Isingo, R., Lutalo, T., Mushati, P., Ngwira, B., Nyirenda, M., Todd, J. et al. (2010). Translating global health research aims into action: The example of the ALPHA network. Tropical Medicine & International Health 15 321-328.
[23] Mathers, C. D., Fat, D. M., Inoue, M., Rao, C. and Lopez, A. D. (2005). Counting the dead and what they died from: An assessment of the global status of cause of death data. Bulletin of the World Health Organization 83 171-177.
[24] McCormick, T. H., Li, Z. R., Calvert, C., Crampin, A. C., Kahn, K. and Clark, S. J. (2016). Probabilistic cause-of-death assignment using verbal autopsies. J. Amer. Statist. Assoc. 111 1036-1049.
[25] Mealli, F. and Rubin, D. B. (2015). Clarifying missing at random and related definitions, and implications when coupled with exchangeability. Biometrika 102 995-1000. · Zbl 1390.62042
[26] Miasnikof, P., Giannakeas, V., Gomes, M., Aleksandrowicz, L., Shestopaloff, A. Y., Alam, D., Tollman, S., Samarikhalaj, A. and Jha, P. (2015). Naive Bayes classifiers for verbal autopsies: Comparison to physician-based classification for 21,000 child and adult deaths. BMC Medicine 13 286.
[27] Mikkelsen, L., Phillips, D. E., AbouZahr, C., Setel, P. W., De Savigny, D., Lozano, R. and Lopez, A. D. (2015). A global assessment of civil registration and vital statistics systems: Monitoring data quality and progress. Lancet 386 1395-1406.
[28] Montagna, S., Tokdar, S. T., Neelon, B. and Dunson, D. B. (2012). Bayesian latent factor regression for functional and longitudinal data. Biometrics 68 1064-1073. · Zbl 1258.62030
[29] Murray, C. J., Lopez, A. D., Black, R., Ahuja, R., Ali, S. M., Baqui, A., Dandona, L., Dantzer, E., Das, V. et al. (2011). Population Health Metrics Research Consortium gold standard verbal autopsy validation study: Design, implementation, and development of analysis datasets. Population Health Metrics 9 27.
[30] Navarro, D. (2015). Learning statistics with R: A tutorial for psychology students and other beginners. (Version 0.5). Univ. Adelaide. Available at http://ua.edu.au/ccs/teaching/lsr.
[31] Nichols, E. K., Byass, P., Chandramohan, D., Clark, S. J., Flaxman, A. D., Jakob, R., Leitao, J., Maire, N., Rao, C. et al. (2018). The WHO 2016 verbal autopsy instrument: An international standard suitable for automated analysis by InterVA, InSilicoVA, and Tariff 2.0. PLoS Medicine 15 e1002486.
[32] Phillips, D. E., AbouZahr, C., Lopez, A. D., Mikkelsen, L., De Savigny, D., Lozano, R., Wilmoth, J. and Setel, P. W. (2015). Are well functioning civil registration and vital statistics systems associated with better health outcomes? Lancet 386 1386-1394.
[33] PHMRC (2013). Population Health Metrics Research Consortium gold standard verbal autopsy data 2005-2011. Available at http://ghdx.healthdata.org/record/population-health-metrics-research-consortium-gold-standard-verbal-autopsy-data-2005-2011.
[34] R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Available at https://www.R-project.org/.
[35] Rubin, D. B. (1976). Inference and missing data. Biometrika 63 581-592. · Zbl 0344.62034
[36] Ruzicka, L. T. and Lopez, A. D. (1990). The use of cause-of-death statistics for health situation assessment: National and international experiences. World Health Statistics Quarterly 43 249-258.
[37] Sankoh, O. and Byass, P. (2012). The INDEPTH Network: Filling vital gaps in global epidemiology. Int. J. Epidemiol. 41 579-588.
[38] Seaman, S., Galati, J., Jackson, D. and Carlin, J. (2013). What is meant by “missing at random”? Statist. Sci. 28 257-268. · Zbl 1331.62036
[39] Serina, P., Riley, I., Stewart, A., James, S. L., Flaxman, A. D., Lozano, R., Hernandez, B., Mooney, M. D., Luning, R. et al. (2015). Improving performance of the Tariff Method for assigning causes of death to verbal autopsies. BMC Medicine 13 291.
[40] Soleman, N., Chandramohan, D. and Shibuya, K. (2006). Verbal autopsy: Current practices and challenges. Bulletin of the World Health Organization 84 239-245.
[41] World Health Organization (2012). Verbal Autopsy Standards: The 2012 WHO verbal autopsy instrument. Available at https://goo.gl/bQXXhG.
[42] World Health Organization (2017). Verbal Autopsy Standards: The 2016 WHO verbal autopsy instrument. Available at https://goo.gl/Hgt6es.
[43] Yang, G., Hu, J., Rao, K. Q., Ma, J., Rao, C. and Lopez, A. D. (2005). Mortality registration and surveillance in China: History, current situation and challenges. Population Health Metrics 3 3.
[44] Zhou, X., Nakajima, J. and West, M. (2014). Bayesian forecasting and portfolio decisions using dynamic dependent sparse factor models. Int. J. Forecast. 30 963-980.