Hierarchical infinite factor models for improving the prediction of surgical complications for geriatric patients. (English) Zbl 1435.62395

Summary: Nearly a third of all surgeries performed in the United States are on patients over the age of 65, and these older adults experience higher rates of postoperative morbidity and mortality. To improve care for these patients, we aim to identify and characterize high-risk geriatric patients to refer to a specialized perioperative clinic, while leveraging the overall surgical population to improve learning. To this end, we develop a hierarchical infinite latent factor model (HIFM) that accounts for the covariance structure across subpopulations in the data. We propose a novel hierarchical Dirichlet process shrinkage prior on the loadings matrix that flexibly captures the underlying structure of our data while sharing information across subpopulations to improve inference and prediction. The stick-breaking construction of the prior assumes an infinite number of factors and allows each subpopulation to use a different subset of the factor space and to select the number of factors needed to best explain the variation. We develop the model into a latent factor regression method that excels at prediction and inference of regression coefficients. Simulations validate this strong performance compared to baseline methods. We apply this work to the problem of predicting surgical complications using electronic health record data for geriatric patients and all surgical patients at Duke University Health System (DUHS). The motivating application demonstrates improved predictive performance when using HIFM, in both area under the ROC curve and area under the precision-recall (PR) curve, while providing interpretable coefficients that may lead to actionable interventions.
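The stick-breaking idea mentioned in the summary can be illustrated with a minimal sketch. This is not the authors' HIFM prior on the loadings matrix; it only shows generic hierarchical Dirichlet process weights in the sense of Sethuraman (1994) and Teh et al. (2006), under a finite truncation. The function names, the truncation level, and the concentration values are all assumptions chosen for illustration.

```python
import numpy as np


def stick_breaking_weights(concentration, truncation, rng):
    """Truncated stick-breaking weights with v_k ~ Beta(1, concentration)."""
    v = rng.beta(1.0, concentration, size=truncation)
    v[-1] = 1.0  # close the stick so the truncated weights sum to one
    # Length of stick remaining before each break: prod_{l<k} (1 - v_l).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def subpopulation_weights(concentration, global_weights, n_groups, rng):
    """Group-level weights centred on the shared global weights.

    On a finite set of atoms, a DP with base weights `global_weights`
    reduces to a Dirichlet(concentration * global_weights) draw.
    """
    return rng.dirichlet(concentration * global_weights, size=n_groups)


rng = np.random.default_rng(0)
# Hypothetical settings: 10 retained factors, two subpopulations
# (for example, geriatric patients and all surgical patients).
global_w = stick_breaking_weights(concentration=2.0, truncation=10, rng=rng)
group_w = subpopulation_weights(5.0, global_w, n_groups=2, rng=rng)
```

Each row of `group_w` places its mass on the same shared set of factors but with subpopulation-specific weights, which is the sense in which groups can emphasize different subsets of a common factor space.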


62P10 Applications of statistics to biology and medical sciences; meta analysis
62M20 Inference from stochastic processes and prediction
62G05 Nonparametric estimation


[1] AHRQ (2016). Healthcare Cost and Utilization Project (HCUP) surgery flag software. https://www.hcup-us.ahrq.gov/toolssoftware/surgflags/surgeryflags.jsp.
[2] Albert, J. H. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. J. Amer. Statist. Assoc. 88 669-679. · Zbl 0774.62031
[3] Avalos-Pacheco, A., Rossell, D. and Savage, R. S. (2018). Heterogeneous large datasets integration using Bayesian factor regression. Preprint. Available at arXiv:1810.09894.
[4] Bhattacharya, A. and Dunson, D. B. (2011). Sparse Bayesian infinite factor models. Biometrika 98 291-306. · Zbl 1215.62025
[5] Caron, F. and Doucet, A. (2008). Sparse Bayesian nonparametric regression. In Proceedings of the 25th International Conference on Machine Learning 88-95. ACM, New York.
[6] Carvalho, C. M., Chang, J., Lucas, J. E., Nevins, J. R., Wang, Q. and West, M. (2008a). High-dimensional sparse factor modeling: Applications in gene expression genomics. J. Amer. Statist. Assoc. 103 1438-1456. · Zbl 1286.62091
[7] Chen, M., Silva, J., Paisley, J., Wang, C., Dunson, D. and Carin, L. (2010). Compressive sensing on manifolds using a nonparametric mixture of factor analyzers: Algorithm and performance bounds. IEEE Trans. Signal Process. 58 6140-6155. · Zbl 1392.94139
[8] Corey, K. M., Kashyap, S., Lorenzi, E., Lagoo-Deenadayalan, S. A., Heller, K., Whalen, K., Balu, S., Heflin, M. T., McDonald, S. R. et al. (2018). Development and validation of machine learning models to identify high-risk surgical patients using automatically curated electronic health record data (Pythia): A retrospective, single-site study. PLoS Med. 15 e1002701.
[9] Desebbe, O., Lanz, T., Kain, Z. and Cannesson, M. (2016). The perioperative surgical home: An innovative, patient-centred and cost-effective perioperative care model. Anaesth. Crit. Care Pain Med. 35 59-66.
[10] Elixhauser, A., Steiner, C., Harris, D. R. and Coffey, R. M. (1998). Comorbidity measures for use with administrative data. Med. Care 36.
[11] Etzioni, D. A., Liu, J. H., O’Connell, J. B., Maggard, M. A. and Ko, C. Y. (2003). Elderly patients in surgical workloads: A population-based analysis. Am. J. Surg. 69 961-965.
[12] Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1 209-230. · Zbl 0255.62037
[13] Gong, J. J., Sundt, T. M., Rawn, J. D. and Guttag, J. V. (2015). Instance weighting for patient-specific risk stratification models. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 369-378. ACM, New York.
[14] Hanover, N. (2001). Operative mortality with elective surgery in older adults. Eff. Clin. Pract. 4 172-177.
[15] Healey, M. A., Shackford, S. R., Osler, T. M., Rogers, F. B. and Burns, E. (2002). Complications in surgical patients. Arch. Surg. 137 611-618.
[16] Ishwaran, H. and James, L. F. (2001). Gibbs sampling methods for stick-breaking priors. J. Amer. Statist. Assoc. 96 161-173. · Zbl 1014.62006
[17] Ishwaran, H. and James, L. F. (2002). Approximate Dirichlet process computing in finite normal mixtures: Smoothing and prior information. J. Comput. Graph. Statist. 11 508-532.
[18] Jones, T. S., Dunn, C. L., Wu, D. S., Cleveland, J. C., Kile, D. and Robinson, T. N. (2013). Relationship between asking an older adult about falls and surgical outcomes. J. Am. Med. Assoc. Surg. 148 1132-1138.
[19] Lee, G., Rubinfeld, I. and Syed, Z. (2012). Adapting surgical models to individual hospitals using transfer learning. In Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on 57-63. IEEE, New York.
[20] Lopes, H. F. and West, M. (2004). Bayesian model assessment in factor analysis. Statist. Sinica 14 41-67. · Zbl 1035.62060
[21] Lorenzi, E., Henao, R. and Heller, K. (2019). Supplement to “Hierarchical infinite factor models for improving the prediction of surgical complications for geriatric patients.” DOI:10.1214/19-AOAS1292SUPPA, DOI:10.1214/19-AOAS1292SUPPB, DOI:10.1214/19-AOAS1292SUPPC.
[22] Lucas, J., Carvalho, C., Wang, Q., Bild, A., Nevins, J. and West, M. (2006). Sparse statistical modelling in gene expression genomics. Bayesian Inference for Gene Expression and Proteomics 1 1.
[23] McDonald, S. R., Heflin, M. T., Whitson, H. E., Dalton, T. O., Lidsky, M. E., Liu, P., Poer, C. M., Sloane, R., Thacker, J. K. et al. (2018). Association of integrated care coordination with postsurgical outcomes in high-risk older adults: The perioperative optimization of senior health (POSH) initiative. JAMA Surg. DOI: 10.1001/jamasurg.2017.5513
[24] McParland, D., Gormley, I. C., McCormick, T. H., Clark, S. J., Kabudula, C. W. and Collinson, M. A. (2014). Clustering South African households based on their asset status using latent variable models. Ann. Appl. Stat. 8 747-776. · Zbl 1454.62503
[25] McParland, D., Phillips, C. M., Brennan, L., Roche, H. M. and Gormley, I. C. (2017). Clustering high-dimensional mixed data to uncover sub-phenotypes: Joint analysis of phenotypic and genotypic data. Stat. Med. 36 4548-4569.
[26] Murphy, K., Gormley, I. C. and Viroli, C. (2017). Infinite mixtures of infinite factor analysers: Nonparametric model-based clustering via latent Gaussian models. Preprint. Available at arXiv:1701.07010.
[27] Ni, Y., Mueller, P. and Ji, Y. (2018). Bayesian double feature allocation for phenotyping with electronic health records. Preprint. Available at arXiv:1809.08988.
[28] Polson, N. G. and Scott, J. G. (2010). Shrink globally, act locally: Sparse Bayesian regularization and prediction. In Bayesian Statistics 9 501-538. Oxford Univ. Press, Oxford.
[29] Raval, M. V. and Eskandari, M. K. (2012). Outcomes of elective abdominal aortic aneurysm repair among the elderly: Endovascular versus open repair. Surgery 151 245-260.
[30] Ročková, V. and George, E. I. (2016). Fast Bayesian factor analysis via automatic rotations to sparsity. J. Amer. Statist. Assoc. 111 1608-1622.
[31] Seo, D. M., Goldschmidt-Clermont, P. J. and West, M. (2007). Of mice and men: Sparse statistical modeling in cardiovascular genomics. Ann. Appl. Stat. 1 152-178. · Zbl 1129.62104
[32] Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statist. Sinica 4 639-650. · Zbl 0823.62007
[33] Speziale, G., Nasso, G., Barattoni, M. C., Esposito, G., Popoff, G., Argano, V., Greco, E., Scorcin, M., Zussa, C. et al. (2011). Short-term and long-term results of cardiac surgery in elderly and very elderly patients. J. Thorac. Cardiovasc. Surg. 141 725-731.
[34] Teh, Y. W., Jordan, M. I., Beal, M. J. and Blei, D. M. (2006). Hierarchical Dirichlet processes. J. Amer. Statist. Assoc. 101 1566-1581. · Zbl 1171.62349
[35] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[36] West, M. (2003). Bayesian factor regression models in the “large \(p\), small \(n\)” paradigm. In Bayesian Statistics, 7 (Tenerife, 2002) 733-742. Oxford Univ. Press, New York.
[37] Wiens, J., Guttag, J. and Horvitz, E. (2014). A study in transfer learning: Leveraging data from multiple hospitals to enhance hospital-specific predictions. J. Am. Med. Inform. Assoc. 21 699-706.
[38] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 301-320. · Zbl 1069.62054