zbMATH — the first resource for mathematics

Stable graphical model estimation with random forests for discrete, continuous, and mixed variables. (English) Zbl 06958947
Summary: Random Forests combined with Stability Selection make it possible to estimate stable conditional independence graphs with an error control mechanism for false positive selection. The approach is applicable to graphs containing both continuous and discrete variables at the same time. Its performance is evaluated in various simulation settings and compared with alternative approaches. Finally, the approach is applied to two health-related data sets: first to study the interconnection of functional health components, personal factors, and environmental factors, and second to identify risk factors that may be associated with adverse neurodevelopment after open-heart surgery.
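
The subsample-and-threshold idea behind stability selection can be illustrated with a minimal sketch. It is not the authors' implementation: absolute marginal Pearson correlation is swapped in as a simple placeholder for a random forest importance score (so it does not measure conditional dependence), and all function names and default parameters (`q`, `pi_thr`, `n_subsamples`) are assumptions made for illustration. The bound in the last function is the expected false positive bound from Meinshausen and Bühlmann's stability selection paper (reference [37] below), which holds under the assumptions stated there.

```python
import random
from itertools import combinations


def abs_corr(x, y):
    """Absolute Pearson correlation (placeholder for a forest importance score)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def top_edges(columns, rows, q):
    """Score every variable pair on the given rows and keep the q best edges."""
    scores = {}
    for i, j in combinations(range(len(columns)), 2):
        xi = [columns[i][r] for r in rows]
        xj = [columns[j][r] for r in rows]
        scores[(i, j)] = abs_corr(xi, xj)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:q])


def stability_selection(columns, q, pi_thr=0.75, n_subsamples=100, seed=0):
    """Return edges selected in at least pi_thr of the half-size subsamples."""
    rng = random.Random(seed)
    n = len(columns[0])
    counts = {}
    for _ in range(n_subsamples):
        # Draw half of the observations without replacement.
        rows = rng.sample(range(n), n // 2)
        for edge in top_edges(columns, rows, q):
            counts[edge] = counts.get(edge, 0) + 1
    freq = {edge: c / n_subsamples for edge, c in counts.items()}
    stable = {edge for edge, f in freq.items() if f >= pi_thr}
    return stable, freq


def expected_false_positive_bound(q, n_candidates, pi_thr):
    """Upper bound q^2 / ((2 * pi_thr - 1) * n_candidates) on expected false positives."""
    if not 0.5 < pi_thr <= 1.0:
        raise ValueError("pi_thr must lie in (0.5, 1]")
    return q ** 2 / ((2 * pi_thr - 1) * n_candidates)
```

Here `columns` is a list of equally long numeric lists, one per variable, and `n_candidates` is the number of possible edges, p(p-1)/2 for p variables. In practice one would fix a tolerated number of false positive edges and choose `q` and `pi_thr` so that the bound does not exceed it.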

62 Statistics
[1] Altman, D. G.; Royston, P., The cost of dichotomising continuous variables, Brit. Med. J., 332, 1080, (2006)
[2] Amit, Y.; Geman, D., Shape quantization and recognition with randomized trees, Neural Comput., 9, 1545-1588, (1997)
[3] Apgar, V., A proposal for a new method of evaluation of the newborn infant, Curr. Res. Anesth. Analg., 32, 260-267, (1953)
[4] Archer, K. J., rpartOrdinal: an R package for deriving a classification tree for predicting an ordinal response, J. Stat. Softw., 34, 1-17, (2010)
[5] Australian Bureau of Statistics, National health survey: summary of results, 2007-2008, (2009), Australian Bureau of Statistics Canberra
[6] Ballweg, J. A.; Wernovsky, G.; Gaynor, J. W., Neurodevelopmental outcomes following congenital heart surgery, Pediatr. Cardiol., 28, 126-133, (2007)
[7] Bayley, N., Manual for the Bayley scales of infant development, (1993), The Psychological Corporation San Antonio, TX
[8] Bellinger, D. C.; Wypij, D.; duPlessis, A. J.; Rappaport, L. A.; Jonas, R. A.; Wernovsky, G.; Newburger, J. W., Neurodevelopmental status at eight years in children with dextro-transposition of the great arteries: the Boston circulatory arrest trial, J. Thorac. Cardiovasc. Surg., 126, 1385-1396, (2003)
[9] Breiman, L., Bagging predictors, Mach. Learn., 24, 123-140, (1996) · Zbl 0858.68080
[10] Breiman, L., Random forests, Mach. Learn., 45, 5-32, (2001) · Zbl 1007.68152
[11] Breiman, L., 2002. Setting up, using, and understanding Random Forests V4.0.
[12] Breiman, L.; Friedman, J.; Olshen, R.; Stone, C., Classification and regression trees, (1984), Wadsworth, Inc. California · Zbl 0541.62042
[13] Bühlmann, P.; Yu, B., Analyzing bagging, Ann. Statist., 30, 927-961, (2002) · Zbl 1029.62037
[14] Dahinden, C.; Kalisch, M.; Bühlmann, P., Decomposition and model selection for large contingency tables, Biometrical J., 7, 247-248, (2010) · Zbl 1207.62126
[15] Efron, B., Bootstrap methods: another look at the jackknife, Ann. Statist., 7, 1-26, (1979) · Zbl 0406.62024
[16] Friedman, J.; Hastie, T.; Tibshirani, R., Sparse inverse covariance estimation with the graphical lasso, Biostatistics, 9, 432-441, (2008) · Zbl 1143.62076
[17] Friedman, J.; Hastie, T.; Tibshirani, R., Regularization paths for generalized linear models via coordinate descent, J. Stat. Softw., 33, 1-22, (2010)
[18] Givens, G. H.; Hoeting, J. A., Computational statistics, (2005), John Wiley & Sons, Inc. New Jersey · Zbl 1079.62001
[19] Graf, E., Rapport de méthodes. enquête suisse sur la santé 2007. plan d’échantillonnage, pondérations et analyses pondérées des données, (2010), Office Fédéral de la Statistique Neuchâtel
[20] Hapfelmeier, A.; Hothorn, T.; Ulm, K., Recursive partitioning on incomplete data using surrogate decisions and multiple imputation, Comput. Statist. Data Anal., 56, 1552-1565, (2012) · Zbl 1243.62092
[21] Hapfelmeier, A.; Ulm, K., A new variable selection approach using random forests, Comput. Statist. Data Anal., 60, 50-69, (2013) · Zbl 1365.62417
[22] Höfling, H.; Tibshirani, R., Estimation of sparse binary pairwise Markov networks using pseudo-likelihoods, J. Mach. Learn. Res., 10, 883-906, (2009) · Zbl 1245.62121
[23] Hothorn, T.; Hornik, K.; Zeileis, A., Unbiased recursive partitioning: a conditional inference framework, J. Comput. Graph. Statist., 15, 651-674, (2006)
[24] Hövels-Gürich, H. H.; Bauer, S. B.; Schnitker, R.; Willmes-von Hinckeldey, K.; Messmer, B. J.; Seghaye, M. C.; Huber, W., Long-term outcome of speech and language in children after corrective surgery for cyanotic or acyanotic cardiac defects in infancy, Eur. J. Paediatr. Neuro., 12, 378-386, (2008)
[25] Hövels-Gürich, H. H.; Konrad, K.; Skorzenski, D.; Nacken, C.; Minkenberg, R.; Messmer, B. J.; Seghaye, M. C., Long-term neurodevelopmental outcome and exercise capacity after corrective surgery for tetralogy of Fallot or ventricular septal defect in infancy, Ann. Thorac. Surg., 81, 958-966, (2006)
[26] Kalisch, M.; Bühlmann, P., Estimating high-dimensional directed acyclic graphs with the PC-algorithm, J. Mach. Learn. Res., 8, 613-636, (2007) · Zbl 1222.68229
[27] Kalisch, M.; Fellinghauer, B.; Grill, E.; Maathuis, M. H.; Mansmann, U.; Bühlmann, P.; Stucki, G., Understanding human functioning using graphical models, BMC Med. Res. Methodol., 10, 14, (2010)
[28] Kolenikov, S., Confirmatory factor analysis using confa, Stata J., 9, 329-373, (2009)
[29] Lauritzen, S. L., Graphical models, (1996), Oxford University Press Oxford · Zbl 0907.62001
[30] Lauritzen, S. L.; Spiegelhalter, D. J., Local computations with probabilities on graphical structures and their application to expert systems (with discussion), J. Roy. Stat. Soc. B Met., 50, 157-224, (1988) · Zbl 0684.68106
[31] Lauritzen, S. L.; Wermuth, N., Graphical models for associations between variables, some of which are qualitative and some quantitative, Ann. Statist., 17, 31-57, (1989) · Zbl 0669.62045
[32] Liaw, A.; Wiener, M., Classification and regression by random forest, R News, 2, 18-22, (2002)
[33] Lokhorst, J., 1999. The Lasso and Generalised Linear Models. Honors Project. The University of Adelaide, Australia.
[34] MacCallum, R. C.; Zhang, S.; Preacher, K. J.; Rucker, D. D., On the practice of dichotomization of quantitative variables, Psychol. Methods, 7, 19-40, (2002)
[35] Meier, L.; van de Geer, S.; Bühlmann, P., The group lasso for logistic regression, J. Roy. Stat. Soc. B Met., 70, 53-71, (2008) · Zbl 1400.62276
[36] Meinshausen, N.; Bühlmann, P., High-dimensional graphs and variable selection with the lasso, Ann. Statist., 34, 1436-1462, (2006) · Zbl 1113.62082
[37] Meinshausen, N.; Bühlmann, P., Stability selection (with discussion), J. Roy. Stat. Soc. B Met., 72, 417-473, (2010)
[38] Nicodemus, K. N.; Malley, J. D.; Strobl, C.; Ziegler, A., The behaviour of random forest permutation-based variable importance measures under predictor correlation, BMC Bioinformatics, 11, (2010)
[39] Politis, D. N.; Romano, J. P.; Wolf, M., Subsampling, (1999), Springer Berlin · Zbl 0931.62035
[40] R Development Core Team, R: A language and environment for statistical computing, (2011), R Foundation for Statistical Computing Vienna, Austria
[41] Ravikumar, P.; Wainwright, M. J.; Lafferty, J. D., High-dimensional Ising model selection using \(\ell_1\)-regularized logistic regression, Ann. Statist., 38, 1287-1319, (2010) · Zbl 1189.62115
[42] Reinhardt, J. D.; Fellinghauer, B.; Strobl, R.; Stucki, G., Dimension reduction in human functioning and disability outcomes research: graphical models versus principal components analysis, Disabil. Rehabil., 32, 1000-1010, (2010)
[43] Reinhardt, J. D.; Mansmann, U.; Fellinghauer, B.; Strobl, R.; Grill, E.; von Elm, E.; Stucki, G., Functioning and disability in people living with spinal cord injury in high- and low-resourced countries: a comparative analysis of 14 countries, Int. J. Public Health, 56, 341-352, (2011)
[44] Royston, P.; Altman, D. G.; Sauerbrei, W., Dichotomizing continuous predictors in multiple regression: a bad idea, Stat. Med., 25, 127-141, (2006)
[45] Snookes, S. H.; Gunn, J. K.; Eldridge, B. J.; Donath, S. M.; Hunt, R. W.; Galea, M. P.; Shekerdemian, L., A systematic review of motor and cognitive outcomes after early surgery for congenital heart disease, Pediatrics, 125, 818-827, (2010)
[46] Stekhoven, D. J.; Bühlmann, P., MissForest: nonparametric missing value imputation for mixed-type data, Bioinformatics, 28, 112-118, (2012)
[47] Storni, M., Enquêtes, sources: enquête suisse sur la santé, (2011), Office Fédéral de la Statistique Neuchâtel
[48] Strobl, C.; Boulesteix, A. L.; Kneib, T.; Augustin, T.; Zeileis, A., Conditional variable importance for random forests, BMC Bioinformatics, 9, (2008)
[49] Strobl, C.; Boulesteix, A. L.; Zeileis, A.; Hothorn, T., Bias in random forest variable importance measures: illustrations, sources and a solution, BMC Bioinformatics, 8, (2007)
[50] Strobl, R.; Stucki, G.; Grill, E.; Müller, M.; Mansmann, U., Graphical models illustrated complex associations between variables describing human functioning, J. Clin. Epidemiol., 62, 922-933, (2009)
[51] Stucki, G.; Kostanjsek, N.; Üstün, B.; Cieza, A., ICF-based classification and measurement of functioning, Eur. J. Phys. Rehab. Med., 44, 315-328, (2008)
[52] Tibshirani, R., Regression shrinkage and selection via the lasso, J. Roy. Stat. Soc. B Met., 58, 267-288, (1996) · Zbl 0850.62538
[53] TNO, TNO-AZL pre-school children quality of life users manual, (2004), TNO-PG Leiden, Netherlands
[54] von Rhein, M.; Dimitropoulos, A.; Valsangiacomo Buechel, E. R.; Landolt, M. A.; Latal, B., Risk factors for neurodevelopmental impairments in school-age children after cardiac surgery with full-flow cardiopulmonary bypass, J. Thorac. Cardiovasc. Surg., 144, 577-583, (2012)
[55] Watson, N., Well, I know this is going to sound very strange to you, but I don’t see myself as a disabled person: identity and disability, Disabil. Soc., 17, 509-527, (2002)
[56] Whittaker, J., Graphical models in applied multivariate statistics, (1990), John Wiley & Sons, Inc. New Jersey · Zbl 0732.62056
[57] WHO, International classification of functioning, disability and health (ICF), (2001), WHO Press Geneva
[58] WHO and The World Bank, World report on disability, (2011), WHO Press Geneva
[59] Yu, H., 2010. Rmpi: Interface (Wrapper) to MPI (Message-Passing Interface).
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.