
Recursive partitioning for missing data imputation in the presence of interaction effects. (English) Zbl 06983893
Summary: Standard approaches to implementing multiple imputation do not automatically incorporate nonlinear relations such as interaction effects. This leads to biased parameter estimates when interactions are present in a dataset. With the aim of providing an imputation method that preserves interactions in the data automatically, the use of recursive partitioning as an imputation method is examined. Three recursive partitioning techniques are implemented in the multiple imputation by chained equations framework. It is investigated, using simulated data, whether recursive partitioning creates appropriate variability between imputations and unbiased parameter estimates with appropriate confidence intervals. It is concluded that, when interaction effects are present in a dataset, substantial gains are possible by using recursive partitioning for imputation compared to standard applications. In addition, it is shown that the potential of recursive partitioning imputation approaches depends on the relevance of a possible interaction effect, the correlation structure of the data, and the type of possible interaction effect present in the data.
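
The idea of imputing from a tree can be illustrated with a minimal sketch. This is not the authors' implementation: it is a pure-Python toy that assumes a single incomplete numeric variable, fully observed predictors, and a squared-error splitting rule. All function names (`sse`, `best_split`, `grow`, `donors_for`, `impute_with_tree`) are hypothetical. A regression tree is grown on the observed cases, and each missing value is replaced by a randomly drawn observed value from the leaf in which the incomplete case lands. A full chained-equations procedure would cycle such a step over every incomplete variable and repeat it to produce several imputed datasets.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, y, min_leaf):
    """Return the (feature, threshold) pair that most reduces the squared
    error, or None if no admissible split improves on the unsplit node."""
    best, best_sse = None, sse(y)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [v for r, v in zip(rows, y) if r[j] <= t]
            right = [v for r, v in zip(rows, y) if r[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            total = sse(left) + sse(right)
            if total < best_sse:
                best, best_sse = (j, t), total
    return best


def grow(rows, y, min_leaf=5):
    """Recursively partition the observed cases; leaves keep their y values."""
    split = best_split(rows, y, min_leaf)
    if split is None:
        return {"donors": list(y)}
    j, t = split
    left = [(r, v) for r, v in zip(rows, y) if r[j] <= t]
    right = [(r, v) for r, v in zip(rows, y) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow([r for r, _ in left], [v for _, v in left], min_leaf),
        "right": grow([r for r, _ in right], [v for _, v in right], min_leaf),
    }


def donors_for(tree, row):
    """Walk down the tree and return the observed values in the row's leaf."""
    while "donors" not in tree:
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["donors"]


def impute_with_tree(X, y, rng, min_leaf=5):
    """Fill each None in y with a random donor from the matching leaf."""
    observed = [i for i, v in enumerate(y) if v is not None]
    tree = grow([X[i] for i in observed], [y[i] for i in observed], min_leaf)
    return [
        v if v is not None else rng.choice(donors_for(tree, X[i]))
        for i, v in enumerate(y)
    ]


# Toy data with a product interaction: y is 10 only when x1 = x2 = 1.
X = [(a, b) for a in (0, 1) for b in (0, 1) for _ in range(10)]
y = [10.0 if a == 1 and b == 1 else 0.0 for a, b in X]
# Delete the outcome for every fifth case (an arbitrary missingness pattern).
y_missing = [None if i % 5 == 0 else v for i, v in enumerate(y)]
completed = impute_with_tree(X, y_missing, random.Random(1))
```

In this noise-free toy case all donors within a leaf share one value, so `completed` reproduces the deleted outcomes, including those in the interaction cell, without the interaction term ever being specified. With noisy data the random donor draw would make repeated imputations differ from one another.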

62 Statistics