
Regression tree-based diagnostics for linear multilevel models. (English) Zbl 07257468

Summary: Longitudinal and clustered data, in which multiple observations are recorded for each individual, require special models that reflect their hierarchical structure. The most commonly used such model is the linear multilevel model, which combines a linear model for the population-level fixed effects, normally distributed individual-level random effects that enter linearly, and normally distributed observation-level errors with constant variance. It has the advantage of simplicity of interpretation, but if the assumptions of the model do not hold, the inferences drawn can be misleading. In this paper, we discuss the use of regression trees designed for multilevel data to construct goodness-of-fit tests for this model that can be used to test for nonlinearity of the fixed effects or heteroscedasticity of the errors. Simulations show that the resulting tests are slightly conservative as 0.05-level tests and have good power to identify explainable model violations (that is, violations related to the available covariate information in the data). Application of the tests is illustrated on two real datasets.
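
To make the general idea behind such diagnostics concrete, the following is a minimal R sketch, not the authors' exact test procedure (whose test statistics and null calibration are developed in the paper): fit a linear multilevel model, then grow regression trees on residual summaries and look for covariate-related structure. The data frame d and its columns y, x1, x2 and id are simulated placeholders, and the nlme and rpart packages are assumed to be available.

```r
## Minimal illustrative sketch, not the authors' exact procedure: fit a linear
## multilevel model, then grow regression trees on residual summaries to look
## for covariate-related ("explainable") violations. All data are simulated and
## the column names (y, x1, x2, id) are placeholders.

library(nlme)   # lme(): linear multilevel (mixed-effects) models
library(rpart)  # rpart(): CART-style regression trees

set.seed(1)
n_groups <- 50
n_per    <- 10
d <- data.frame(
  id = factor(rep(seq_len(n_groups), each = n_per)),
  x1 = runif(n_groups * n_per),
  x2 = runif(n_groups * n_per)
)
b   <- rnorm(n_groups, sd = 0.5)                 # random intercepts
d$y <- 1 + 2 * d$x1 + d$x2^2 + b[d$id] +         # x2 actually enters nonlinearly
       rnorm(nrow(d), sd = 0.3 + 0.7 * d$x1)     # error variance depends on x1

## Fit the working linear multilevel model with a random intercept per group.
fit <- lme(y ~ x1 + x2, random = ~ 1 | id, data = d)

## Check 1 -- nonlinearity of the fixed effects: a tree grown on the
## conditional (within-group) residuals will find splits if covariate-related
## structure remains in the mean.
d$res <- residuals(fit, level = 1)
tree_mean <- rpart(res ~ x1 + x2, data = d)
print(tree_mean)

## Check 2 -- heteroscedasticity: a tree grown on a variance proxy (log squared
## residuals) will find splits if the error variance depends on covariates.
d$logres2 <- log(d$res^2)
tree_var <- rpart(logres2 ~ x1 + x2, data = d)
print(tree_var)
```

Splits in the first tree suggest unmodelled nonlinearity in the fixed effects, while splits in the second suggest covariate-dependent error variance. Turning such tree fits into calibrated 0.05-level tests requires the reference distributions developed in the paper, which this sketch omits.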

MSC:

62-XX Statistics
