Additive mixed models with approximate Dirichlet process mixtures: the EM approach. (English) Zbl 1342.62106

Summary: We consider additive mixed models for longitudinal data with a nonlinear time trend. For the random effects distribution, an approximate Dirichlet process mixture is proposed. It is based on the truncated version of the stick-breaking representation of the Dirichlet process and yields a Gaussian mixture with a data-driven choice of the number of mixture components. The main advantage of this specification is its ability to identify clusters of subjects with a similar random effects structure. The trend curve is estimated using the mixed model representation of penalized splines. An Expectation-Maximization algorithm is given that solves the estimation problem and exhibits advantages over the Markov chain Monte Carlo approaches typically used when modeling with Dirichlet processes. The method is evaluated in a simulation study and applied to theophylline data and to body mass index profiles of children.
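
The truncated stick-breaking construction named in the summary can be illustrated with a minimal Python sketch. It only draws mixture weights from the prior and does not reproduce the paper's EM estimation. The function name `truncated_stick_breaking`, the truncation level of 10, and the concentration parameter `alpha = 1.0` are illustrative assumptions, not values taken from the paper.

```python
import random


def truncated_stick_breaking(alpha, n_components, rng):
    """Draw mixture weights from a stick-breaking prior truncated at n_components.

    Hypothetical helper for illustration: V_k ~ Beta(1, alpha) for k < N,
    V_N is fixed at 1, and pi_k = V_k * prod_{l<k} (1 - V_l).
    """
    if alpha <= 0 or n_components < 1:
        raise ValueError("alpha must be positive and n_components at least 1")
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for _ in range(n_components - 1):
        v = rng.betavariate(1.0, alpha)  # fraction broken off the remaining stick
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)  # truncation: the last component takes what is left
    return weights


# Assumed example values; a seeded generator keeps the draw reproducible.
rng = random.Random(0)
weights = truncated_stick_breaking(alpha=1.0, n_components=10, rng=rng)
print([round(w, 4) for w in weights])
```

Under this construction, small values of `alpha` tend to concentrate the mass on a few components while the remaining weights are close to zero, which is one way to read the "data-driven choice of the number of mixture components" in the summary.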

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F15 Bayesian inference
62G05 Nonparametric estimation