Analyzing incomplete discrete longitudinal clinical trial data. (English) Zbl 1426.62316

Summary: Commonly used methods to analyze incomplete longitudinal clinical trial data include complete case analysis (CC) and last observation carried forward (LOCF). However, such methods rest on strong assumptions, including missing completely at random (MCAR) for CC and unchanging profile after dropout for LOCF. Such assumptions are too strong to generally hold. Over the last decades, a number of full longitudinal data analysis methods have become available, such as the linear mixed model for Gaussian outcomes, that are valid under the much weaker missing at random (MAR) assumption. Such a method is useful, even if the scientific question is in terms of a single time point, for example, the last planned measurement occasion, and it is generally consistent with the intention-to-treat principle. The validity of such a method rests on the use of maximum likelihood, under which the missing data mechanism is ignorable as soon as it is MAR. In this paper, we will focus on non-Gaussian outcomes, such as binary, categorical or count data. This setting is less straightforward since there is no unambiguous counterpart to the linear mixed model. We first provide an overview of the various modeling frameworks for non-Gaussian longitudinal data, and subsequently focus on generalized linear mixed-effects models, on the one hand, of which the parameters can be estimated using full likelihood, and on generalized estimating equations, on the other hand, which is a nonlikelihood method and hence requires a modification to be valid under MAR. We briefly comment on the position of models that assume missingness not at random and argue they are most useful to perform sensitivity analysis. Our developments are underscored using data from two studies. While the case studies feature binary outcomes, the methodology applies equally well to other discrete-data settings, hence the qualifier “discrete” in the title.
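The contrast drawn in the summary between CC, LOCF and an MAR-valid analysis can be made concrete with a toy simulation. The sketch below is not taken from the paper: the two-visit design, the probabilities, and the function names (`simulate`, `complete_case`, `locf`, `ipw`) are all assumptions chosen for illustration. Dropout at the second visit depends only on the observed first-visit outcome, so the mechanism is MAR but not MCAR. Inverse probability weighting is used here as one well-known modification of the kind that makes a nonlikelihood estimator valid under MAR; the summary does not say which modification the authors adopt.

```python
import random


def simulate(n=100_000, seed=1):
    """Simulate a binary outcome at two visits with MAR dropout at visit 2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y1 = 1 if rng.random() < 0.5 else 0
        # Visit-2 outcome depends on visit 1; the true P(y2 = 1) is 0.5.
        y2 = 1 if rng.random() < 0.2 + 0.6 * y1 else 0
        # Dropout depends only on the observed y1 (MAR, not MCAR).
        p_obs = 0.4 if y1 == 1 else 0.9
        observed = rng.random() < p_obs
        rows.append((y1, y2, observed))
    return rows


def complete_case(rows):
    """Mean of y2 among subjects who were observed at visit 2."""
    seen = [y2 for _, y2, observed in rows if observed]
    return sum(seen) / len(seen)


def locf(rows):
    """Mean of y2 after replacing each missing y2 by the subject's y1."""
    return sum(y2 if observed else y1 for y1, y2, observed in rows) / len(rows)


def ipw(rows):
    """Weighted mean of observed y2, weights = 1 / estimated P(observed | y1)."""
    p_hat = {}
    for level in (0, 1):
        flags = [observed for y1, _, observed in rows if y1 == level]
        p_hat[level] = sum(flags) / len(flags)
    num = sum(y2 / p_hat[y1] for y1, y2, observed in rows if observed)
    den = sum(1 / p_hat[y1] for y1, _, observed in rows if observed)
    return num / den


if __name__ == "__main__":
    data = simulate()
    print("complete case:", round(complete_case(data), 3))
    print("LOCF:         ", round(locf(data), 3))
    print("IPW:          ", round(ipw(data), 3))
```

Under these assumed probabilities the complete-case estimator targets 0.25/0.65 ≈ 0.385 and LOCF targets 0.55, whereas the weighted estimator targets the true value 0.5. The simulated estimates should land near those values, up to Monte Carlo error.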


62P10 Applications of statistics to biology and medical sciences; meta analysis
62J12 Generalized linear models (logistic models)



