GEE for longitudinal ordinal data: comparing R-geepack, R-multgee, R-repolr, SAS-GENMOD, SPSS-GENLIN. (English) Zbl 06984012

Summary: Studies in epidemiology and the social sciences are often longitudinal, and outcome measures are frequently obtained from questionnaires on ordinal scales. To understand the relationship between explanatory variables and outcome measures, generalized estimating equations (GEE) can be applied to provide a population-averaged interpretation and to address the correlation between outcome measures. GEE can be fitted with different software packages, but a motivating example showed differences in their output. This paper investigated the performance of GEE in R (version 3.0.2), SAS (version 9.4), and SPSS (version 22.0.0) using simulated data under default settings. Multivariate logistic distributions were used in the simulation to generate correlated ordinal data. The simulation study demonstrated substantial bias in the parameter estimates and numerical issues for data sets with a relatively small number of subjects. The unstructured working association matrix requires larger numbers of subjects than the independence and exchangeable working association matrices to reduce the bias and diminish numerical issues. The coverage probabilities of the confidence intervals for fixed parameters were satisfactory for the independence and exchangeable working association matrices, but they were frequently liberal for the unstructured option. Based on the performance and the available options, SPSS and the R packages multgee and repolr all perform quite well for relatively large sample sizes (e.g. 300 subjects), but multgee seems to do a little better than SPSS and repolr in most settings.
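The kind of simulated data described in the summary can be illustrated with a minimal sketch. It is not the authors' simulation design: instead of a multivariate logistic distribution, it assumes a Gaussian copula with exchangeable correlation `rho` and logistic margins, so that each repeated response follows a proportional odds model marginally. The function names, the binary covariate `x`, and all parameter values are hypothetical.

```python
import math
import random


def expit(v):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-v))


def simulate_subject(x, beta, cutpoints, rho, n_times, rng):
    """Return n_times correlated ordinal responses for one subject.

    Assumption: latent errors come from a Gaussian copula with
    exchangeable correlation rho and standard logistic margins.
    """
    shared = rng.gauss(0.0, 1.0)  # component shared by all time points
    responses = []
    for _ in range(n_times):
        z = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        u = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
        u = min(max(u, 1e-12), 1.0 - 1e-12)             # guard against log(0)
        eps = math.log(u / (1.0 - u))                   # logistic quantile
        latent = x * beta + eps
        # Category j means cutpoints[j-2] < latent <= cutpoints[j-1], so that
        # P(Y <= j | x) = expit(cutpoints[j-1] - x * beta).
        responses.append(1 + sum(latent > c for c in cutpoints))
    return responses


def simulate_data(n_subjects, beta, cutpoints, rho, n_times, seed=1):
    """Simulate (x, responses) pairs; x alternates between 0 and 1."""
    rng = random.Random(seed)
    return [
        (i % 2, simulate_subject(i % 2, beta, cutpoints, rho, n_times, rng))
        for i in range(n_subjects)
    ]


if __name__ == "__main__":
    cutpoints = [-1.0, 0.0, 1.5]
    data = simulate_data(300, beta=0.8, cutpoints=cutpoints, rho=0.5, n_times=3)
    for x in (0, 1):
        first = [ys[0] for xx, ys in data if xx == x]
        for j, c in enumerate(cutpoints, start=1):
            empirical = sum(y <= j for y in first) / len(first)
            print(x, j, round(empirical, 3), round(expit(c - 0.8 * x), 3))
```

Data generated this way could then be passed to any of the compared GEE implementations. Repeating the simulation over many seeds and sample sizes is one way to examine bias and coverage in the manner the summary describes.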


62-XX Statistics


[1] Agresti, A., Analysis of ordinal categorical data, (2010), John Wiley & Sons · Zbl 1263.62007
[2] Agresti, A.; Natarajan, R., Modeling clustered ordered categorical data: a survey, Internat. Statist. Rev., 69, 345-371, (2001) · Zbl 1213.62106
[3] Armstrong, M.; Galli, A., Sequential Nongaussian simulations using the FGM copula, technical report, (2002)
[4] Carey, V.; Zeger, S.; Diggle, P., Modelling multivariate binary data with alternating logistic regressions, Biometrika, 80, 3, 517-526, (1993) · Zbl 0800.62446
[5] Chaganty, N.; Joe, H., Efficiency of generalized estimating equations for binary responses, J. R. Stat. Soc. Ser. B Stat. Methodol., 66, 4, 851-860, (2004) · Zbl 1059.62076
[6] Chaganty, N.; Joe, H., Range of correlation matrices for dependent Bernoulli random variables, Biometrika, 93, 1, 197-206, (2006) · Zbl 1152.62038
[7] Clayton, D., 1992, Repeated ordinal measurements: a generalised estimating equation approach.
[8] Deming, W.; Stephan, F., On a least squares adjustment of a sampled frequency table when the expected marginal totals are known, Ann. Math. Statist., 11, 427-444, (1940) · JFM 66.0652.02
[9] Goodman, L., The analysis of cross-classified data having ordered and/or unordered categories: association models, correlation models, and asymmetry models for contingency tables with or without missing entries, Ann. Statist., 13, 10-69, (1985) · Zbl 0613.62070
[10] Hardin, J.; Hilbe, J., Generalized estimating equations, (2012), Chapman and Hall/CRC Press
[11] Heagerty, P.; Zeger, S., Marginal regression models for clustered ordinal measurements, J. Amer. Statist. Assoc., 91, 435, 1024-1036, (1996) · Zbl 0882.62061
[12] Højsgaard, S.; Halekoh, U.; Yan, J., The R package geepack for generalized estimating equations, J. Statist. Softw., 15, 2, 1-11, (2005)
[13] Horton, N.; Lipsitz, S., Review of software to fit generalized estimating equation regression models, Amer. Statist., 53, 2, 160, (1999)
[14] Jennrich, R.; Sampson, P., Newton-Raphson and related algorithms for maximum likelihood variance component estimation, Technometrics, 18, 1, 11-17, (1976) · Zbl 0322.62034
[15] Kenward, M.; Lesaffre, E.; Molenberghs, G., An application of maximum likelihood and generalized estimating equations to the analysis of ordinal data from a longitudinal study with cases missing at random, Biometrics, 50, 4, 945-953, (1994) · Zbl 0825.62797
[16] Kotz, S.; Balakrishnan, N.; Johnson, N., Continuous multivariate distributions, models and applications, (2000), John Wiley & Sons · Zbl 0946.62001
[17] Li, Y.; Schafer, D., Likelihood analysis of the multivariate ordinal probit regression model for repeated ordinal responses, Comput. Statist. Data Anal., 52, 7, 3474-3492, (2008) · Zbl 1452.62089
[18] Liang, K.; Zeger, S., Longitudinal data analysis using generalized linear models, Biometrika, 73, 1, 13-22, (1986) · Zbl 0595.62110
[19] Liang, K.; Zeger, S.; Qaqish, B., Multivariate regression analyses for categorical data, J. R. Stat. Soc. Ser. B Stat. Methodol., 54, 1, 3-40, (1992) · Zbl 0775.62172
[20] Lipsitz, S.; Fitzmaurice, G., Estimating equations for measures of association between repeated binary responses, Biometrics, 52, 3, 903-912, (1996) · Zbl 0875.62244
[21] Lipsitz, S.; Kim, K.; Zhao, L., Analysis of repeated categorical data using generalized estimating equations, Stat. Med., 13, 11, 1149-1163, (1994)
[22] Lipsitz, S.; Laird, N.; Harrington, D., Generalized estimating equations for correlated binary data: using the odds ratio as a measure of association, Biometrika, 78, 1, 153-160, (1991)
[23] Liu, I.; Agresti, A., The analysis of ordered categorical data: an overview and a survey of recent developments, Test, 14, 1, 1-73, (2005) · Zbl 1069.62057
[24] Lumley, T., Generalized estimating equations for ordinal data: a note on working correlation structures, Biometrics, 52, 1, 354-361, (1996) · Zbl 0881.62117
[25] Mancl, L.; Leroux, B., Efficiency of regression estimates for clustered data, Biometrics, 52, 2, 500-511, (1996) · Zbl 0925.62303
[26] McCullagh, P., Regression models for ordinal data, J. R. Stat. Soc. Ser. B Stat. Methodol., 42, 2, 109-142, (1980)
[27] Miller, M.; Davis, C.; Landis, J., The analysis of longitudinal polytomous data: generalized estimating equations and connections with weighted least squares, Biometrics, 49, 4, 1033-1044, (1993) · Zbl 0820.62093
[28] Molenberghs, G.; Kenward, M., Semi-parametric marginal models for hierarchical data and their corresponding full models, Comput. Statist. Data Anal., 54, 585-597, (2010) · Zbl 1464.62133
[29] Molenberghs, G.; Verbeke, G., Models for discrete longitudinal data, (2005), Springer-Verlag · Zbl 1093.62002
[30] Nelsen, R., An introduction to copulas, (2006), Springer-Verlag
[31] Nores, M.; del Pilar Díaz, M., Some properties of regression estimators in GEE models for clustered ordinal data, Comput. Statist. Data Anal., 52, 7, 3877-3888, (2008) · Zbl 1452.62545
[32] Oster, R., An examination of statistical software packages for categorical data analysis using exact methods, Amer. Statist., 56, 3, 235-246, (2002)
[33] Oster, R., An examination of statistical software packages for categorical data analysis using exact methods, part II, Amer. Statist., 57, 3, 201-213, (2003)
[34] Oster, R.; Hilbe, J., An examination of statistical software packages for parametric and nonparametric data analyses using exact methods, Amer. Statist., 62, 1, 74-84, (2008)
[35] Pan, W., Akaike’s information criterion in generalized estimating equations, Biometrics, 57, 1, 120-125, (2001) · Zbl 1210.62099
[36] Parsons, N., 2012, repolr: repeated measures proportional odds logistic regression. http://CRAN.R-project.org/package=repolr, R package version 1.0.
[37] Parsons, N.; Costa, M.; Achten, J.; Stallard, N., Repeated measures proportional odds logistic regression analysis of ordinal score data in the statistical software package R, Comput. Statist. Data Anal., 53, 3, 632-641, (2009) · Zbl 1452.62840
[38] Parsons, N.; Edmondson, R.; Gilmour, S., A generalized estimating equation method for fitting autocorrelated ordinal score data with an application in horticultural research, J. R. Stat. Soc. Ser. C. Appl. Stat., 55, 5, 507-524, (2006) · Zbl 05188751
[39] Prentice, R., Correlated binary regression with covariates specific to each binary observation, Biometrics, 44, 4, 1033-1048, (1988) · Zbl 0715.62145
[40] Qu, Y.; Williams, G.; Beck, G.; Medendorp, S., Latent variable models for clustered dichotomous data with multiple subclusters, Biometrics, 48, 1095-1102, (1992)
[41] Stiger, T.; Barnhart, H. X.; Williamson, J., Testing proportionality in the proportional odds model fitted with GEE, Stat. Med., 18, 11, 1419-1433, (1999)
[42] Stokes, M.; Davis, C.; Koch, G., Categorical data analysis using the SAS system, (2000), SAS Institute
[43] Touloumis, A., 2013, multgee: GEE solver for correlated nominal or ordinal multinomial responses, http://CRAN.R-project.org/package=multgee.
[44] Touloumis, A.; Agresti, A.; Kateri, M., GEE for multinomial responses using a local odds ratios parameterization, Biometrics, 69, 633-640, (2013) · Zbl 1429.62327
[45] Wang, Z.; Louis, T., Matching conditional and marginal shapes in binary random intercept models using a bridge distribution function, Biometrika, 90, 4, 765-775, (2003) · Zbl 1436.62294
[46] Williamson, J.; Kim, K.; Lipsitz, S., Analyzing bivariate ordinal data using a global odds ratio, J. Amer. Statist. Assoc., 90, 432, 1432-1437, (1995) · Zbl 0868.62086
[47] Williamson, J.; Lipsitz, S.; Kim, K., GEECAT and GEEGOR: computer programs for the analysis of correlated categorical response data, Comput. Methods Programs Biomed., 58, 25-34, (1999)
[48] Yan, J., Enjoy the joy of copulas: with a package copula, J. Statist. Softw., 21, 4, 1-21, (2007)
[49] Yan, J., Højsgaard, S., Halekoh, U., 2012a, geepack: Generalized estimating equation package, http://CRAN.R-project.org/package=geepack, R package version 1.1-6.
[50] Yan, J., Kojadinovic, I., Hofert, M., Maechler, M., 2012b, copula: Multivariate dependence with copulas, http://CRAN.R-project.org/package=copula, R package version 0.99-1.
[51] Yu, K.; Yuan, W., Regression models for unbalanced longitudinal ordinal data: computer software and a simulation, Comput. Methods Programs Biomed., 75, 195-200, (2004)
[52] Zeger, S.; Liang, K., Longitudinal data analysis for discrete and continuous outcomes, Biometrics, 42, 1, 121-130, (1986)
[53] Ziegler, A., Generalized estimating equations, (2011), Springer-Verlag · Zbl 1291.62018
[54] Ziegler, A.; Grömping, U., The generalised estimating equations: a comparison of procedures available in commercial statistical software packages, Biom. J., 40, 3, 245-260, (1998) · Zbl 0967.62048
[55] Ziegler, A.; Kastner, C.; Blettner, M., The generalised estimating equations: an annotated bibliography, Biom. J., 40, 2, 115-139, (1998) · Zbl 0902.62083
[56] Zorn, C., Generalized estimating equation models for correlated data: a review with applications, Amer. J. Political Sci., 45, 2, 470-490, (2001)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.