Marginal analysis of longitudinal count data in long sequences: methods and applications to a driving study. (English) Zbl 1235.62037

Summary: Most of the available methods for longitudinal data analysis are designed and validated for the situation where the number of subjects is large and the number of observations per subject is relatively small. Motivated by the Naturalistic Teenage Driving Study (NTDS), which represents the exact opposite situation, we examine standard methodology and propose a new one for marginal analysis of longitudinal count data in a small number of very long sequences. We consider standard methods based on generalized estimating equations, under working independence or an appropriate correlation structure, and find them unsatisfactory for dealing with time-dependent covariates when the counts are low. For this situation, we explore a within-cluster resampling (WCR) approach that involves repeated analyses of random subsamples, followed by a final analysis that synthesizes results across subsamples. This leads to a novel WCR method that operates on separated blocks within subjects and performs better than all of the previously considered methods. The methods are applied to the NTDS data and evaluated in simulation experiments mimicking the NTDS.
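
The resample-then-synthesize idea in the summary can be illustrated with a minimal sketch of generic within-cluster resampling, in which each resample keeps one randomly chosen observation per cluster and the per-resample results are then combined. This is not the block-based variant the paper proposes. The function name `wcr_mean`, the intercept-only target (a marginal mean), the default number of resamples, and the `ddof=1` divisor in the between-resample term are all assumptions made for illustration.

```python
import numpy as np

def wcr_mean(y, cluster, n_resamples=500, seed=0):
    """Hypothetical helper: WCR estimate of a marginal mean and its variance.

    Assumes at least two clusters; `cluster` holds one label per observation.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)
    # Positions of the observations belonging to each cluster
    groups = [np.flatnonzero(cluster == c) for c in np.unique(cluster)]
    n = len(groups)
    est = np.empty(n_resamples)
    var = np.empty(n_resamples)
    for q in range(n_resamples):
        # One random observation per cluster gives an independent subsample
        idx = np.array([rng.choice(g) for g in groups])
        sub = y[idx]
        est[q] = sub.mean()            # per-resample estimate
        var[q] = sub.var(ddof=1) / n   # its estimated variance
    # Synthesis step: average the estimates; the variance is the mean
    # within-resample variance minus the between-resample variance.
    wcr_est = est.mean()
    wcr_var = var.mean() - est.var(ddof=1)
    return wcr_est, wcr_var
```

In finite samples the variance estimate from this subtraction can come out negative, so a practical analysis would need to check for that case. Applying the same function with separated within-subject blocks as the `cluster` labels would be one loose way to approximate a block-based scheme, but that is an assumption rather than the procedure described in the paper.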


62G05 Nonparametric estimation
65C60 Computational problems in statistics (MSC2010)
62P99 Applications of statistics
62H20 Measures of association (correlation, canonical correlation, etc.)

