zbMATH — the first resource for mathematics

Gaussian copula marginal regression. (English) Zbl 1336.62152
Summary: This paper identifies and develops the class of Gaussian copula models for marginal regression analysis of non-normal dependent observations. The class provides a natural extension of traditional linear regression models with normal correlated errors. Continuous, discrete and categorical responses are all allowed. Dependence is conveniently modelled in terms of multivariate normal errors. Inference is performed through a likelihood approach. While the likelihood function is available in closed form for continuous responses, numerical approximations are used in the non-continuous setting. Residual analysis and a specification test are suggested for validating the adequacy of the assumed multivariate model. The methodology is implemented in an R package called gcmr. Illustrations include simulations and real data applications regarding time series, cross-design data, longitudinal studies, survival analysis and spatial regression.
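As an illustration of the model class described in the summary, the following is a minimal simulation sketch and not the gcmr implementation. It assumes Poisson margins with a log link and AR(1) normal errors with unit variance, and generates each response as the marginal quantile of the probability-transformed error, y_t = F_t^{-1}(Phi(eps_t)). The function names `poisson_quantile` and `simulate_gcmr_poisson` and all parameter names are hypothetical, chosen only for this example.

```python
import math
import random
from statistics import NormalDist


def poisson_quantile(u, mean, max_k=10_000):
    """Smallest k with P(Y <= k) >= u for Y ~ Poisson(mean)."""
    k = 0
    pmf = math.exp(-mean)
    cdf = pmf
    # Accumulate the Poisson cdf until it reaches u (capped for safety).
    while cdf < u and k < max_k:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k


def simulate_gcmr_poisson(x, beta0, beta1, rho, seed=0):
    """Simulate counts whose margins are Poisson(exp(beta0 + beta1 * x_t))
    and whose dependence comes from AR(1) standard normal errors
    (an assumed correlation structure for this sketch)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    eps = rng.gauss(0.0, 1.0)  # stationary start: eps_1 ~ N(0, 1)
    ys = []
    for t, xt in enumerate(x):
        if t > 0:
            # AR(1) update that keeps the marginal variance equal to 1.
            eps = rho * eps + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Probability transform of the error, kept strictly below 1.
        u = min(std_normal.cdf(eps), 1.0 - 1e-12)
        mean = math.exp(beta0 + beta1 * xt)
        ys.append(poisson_quantile(u, mean))
    return ys


if __name__ == "__main__":
    x = [t / 50.0 for t in range(50)]
    print(simulate_gcmr_poisson(x, beta0=0.5, beta1=1.0, rho=0.6, seed=1))
```

Because the errors enter only through the probability transform, the regression coefficients keep their marginal interpretation whatever correlation structure is chosen for the errors; replacing the AR(1) recursion with another correlation model changes the dependence but not the margins.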

MSC:
62H20 Measures of association (correlation, canonical correlation, etc.)
62J12 Generalized linear models (logistic models)
65C60 Computational problems in statistics (MSC2010)
62H12 Estimation in multivariate analysis