Simultaneous generation of multivariate mixed data with Poisson and normal marginals. (English) Zbl 1457.62013

Summary: The present paper develops a procedure for simulating multivariate data with count and continuous variables under a pre-specified correlation matrix. The count and continuous variables are assumed to have Poisson and normal marginals, respectively. The data generation mechanism combines the normal-to-anything (NORTA) principle with a newly established connection between Poisson and normal correlations in the mixture. A step-by-step algorithm is provided, and its performance is evaluated using two simulated scenarios and one real-data scenario.
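
As a rough illustration of the normal-to-anything idea mentioned in the summary, the following minimal Python sketch draws one Poisson and one normal variable from a pair of correlated standard normals. It is not the paper's algorithm: the function names (`poisson_quantile`, `simulate_pair`, `pearson`), the parameter values, and the latent correlation `rho_z` are all assumptions made for this example, and the sketch does not include the paper's correlation correction step. It only shows the forward transformation and that the attained Poisson-normal correlation generally differs from the latent normal correlation, which is why such a correction is needed.

```python
import math
import random
from statistics import NormalDist, mean


def poisson_quantile(u, lam):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(lam) (hypothetical helper)."""
    u = min(u, 1.0 - 1e-12)  # guard against u == 1.0 from floating-point rounding
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    while cdf < u and k < 10_000:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def simulate_pair(n, lam, mu, sigma, rho_z, seed=0):
    """Draw n (count, continuous) pairs from a latent bivariate normal.

    rho_z is the correlation of the latent normals (an assumed input here),
    not the correlation attained between the Poisson and normal variables.
    """
    rng = random.Random(seed)
    phi = NormalDist().cdf
    counts, conts = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        z2 = rho_z * z1 + math.sqrt(1.0 - rho_z ** 2) * e  # corr(z1, z2) = rho_z
        counts.append(poisson_quantile(phi(z1), lam))  # Poisson marginal via inverse CDF
        conts.append(mu + sigma * z2)                  # normal marginal via location-scale
    return counts, conts


def pearson(x, y):
    """Sample Pearson correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


counts, conts = simulate_pair(20_000, lam=3.0, mu=10.0, sigma=2.0, rho_z=0.6)
r = pearson(counts, conts)
print(f"mean count = {mean(counts):.3f}, mean continuous = {mean(conts):.3f}")
print(f"latent correlation = 0.6, attained correlation = {r:.3f}")
```

In this sketch the marginals are matched by construction, while the attained correlation is somewhat smaller in magnitude than `rho_z`. To hit a pre-specified target correlation, the latent correlation must be adjusted beforehand, which is the role of the Poisson-normal correlation connection described in the summary.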


62-08 Computational methods for problems pertaining to statistics

