Analysis of multivariate skew normal models with incomplete data. (English) Zbl 1175.62054

Summary: We establish computationally flexible methods and algorithms for the analysis of multivariate skew normal models when the data contain missing values. To facilitate the computations and simplify the theoretical derivations, two auxiliary permutation matrices are incorporated into the model to identify the observed and missing components of each observation. Under missing at random (MAR) mechanisms, we formulate an analytically simple expectation conditional maximization (ECM) algorithm that computes the parameter estimates and replaces each missing value with a single-valued imputation. Gibbs sampling is used to perform Bayesian inference on the model parameters and to create multiple imputations for the missing values. The proposed methodologies are illustrated on a real data set, and the results are compared with those obtained by fitting the normal counterparts.
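
The role of the two auxiliary matrices can be illustrated with a minimal sketch. It assumes, since the summary does not give the construction, that each matrix is built from rows of the identity matrix: one selects the observed components of an observation and the other selects the missing ones. The names `selection_matrices`, `O`, `M`, and the imputed values are hypothetical and chosen only for illustration; this is not the paper's implementation.

```python
import numpy as np

def selection_matrices(observed_mask):
    """Build row-selection matrices from a boolean mask (assumed construction).

    O keeps the rows of the identity matrix for observed components,
    M keeps the rows for missing components.
    """
    mask = np.asarray(observed_mask, dtype=bool)
    identity = np.eye(mask.size)
    return identity[mask], identity[~mask]

# One observation of dimension 4 with components 2 and 4 missing.
y = np.array([1.5, np.nan, -0.3, np.nan])
observed = ~np.isnan(y)
O, M = selection_matrices(observed)

# Replace NaN by 0 before multiplying, because 0 * NaN is NaN.
y_filled = np.where(observed, y, 0.0)
y_obs = O @ y_filled  # observed part of the observation

# Hypothetical imputed values for the missing part (illustration only).
y_mis_hat = np.array([2.0, 4.0])

# Reassemble the completed observation from its two parts.
y_complete = O.T @ y_obs + M.T @ y_mis_hat

print(y_obs)
print(y_complete)
```

Under this construction the two matrices satisfy `O.T @ O + M.T @ M = I`, which is what lets an observation be split into observed and missing parts and put back together without bookkeeping on indices.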


62H12 Estimation in multivariate analysis
62F15 Bayesian inference
62H10 Multivariate distribution of statistics
65C60 Computational problems in statistics (MSC2010)



