Finite mixture modeling of censored and missing data using the multivariate skew-normal distribution. (English) Zbl 07630551

Summary: Finite mixture models have been widely used to model and analyze data from heterogeneous populations. Moreover, data of this kind can be missing or subject to upper and/or lower detection limits because of the constraints of experimental apparatuses. Another complication arises when the measurements from each population depart significantly from normality, for instance by exhibiting asymmetric behavior. For such data structures, we propose a robust model for censored and/or missing data based on finite mixtures of multivariate skew-normal distributions. This approach allows us to model data with great flexibility, accommodating multimodality and skewness simultaneously, depending on the structure of the mixture components. We develop an analytically simple, yet efficient, EM-type algorithm for conducting maximum likelihood estimation of the parameters. The algorithm has closed-form expressions at the E-step that rely on formulas for the mean and variance of truncated multivariate skew-normal distributions. Furthermore, a general information-based method for approximating the asymptotic covariance matrix of the estimators is also presented. Results obtained from the analysis of both simulated and real datasets are reported to demonstrate the effectiveness of the proposed method. The proposed algorithm and method are implemented in the new R package CensMFM.
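To make the censored-mixture likelihood mentioned in the summary more concrete, the following is a minimal univariate sketch in Python. It is not the paper's algorithm and not the CensMFM interface: it assumes one-dimensional data, left-censoring only (a censored entry stores its detection limit), and it evaluates the skew-normal CDF by Simpson's rule rather than through the closed-form truncated moments the summary refers to. All function and parameter names are hypothetical. An observed value contributes the mixture density, and a left-censored value contributes the mixture CDF at its detection limit.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sn_pdf(x, loc, scale, shape):
    """Univariate skew-normal density: (2/scale) * phi(z) * Phi(shape * z)."""
    z = (x - loc) / scale
    return 2.0 / scale * norm_pdf(z) * norm_cdf(shape * z)


def sn_cdf(x, loc, scale, shape, n=2000):
    """Skew-normal CDF by Simpson's rule (illustrative choice; n must be even)."""
    lo = loc - 12.0 * scale           # density is negligible below this point
    hi = min(x, loc + 12.0 * scale)   # and above this one
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    total = sn_pdf(lo, loc, scale, shape) + sn_pdf(hi, loc, scale, shape)
    for i in range(1, n):
        weight = 4.0 if i % 2 else 2.0
        total += weight * sn_pdf(lo + i * h, loc, scale, shape)
    return total * h / 3.0


def censored_mixture_loglik(values, censored, weights, locs, scales, shapes):
    """Log-likelihood of a skew-normal mixture with left-censored entries.

    values[i] is the observation, or the detection limit if censored[i] is True.
    """
    loglik = 0.0
    for value, is_censored in zip(values, censored):
        component = sn_cdf if is_censored else sn_pdf
        contribution = sum(
            w * component(value, m, s, a)
            for w, m, s, a in zip(weights, locs, scales, shapes)
        )
        loglik += math.log(contribution)
    return loglik


# Hypothetical data: two readings fall below a detection limit of 0.5.
values = [1.2, 0.5, 3.4, 0.5]
censored = [False, True, False, True]
loglik = censored_mixture_loglik(
    values, censored,
    weights=[0.6, 0.4], locs=[0.0, 3.0], scales=[1.0, 1.5], shapes=[2.0, -1.0],
)
print(loglik)
```

A numerical optimizer or an EM-type scheme would maximize this quantity over the weights, locations, scales and shapes. The multivariate case with partially censored vectors needs truncated multivariate moments, which this sketch does not attempt.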


62H30 Classification and discrimination; cluster analysis (statistical aspects)

