An EM algorithm for estimating the parameters of the multivariate skew-normal distribution with censored responses. (English) Zbl 07579338

Summary: Limited or censored data are collected in many studies. This occurs for many reasons in several practical situations, such as limitations in measuring equipment or from an experimental design. Consequently, the true value is recorded only if it falls within an interval range so that the responses can be either left, interval, or right-censored. Missing values can be seen just as a particular case. Linear and nonlinear regression models are routinely used to analyze these types of data. Most of these models are based on the normality assumption for the error term. However, such analyses might not provide robust inference when the normality assumption (or symmetry) is questionable. The need for asymmetric distributions for the random errors motivates us to develop a likelihood-based inference for linear models with censored responses based on the multivariate skew-normal distribution, where the missing/censoring mechanism is assumed to be “missing at random” (MAR). The proposed EM-type algorithm for maximum likelihood estimation uses closed-form expressions at the E-step based on formulas for the mean and variance of a truncated multivariate skew-normal distribution, available in the R package MomTrunc. Three datasets with censored and/or missing observations are analyzed and discussed.
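To make the idea of an EM iteration with a closed-form E-step concrete, the following Python sketch handles a deliberately simpler case than the paper: a univariate normal sample with left-censored responses, not the multivariate skew-normal model. The E-step replaces each censored response by the first and second moments of a normal distribution truncated above at the detection limit, and the M-step updates the mean and variance from these expected sufficient statistics. The function name `em_left_censored_normal` and its arguments are hypothetical, and the code does not use MomTrunc.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def em_left_censored_normal(values, censored, n_iter=200, tol=1e-10):
    """Illustrative EM for N(mu, sigma^2) with left-censored responses.

    values[i] is the observed response, or the detection limit when
    censored[i] is True (the true response is only known to be <= values[i]).
    Assumes the starting variance is positive; this is a sketch, not the
    paper's multivariate skew-normal algorithm.
    """
    n = len(values)
    # Naive starting values that treat detection limits as observations.
    mu = sum(values) / n
    sigma2 = sum((v - mu) ** 2 for v in values) / n
    for _ in range(n_iter):
        sigma = math.sqrt(sigma2)
        s1 = 0.0  # sum of E[y_i | data]
        s2 = 0.0  # sum of E[y_i^2 | data]
        for v, is_cens in zip(values, censored):
            if is_cens:
                # E-step: moments of N(mu, sigma^2) truncated to (-inf, v].
                beta = (v - mu) / sigma
                ratio = norm_pdf(beta) / norm_cdf(beta)
                m1 = mu - sigma * ratio
                var = sigma2 * (1.0 - beta * ratio - ratio * ratio)
                m2 = var + m1 * m1
            else:
                m1 = v
                m2 = v * v
            s1 += m1
            s2 += m2
        # M-step: closed-form updates from the expected sufficient statistics.
        new_mu = s1 / n
        new_sigma2 = s2 / n - new_mu ** 2
        converged = abs(new_mu - mu) + abs(new_sigma2 - sigma2) < tol
        mu, sigma2 = new_mu, new_sigma2
        if converged:
            break
    return mu, sigma2
```

Calling `em_left_censored_normal([0.5, 0.5, 1.2, 2.0, 3.1], [True, True, False, False, False])` returns a mean estimate below the naive average that treats the detection limit 0.5 as an observed value, because each censored response is replaced by a conditional mean lying below the limit.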


62-XX Statistics


[1] Adcock, C.J., Shutes, K.: Portfolio selection based on the multivariate skew normal distribution. In: Skulimowski, A. (ed.) Financial modelling. Progress and Business Publishers, Krakow (2001) · Zbl 1425.62027
[2] Arellano-Valle, RB; Genton, MG, Multivariate extended skew-t distributions and related families, Metron, 68, 3, 201-234 (2010) · Zbl 1301.62016
[3] Arnold, B.C., Beaver, R.J.: Hidden truncation models. Sankhyā: Indian J. Stat. Series A (1961-2002) 62(1), 23-35 (2000) · Zbl 0973.62041
[4] Azzalini, A.: The R package sn: The Skew-Normal and Related Distributions such as the Skew-\(t\) (version 1.6-2). Università di Padova, Italia. http://azzalini.stat.unipd.it/SN (2020)
[5] Azzalini, A., Capitanio, A.: Statistical applications of the multivariate skew normal distribution. J. Royal Stat. Soc. Series B (Statistical Methodology) 61(3), 579-602 (1999) · Zbl 0924.62050
[6] Azzalini, A.; Dalla Valle, A., The multivariate skew-normal distribution, Biometrika, 83, 4, 715-726 (1996) · Zbl 0885.62062
[7] Cabral, CRB; Lachos, VH; Prates, MO, Multivariate mixture modeling using skew-normal independent distributions, Comput. Stat. Data Anal., 56, 1, 126-142 (2012) · Zbl 1239.62058
[8] Cao, J., Genton, M., Keyes, D., Turkiyyah, G.: tlrmvnmvt: Low-rank methods for MVN and MVT probabilities. R package version 1.1.0, (2020) https://CRAN.R-project.org/package=tlrmvnmvt
[9] Cao, J.; Genton, MG; Keyes, DE; Turkiyyah, GM, Exploiting low-rank covariance structures for computing high-dimensional normal and Student-t probabilities, Stat. Comput., 31, 2, 1-16 (2021) · Zbl 1461.62014
[10] Capitanio, A.; Azzalini, A.; Stanghellini, E., Graphical models for skew-normal variates, Scand. J. Stat., 30, 1, 129-144 (2003) · Zbl 1035.60008
[11] De Alencar, F. H. C., Galarza, C. E., Matos, L. A., Lachos, V. H.: CensMFM: Finite mixture of multivariate censored/missing data. R package version 2.11, (2020) https://CRAN.R-project.org/package=CensMFM
[12] De Alencar, FHC; Galarza, CE; Matos, LA; Lachos, VH, Finite mixture modeling of censored and missing data using the multivariate skew-normal distribution, Adv. Data Anal. Classif. (2021) · Zbl 07630551
[13] Dempster, A.; Laird, N.; Rubin, D., Maximum likelihood from incomplete data via the EM algorithm, J. Royal Stat. Soc. Series B, 39, 1-38 (1977) · Zbl 0364.62022
[14] Diggle, PJ; Heagerty, PJ; Liang, K-Y; Zeger, S., Analysis of Longitudinal Data (2002), Oxford: Oxford University Press · Zbl 1031.62002
[15] Galarza, C. E., Kan, R., Lachos, V. H.: MomTrunc: Moments of folded and doubly truncated multivariate distributions. R package version 5.69, (2020) https://CRAN.R-project.org/package=MomTrunc
[16] Galarza, CE; Lin, T-I; Wang, W-L; Lachos, VH, On moments of folded and truncated multivariate student-t distributions based on recurrence relations, Metrika, 84, 825-850 (2021) · Zbl 1475.62167
[17] Galarza, CE; Matos, LA; Dey, DK; Lachos, VH, On moments of folded and doubly truncated multivariate extended skew-normal distributions, J. Comput. Graph. Stat. (2021) · Zbl 07547624
[18] Garay, AM; Castro, LM; Leskow, J.; Lachos, VH, Censored linear regression models for irregularly observed longitudinal data using the multivariate-t distribution, Stat. Methods Med. Res., 26, 2, 542-566 (2017)
[19] Genz, A., Bretz, F.: Computation of Multivariate Normal and t Probabilities. Lecture Notes in Statistics. Springer-Verlag, Heidelberg. ISBN 978-3-642-01688-2 (2009) · Zbl 1204.62088
[20] Genz, A., Bretz, F., Miwa, T., Mi, X., Leisch, F., Scheipl, F., Hothorn, T.: mvtnorm: multivariate normal and t distributions. R package version 1.0-12, https://CRAN.R-project.org/package=mvtnorm (2020)
[21] Hoffman, HJ; Johnson, RE, Pseudo-likelihood estimation of multivariate normal parameters in the presence of left-censored data, J. Agric. Biol. Environ. Stat., 20, 1, 156-171 (2015) · Zbl 1325.62211
[22] Lachos, VH; Bolfarine, H.; Arellano-Valle, RB; Montenegro, LC, Likelihood based inference for multivariate skew-normal regression models, Commun. Stat. Theory Methods, 36, 1769-1786 (2007) · Zbl 1124.62037
[23] Lin, TI, Maximum likelihood estimation for multivariate skew normal mixture models, J. Multivar. Anal., 100, 2, 257-265 (2009) · Zbl 1152.62034
[24] Lin, TI; Ho, HJ; Chen, CL, Analysis of multivariate skew normal models with incomplete data, J. Multivar. Anal., 100, 10, 2337-2351 (2009) · Zbl 1175.62054
[25] Little, RJA; Rubin, DB, Statistical Analysis With Missing Data (1987), New York: John Wiley & Sons · Zbl 0665.62004
[26] Massuia, MB; Cabral, CRB; Matos, LA; Lachos, VH, Influence diagnostics for Student-t censored linear regression models, Statistics, 49, 5, 1074-1094 (2015) · Zbl 1382.62050
[27] Matos, LA; Prates, MO; Chen, MH; Lachos, VH, Likelihood-based inference for mixed-effects models with censored response using the multivariate-t distribution, Statistica Sinica, 23, 1323-1342 (2013) · Zbl 06202709
[28] McLachlan, G. J., Krishnan, T.: The EM Algorithm and Extensions. Wiley, second edition (2008) · Zbl 1165.62019
[29] Meilijson, I., A fast improvement to the EM algorithm on its own terms, J. Royal Stat. Soc. Series B (Methodological), 51, 1, 127-138 (1989) · Zbl 0674.65118
[30] Schwarz, G., Estimating the dimension of a model, Ann. Stat., 6, 2, 461-464 (1978) · Zbl 0379.62005
[31] VDEQ. The quality of Virginia non-tidal streams: First year report. Richmond, Virginia. Virginia Department of Environmental Quality (VDEQ). Technical Bulletin WQA/2002-001, United States of America (2003)
[32] Wang, P.; Li, D.; Sun, J., A pairwise pseudo-likelihood approach for left-truncated and interval-censored data under the Cox model, Biometrics (2020)
[33] Wang, W-L; Lin, T-I; Lachos, VH, Extending multivariate-t linear mixed models for multiple longitudinal data with censored responses and heavy tails, Stat. Methods Med. Res., 27, 1, 48-64 (2018)
[34] Wang, W-L; Castro, LM; Lachos, VH; Lin, T-I, Model-based clustering of censored data via mixtures of factor analyzers, Comput. Stat. Data Anal., 140, 104-121 (2019) · Zbl 1496.62109