
Finite mixture of regression models for censored data based on scale mixtures of normal distributions. (English) Zbl 1474.62259

Summary: In statistical analysis, particularly in econometrics, finite mixtures of regression models based on the normality assumption are routinely used to analyze censored data. In this work, this model is extended by assuming scale mixtures of normal distributions (SMN) in place of the normal distribution. The resulting model is highly flexible and can accommodate multimodality and heavy tails simultaneously. The main advantage of formulating the finite mixture of censored regression models within the SMN class is that the class has a convenient hierarchical representation, which makes inference straightforward to implement. We develop a simple EM-type algorithm for maximum likelihood estimation of the model parameters. To examine the performance of the proposed method, we present simulation studies and analyze a real dataset. The proposed algorithm and methods are implemented in the new R package CensMixReg.
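The hierarchical representation mentioned in the summary writes an SMN variable as Y | U = u ~ N(mu, sigma^2/u), where U is a positive mixing variable. This is what keeps the censored likelihood and the E-step tractable. The Python sketch below makes several assumptions that are not taken from the paper. It uses the contaminated normal as the SMN member because its CDF reduces to normal CDFs, so only the standard library is needed. It assumes left censoring at a known detection limit and shares the mixing parameters nu and gamma across components. It computes only the observed-data log-likelihood and the E-step component probabilities. It is not the paper's full EM-type algorithm, and it does not reproduce the CensMixReg interface. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def norm_pdf(y: float, mu: float, var: float) -> float:
    """Density of N(mu, var) evaluated at y."""
    return math.exp(-0.5 * (y - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def norm_cdf(y: float, mu: float, var: float) -> float:
    """Distribution function of N(mu, var) evaluated at y."""
    return 0.5 * math.erfc(-(y - mu) / math.sqrt(2.0 * var))


def cn_pdf(y: float, mu: float, sigma2: float, nu: float, gamma: float) -> float:
    """Contaminated-normal density: U = gamma with prob. nu, else U = 1,
    and Y | U = u ~ N(mu, sigma2 / u)."""
    return nu * norm_pdf(y, mu, sigma2 / gamma) + (1.0 - nu) * norm_pdf(y, mu, sigma2)


def cn_cdf(y: float, mu: float, sigma2: float, nu: float, gamma: float) -> float:
    """Distribution function of the same contaminated normal."""
    return nu * norm_cdf(y, mu, sigma2 / gamma) + (1.0 - nu) * norm_cdf(y, mu, sigma2)


def sample_cn(mu: float, sigma2: float, nu: float, gamma: float, rng: random.Random) -> float:
    """Draw via the hierarchical SMN representation: first U, then Y given U."""
    u = gamma if rng.random() < nu else 1.0
    return rng.gauss(mu, math.sqrt(sigma2 / u))


def component_lik(y: float, censored: bool, x: Sequence[float], beta: Sequence[float],
                  sigma2: float, nu: float, gamma: float) -> float:
    """Likelihood contribution of one observation under one component.
    Assumes left censoring: a censored y equals its detection limit and
    contributes P(Y* <= y); an uncensored y contributes the density."""
    mu = sum(xk * bk for xk, bk in zip(x, beta))
    if censored:
        return cn_cdf(y, mu, sigma2, nu, gamma)
    return cn_pdf(y, mu, sigma2, nu, gamma)


def mixture_loglik(ys, cens, xs, weights, betas, sigma2s, nu, gamma) -> float:
    """Observed-data log-likelihood: sum over i of log sum over j of p_j * L_ij."""
    total = 0.0
    for y, c, x in zip(ys, cens, xs):
        total += math.log(sum(p * component_lik(y, c, x, b, s2, nu, gamma)
                              for p, b, s2 in zip(weights, betas, sigma2s)))
    return total


def responsibilities(y, c, x, weights, betas, sigma2s, nu, gamma) -> List[float]:
    """E-step posterior probabilities that the observation belongs to each component."""
    terms = [p * component_lik(y, c, x, b, s2, nu, gamma)
             for p, b, s2 in zip(weights, betas, sigma2s)]
    total = sum(terms)
    return [t / total for t in terms]


# Usage: simulate a two-component mixture, left-censor at 0, evaluate the likelihood.
rng = random.Random(1)
weights, betas, sigma2s = [0.4, 0.6], [[-1.0, 1.0], [2.0, 0.5]], [1.0, 0.5]
nu, gamma = 0.1, 0.2
xs, ys, cens = [], [], []
for _ in range(200):
    x = [1.0, rng.uniform(-2.0, 2.0)]          # intercept plus one covariate
    j = 0 if rng.random() < weights[0] else 1  # latent component label
    mu = sum(xk * bk for xk, bk in zip(x, betas[j]))
    y_star = sample_cn(mu, sigma2s[j], nu, gamma, rng)
    xs.append(x)
    ys.append(max(y_star, 0.0))                # observed (censored) response
    cens.append(y_star <= 0.0)                 # censoring indicator
ll = mixture_loglik(ys, cens, xs, weights, betas, sigma2s, nu, gamma)
print(f"log-likelihood: {ll:.3f}")
print(responsibilities(ys[0], cens[0], xs[0], weights, betas, sigma2s, nu, gamma))
```

Swapping `cn_pdf` and `cn_cdf` for Student-t or slash versions would give other SMN members. A full EM-type fit would alternate these E-step probabilities, together with conditional moments of the latent scale variable U, with parameter updates. This sketch does not attempt those updates.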

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62J05 Linear regression; mixed models
62N01 Censored data models

References:

[1] Andrews DF, Mallows CL (1974) Scale mixtures of normal distributions. J R Stat Soc Ser B 36:99-102 · Zbl 0282.62017
[2] Arellano-Valle RB, Castro L, González-Farías G, Muñoz-Gajardo K (2012) Student-t censored regression model: properties and inference. Stat Methods Appl 21:453-473 · Zbl 1332.62381 · doi:10.1007/s10260-012-0199-y
[3] Ateya SF (2014) Maximum likelihood estimation under a finite mixture of generalized exponential distributions based on censored data. Stat Pap 55:311-325 · Zbl 1297.62040 · doi:10.1007/s00362-012-0480-z
[4] Basso RM, Lachos VH, Cabral CRB, Ghosh P (2010) Robust mixture modeling based on scale mixtures of skew-normal distributions. Comput Stat Data Anal 54:2926-2941 · Zbl 1284.62193 · doi:10.1016/j.csda.2009.09.031
[5] Benites L, Lachos VH, Moreno EJL (2017) CensMixReg: censored linear mixture regression models. https://CRAN.R-project.org/package=CensMixReg, R package version 3.0
[6] Cabral CRB, Lachos VH, Prates MO (2012) Multivariate mixture modeling using skew-normal independent distributions. Comput Stat Data Anal 56:126-142 · Zbl 1239.62058 · doi:10.1016/j.csda.2011.06.026
[7] Caudill SB (2012) A partially adaptive estimator for the censored regression model based on a mixture of normal distributions. Stat Methods Appl 21:121-137 · doi:10.1007/s10260-011-0182-z
[8] Cuesta-Albertos JA, Gordaliza A, Matrán C (1997) Trimmed k-means: an attempt to robustify quantizers. Ann Stat 25:553-576 · Zbl 0878.62045 · doi:10.1214/aos/1031833664
[9] Depraetere N, Vandebroek M (2014) Order selection in finite mixtures of linear regressions: literature review and a simulation study. Stat Pap 55:871-911 · Zbl 1334.62138 · doi:10.1007/s00362-013-0534-x
[10] Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1-38 · Zbl 0364.62022
[11] Fagundes RA, de Souza RM, Cysneiros FJA (2013) Robust regression with application to symbolic interval data. Eng Appl Artif Intell 26:564-573 · doi:10.1016/j.engappai.2012.05.004
[12] Faria S, Soromenho G (2010) Fitting mixtures of linear regressions. J Stat Comput Simul 80(2):201-225 · Zbl 1184.62118 · doi:10.1080/00949650802590261
[13] Frühwirth-Schnatter S (2006) Finite mixture and Markov switching models. Springer, New York · Zbl 1108.62002
[14] Galimberti G, Soffritti G (2014) A multivariate linear regression analysis using finite mixtures of t distributions. Comput Stat Data Anal 71:138-150 · Zbl 1471.62070 · doi:10.1016/j.csda.2013.01.017
[15] Garay AM, Lachos VH, Bolfarine H, Cabral CRB (2015) Linear censored regression models with scale mixtures of normal distributions. Stat Pap 58:247-278 · Zbl 1394.62131 · doi:10.1007/s00362-015-0696-9
[16] Garay AM, Lachos VH, Lin TI (2016) Nonlinear censored regression models with heavy-tailed distributions. Stat Interface 9:281-293 · Zbl 1405.62094 · doi:10.4310/SII.2016.v9.n3.a3
[17] Greene WH (2012) Econometric analysis, 7th edn. Pearson, Harlow
[18] Grün B, Leisch F (2008) Finite mixtures of generalized linear regression models. In: Recent advances in linear models and related areas: essays in honour of Helge Toutenburg. Physica-Verlag HD, Heidelberg, pp 205-230 · Zbl 1276.62021
[19] He J (2013) Mixture model based multivariate statistical analysis of multiply censored environmental data. Adv Water Res 59:15-24 · doi:10.1016/j.advwatres.2013.05.001
[20] Hennig C (2000) Identifiablity of models for clusterwise linear regression. J Classif 17:273-296 · Zbl 1017.62058 · doi:10.1007/s003570000022
[21] Hennig C (2012) Trimcluster: cluster analysis with trimming. https://CRAN.R-project.org/package=trimcluster, R package version 0.1-2
[22] Karlsson M, Laitila T (2014) Finite mixture modeling of censored regression models. Stat Pap 55:627-642 · Zbl 1416.62215 · doi:10.1007/s00362-013-0509-y
[23] Kaufman L, Rousseeuw P (1990) Finding groups in data. Wiley, New York · Zbl 1345.62009 · doi:10.1002/9780470316801
[24] Lachos VH, Moreno EJL, Chen K, Cabral CRB (2017) Finite mixture modeling of censored data using the multivariate Student-t distribution. J Multivar Anal 159:151-167 · Zbl 1397.62221 · doi:10.1016/j.jmva.2017.05.005
[25] Lange KL, Sinsheimer JS (1993) Normal/independent distributions and their applications in robust regression. J Comput Graph Stat 2:175-198
[26] Lin TI, Ho HJ, Lee CR (2014) Flexible mixture modelling using the multivariate skew-t-normal distribution. Stat Comput 24:531-546 · Zbl 1325.62113 · doi:10.1007/s11222-013-9386-4
[27] Liu C, Rubin DB (1994) The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81:633-648 · Zbl 0812.62028 · doi:10.1093/biomet/81.4.633
[28] Louis T (1982) Finding the observed information matrix when using the EM algorithm. J R Stat Soc Ser B 44:226-233 · Zbl 0488.62018
[29] Massuia MB, Cabral CRB, Matos LA, Lachos VH (2015) Influence diagnostics for Student-t censored linear regression models. Statistics 49:1074-1094 · Zbl 1382.62050 · doi:10.1080/02331888.2014.958489
[30] MATLAB (2016) version 9.0 (R2016a). The MathWorks Inc., Natick, Massachusetts
[31] Mazza A, Punzo A (2017) Mixtures of multivariate contaminated normal regression models. Stat Pap · Zbl 1435.62238 · doi:10.1007/s00362-017-0964-y
[32] McLachlan GJ, Krishnan T (2008) The EM algorithm and extensions. John Wiley & Sons, New Jersey · Zbl 1165.62019 · doi:10.1002/9780470191613
[33] McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York · Zbl 0963.62061 · doi:10.1002/0471721182
[34] Melenberg B, Soest AV (1996) Parametric and semi-parametric modeling of vacation expenditures. J Appl Econ 11:59-76 · doi:10.1002/(SICI)1099-1255(199601)11:1<59::AID-JAE371>3.0.CO;2-A
[35] Miyata Y (2011) Maximum likelihood estimators in finite mixture models with censored data. J Stat Plan Inference 141:56-64 · Zbl 1197.62026 · doi:10.1016/j.jspi.2010.05.006
[36] Mouselimis L (2017) ClusterR: Gaussian mixture models, K-Means, mini-batch-Kmeans and K-Medoids clustering. https://CRAN.R-project.org/package=ClusterR, R package version 1.0.5
[37] Mroz TA (1987) The sensitivity of an empirical model of married women’s hours of work to economic and statistical assumptions. Econometrica 55:765-799 · doi:10.2307/1911029
[38] Powell JL (1984) Least absolute deviations estimation for the censored regression model. J Econ 25:303-325 · Zbl 0571.62100 · doi:10.1016/0304-4076(84)90004-6
[39] Powell JL (1986) Symmetrically trimmed least squares estimation for Tobit models. Econometrica 54:1435-1460 · Zbl 0625.62048 · doi:10.2307/1914308
[40] R Core Team (2017) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
[41] Raftery AE (1995) Bayesian model selection in social research. Sociol Methodol 25:111-163 · doi:10.2307/271063
[42] Tzortzis G, Likas A (2014) The MinMax k-Means clustering algorithm. Pattern Recognit 47:2505-2516 · doi:10.1016/j.patcog.2014.01.015
[43] Vaida F, Liu L (2009) Fast implementation for normal mixed effects models with censored response. J Comput Graph Stat 18:797-817 · doi:10.1198/jcgs.2009.07130
[44] Vuong QH (1989) Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57:307-333 · Zbl 0701.62106
[45] Witte A (1980) Estimating an economic model of crime with individual data. Q J Econ 94:57-84 · doi:10.2307/1884604
[46] Zhang B (2003) Regression clustering. In: Proceedings of the third IEEE international conference on data mining, Melbourne
[47] Zeller CB, Cabral CRB, Lachos VH (2016) Robust mixture regression modeling based on scale mixtures of skew-normal distributions. Test 25:375-396 · Zbl 1342.62113 · doi:10.1007/s11749-015-0460-4