
zbMATH — the first resource for mathematics

Weighted likelihood mixture modeling and model-based clustering. (English) Zbl 1436.62255
Summary: This work develops a weighted likelihood approach for the robust fitting of mixtures of multivariate Gaussian components. Two approaches are proposed, based on suitable modifications of the standard EM and classification EM (CEM) algorithms, respectively. In both techniques, the M-step is enhanced by the computation of weights aimed at downweighting outliers. The weights are based on Pearson residuals stemming from robust Mahalanobis-type distances. Formal rules for robust clustering and outlier detection can also be defined from the fitted mixture model. The behavior of the proposed methods is investigated through numerical studies and real-data examples, in terms of fitting accuracy, classification accuracy, and outlier detection.
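To make the weighting step more concrete, here is a minimal Python/NumPy sketch of how Pearson-residual weights might be computed from squared Mahalanobis distances and then used in a weighted M-step for a single Gaussian component. It rests on assumptions that the summary does not state. The squared distances are compared with their chi-square reference distribution on p degrees of freedom. Both the kernel density estimate of the distances and the chi-square density are smoothed with the same Gaussian kernel of bandwidth `h`. The weight function uses the Hellinger residual adjustment function. All function names (`pearson_residuals`, `hellinger_weights`, `weighted_m_step`, and so on) are hypothetical. The paper's actual residual adjustment function, bandwidth selection, and treatment of the mixing proportions may differ.

```python
import math

import numpy as np


def chi2_pdf(t, p):
    """Chi-square density with p degrees of freedom, evaluated at t > 0."""
    return t ** (p / 2 - 1) * np.exp(-t / 2) / (2 ** (p / 2) * math.gamma(p / 2))


def gaussian_kernel(u, h):
    """Gaussian kernel with bandwidth h."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2 * math.pi))


def mahalanobis2(x, mu, sigma):
    """Squared Mahalanobis distances of the rows of x from (mu, sigma)."""
    centered = x - mu
    return np.einsum("ij,ij->i", centered, np.linalg.solve(sigma, centered.T).T)


def pearson_residuals(d2, p, h, n_grid=2000):
    """delta_i = f*(d_i) / m*(d_i) - 1 for squared distances d2.

    f* is a kernel density estimate of the observed distances; m* is the
    chi-square(p) density smoothed with the same kernel (midpoint rule).
    This smoothing choice is an assumption of this sketch.
    """
    d2 = np.asarray(d2, dtype=float)
    f_star = gaussian_kernel(d2[:, None] - d2[None, :], h).mean(axis=1)
    upper = max(d2.max(), p + 10 * math.sqrt(2 * p)) + 5 * h
    step = upper / n_grid
    t = (np.arange(n_grid) + 0.5) * step  # midpoints, so t = 0 is never evaluated
    m_star = (gaussian_kernel(d2[:, None] - t[None, :], h) * chi2_pdf(t, p)).sum(axis=1) * step
    m_star = np.maximum(m_star, 1e-300)  # guard against underflow for extreme outliers
    return f_star / m_star - 1.0


def hellinger_weights(delta):
    """w = min{1, [A(delta) + 1]^+ / (delta + 1)} with A(delta) = 2 (sqrt(delta + 1) - 1)."""
    a_plus_1 = np.maximum(2.0 * np.sqrt(delta + 1.0) - 1.0, 0.0)
    return np.minimum(1.0, a_plus_1 / (delta + 1.0))


def weighted_m_step(x, z, w):
    """Weighted update of one component's mean and covariance.

    z holds posterior membership probabilities (EM) or 0/1 labels (CEM);
    w holds the robustness weights; each point enters with weight z_i * w_i.
    """
    zw = z * w
    total = zw.sum()
    mu = (zw[:, None] * x).sum(axis=0) / total
    centered = x - mu
    sigma = (zw[:, None] * centered).T @ centered / total
    return mu, sigma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.vstack([rng.standard_normal((200, 2)),              # clean N(0, I) sample
                   8.0 + 0.1 * rng.standard_normal((10, 2))])  # tight outlying cluster
    d2 = mahalanobis2(x, np.zeros(2), np.eye(2))
    w = hellinger_weights(pearson_residuals(d2, p=2, h=0.5))
    mu, sigma = weighted_m_step(x, np.ones(len(x)), w)
    print("largest weight among the outliers:", w[200:].max())
    print("weighted mean:", mu)
```

In the demo, the tight cluster near (8, 8) has squared distances far in the chi-square tail. Its Pearson residuals are therefore huge, and its weights should be close to zero, so the weighted mean stays near the origin. In a full EM or CEM iteration, one would recompute the distances from each component's current estimates, recompute the weights, update the parameters, and repeat until convergence.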

MSC:
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F35 Robustness and adaptive procedures (parametric inference)
Software:
mclust; mtclust; otrimle; R; TCLUST
References:
[1] Agostinelli, C., Robust model selection in regression via weighted likelihood methodology, Stat. Probab. Lett., 56, 3, 289-300 (2002) · Zbl 0998.62034
[2] Agostinelli, C., Notes on Pearson residuals and weighted likelihood estimating equations, Stat. Probab. Lett., 76, 17, 1930-1934 (2006) · Zbl 1099.62022
[3] Agostinelli, C.; Greco, L., A weighted strategy to handle likelihood uncertainty in Bayesian inference, Comput. Stat., 28, 1, 319-339 (2013) · Zbl 1305.65018
[4] Agostinelli, C.; Greco, L., Discussion on “The power of monitoring: how to make the most of a contaminated sample”, Stat. Methods Appl. (2017) · Zbl 1428.62215
[5] Agostinelli, C.; Greco, L., Weighted likelihood estimation of multivariate location and scatter, Test (2018) · Zbl 1420.62128
[6] Atkinson, A.; Riani, M.; Cerioli, A., Exploring Multivariate Data with the Forward Search (2013), Berlin: Springer
[7] Basu, A.; Lindsay, B., Minimum disparity estimation for continuous models: efficiency, distributions and robustness, Ann. Inst. Stat. Math., 46, 4, 683-705 (1994) · Zbl 0821.62018
[8] Bouveyron, C.; Brunet-Saumard, C., Model-based clustering of high-dimensional data: a review, Comput. Stat. Data Anal., 71, 52-78 (2014) · Zbl 06975372
[9] Bryant, P., Large-sample results for optimization-based clustering methods, J. Classif., 8, 1, 31-44 (1991) · Zbl 0747.62057
[10] Campbell, N., Mixture models and atypical values, Math. Geol., 16, 5, 465-477 (1984)
[11] Celeux, G.; Govaert, G., Comparison of the mixture and the classification maximum likelihood in cluster analysis, J. Stat. Comput. Simul., 47, 3-4, 127-146 (1993)
[12] Cerioli, A., Multivariate outlier detection with high-breakdown estimators, J. Am. Stat. Assoc., 105, 489, 147-156 (2010) · Zbl 1397.62167
[13] Cerioli, A.; Farcomeni, A., Error rates for multivariate outlier detection, Comput. Stat. Data Anal., 55, 1, 544-553 (2011) · Zbl 1247.62192
[14] Cerioli, A.; Riani, M.; Atkinson, A.; Corbellini, A., The power of monitoring: how to make the most of a contaminated sample, Stat. Methods Appl. (2017) · Zbl 1427.62047
[15] Colonna, J. G.; Gama, J.; Nakamura, E., Recognizing Family, Genus, and Species of Anuran Using a Hierarchical Classification Approach. Lecture Notes in Computer Science, 198-212 (2016), Berlin: Springer
[16] Coretto, P.; Hennig, C., Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for robust Gaussian clustering, J. Am. Stat. Assoc., 111, 516, 1648-1659 (2016)
[17] Coretto, P.; Hennig, C., Consistency, breakdown robustness, and algorithms for robust improper maximum likelihood clustering, J. Mach. Learn. Res., 18, 1, 5199-5237 (2017)
[18] Day, N., Estimating the components of a mixture of normal distributions, Biometrika, 56, 3, 463-474 (1969) · Zbl 0183.48106
[19] Dempster, A.; Laird, N. M.; Rubin, D. B., Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. Ser. B Methodol., 39, 1-38 (1977) · Zbl 0364.62022
[20] Dotto, F.; Farcomeni, A., Robust inference for parsimonious model-based clustering, J. Stat. Comput. Simul., 89, 3, 414-442 (2019)
[21] Dotto, F.; Farcomeni, A.; García-Escudero, L. A.; Mayo-Iscar, A., A reweighting approach to robust clustering, Stat. Comput., 28, 2, 477-493 (2016) · Zbl 1384.62193
[22] Elashoff, M.; Ryan, L., An EM algorithm for estimating equations, J. Comput. Graph. Stat., 13, 1, 48-65 (2004)
[23] Farcomeni, A.; Greco, L., Robust Methods for Data Reduction (2015), Boca Raton: CRC Press · Zbl 1311.62006
[24] Farcomeni, A.; Greco, L., S-estimation of hidden Markov models, Comput. Stat., 30, 1, 57-80 (2015) · Zbl 1342.65032
[25] Fraley, C.; Raftery, A., How many clusters? Which clustering method? Answers via model-based cluster analysis, Comput. J., 41, 8, 578-588 (1998) · Zbl 0920.68038
[26] Fraley, C.; Raftery, A., Model-based clustering, discriminant analysis, and density estimation, J. Am. Stat. Assoc., 97, 458, 611-631 (2002) · Zbl 1073.62545
[27] Fraley, C.; Raftery, A.; Murphy, T.; Scrucca, L., mclust version 4 for R: normal mixture modeling for model-based clustering, classification, and density estimation, Technical Report 597, University of Washington, Seattle (2012)
[28] Fritz, H.; García-Escudero, L.; Mayo-Iscar, A., A fast algorithm for robust constrained clustering, Comput. Stat. Data Anal., 61, 124-136 (2013) · Zbl 1349.62264
[29] García-Escudero, L.; Gordaliza, A.; Matrán, C.; Mayo-Iscar, A., A general trimming approach to robust cluster analysis, Ann. Stat., 36, 1324-1345 (2008) · Zbl 1360.62328
[30] García-Escudero, L. A.; Gordaliza, A.; Matrán, C.; Mayo-Iscar, A., Exploring the number of groups in robust model-based clustering, Stat. Comput., 21, 4, 585-599 (2011) · Zbl 1221.62093
[31] García-Escudero, L.; Gordaliza, A.; Matrán, C.; Mayo-Iscar, A., Avoiding spurious local maximizers in mixture modeling, Stat. Comput., 25, 3, 619-633 (2015) · Zbl 1331.62100
[32] Greco, L., Weighted likelihood based inference for \(P(X<Y)\), Commun. Stat. Simul. Comput., 46, 10, 7777-7789 (2017) · Zbl 1383.62227
[33] Helliwell, J.; Layard, R.; Sachs, J., World Happiness Report 2018 (2018)
[34] Kuchibhotla, A.; Basu, A., A general set up for minimum disparity estimation, Stat. Probab. Lett., 96, 68-74 (2015) · Zbl 1314.62089
[35] Kuchibhotla, A.; Basu, A., A minimum distance weighted likelihood method of estimation, Technical report, Interdisciplinary Statistical Research Unit (ISRU), Indian Statistical Institute, Kolkata, India (2018). https://faculty.wharton.upenn.edu/wp-content/uploads/2018/02/attemptv4p1.pdf. Accessed 17 Jan 2018
[36] Lee, S.; McLachlan, G., Finite mixtures of multivariate skew t-distributions: some recent and new results, Stat. Comput., 24, 2, 181-202 (2014) · Zbl 1325.62107
[37] Lin, T., Robust mixture modeling using multivariate skew t distributions, Stat. Comput., 20, 3, 343-356 (2010)
[38] Markatou, M., Mixture models, robustness, and the weighted likelihood methodology, Biometrics, 56, 2, 483-486 (2000) · Zbl 1060.62511
[39] Markatou, M.; Basu, A.; Lindsay, B. G., Weighted likelihood equations with bootstrap root search, J. Am. Stat. Assoc., 93, 442, 740-750 (1998) · Zbl 0918.62046
[40] Maronna, R.; Jacovkis, P., Multivariate clustering procedures with variable metrics, Biometrics, 30, 3, 499-505 (1974) · Zbl 0285.62036
[41] McLachlan, G.; Peel, D., Finite Mixture Models (2004), New York: Wiley
[42] McLachlan, G. J.; Peel, D.; Bean, R., Modelling high-dimensional data by mixtures of factor analyzers, Comput. Stat. Data Anal., 41, 3-4, 379-388 (2003) · Zbl 1256.62036
[43] Neykov, N.; Filzmoser, P.; Dimova, R.; Neytchev, P., Robust fitting of mixtures using the trimmed likelihood estimator, Comput. Stat. Data Anal., 52, 1, 299-308 (2007) · Zbl 1328.62033
[44] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria (2019). https://www.R-project.org/
[45] Rousseeuw, P.; Van Zomeren, B., Unmasking multivariate outliers and leverage points, J. Am. Stat. Assoc., 85, 411, 633-639 (1990)
[46] Symon, M., Clustering criterion and multi-variate normal mixture, Biometrics, 77, 35-43 (1977)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.