Model-based clustering and classification with non-normal mixture distributions. (English) Zbl 1332.62209

Summary: Non-normal mixture distributions have received increasing attention in recent years. Finite mixtures of multivariate skew-symmetric distributions, in particular the skew normal and skew \(t\)-mixture models, are emerging as promising extensions to the traditional normal and \(t\)-mixture models. Most of these parametric families of skew distributions are closely related, and can be classified into four forms under a recently proposed scheme, namely, the restricted, unrestricted, extended, and generalized forms. In this paper, we consider some of these existing proposals of multivariate non-normal mixture models and illustrate their practical use in several real applications. We first discuss the characterizations, along with a brief account of some distributions belonging to the above classification scheme, and then give references for software implementations of EM-type algorithms for estimating the model parameters. We then compare the relative performance of restricted and unrestricted skew mixture models in clustering, discriminant analysis, and density estimation on six real datasets from flow cytometry, finance, and image analysis. We also compare the performance of mixtures of skew normal and skew \(t\)-component distributions with other non-normal component distributions, including mixtures of multivariate normal-inverse-Gaussian distributions, shifted asymmetric Laplace distributions, and generalized hyperbolic distributions.
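
As a rough illustration of the kind of model the summary describes, the sketch below evaluates a two-component mixture of univariate skew normal densities in the form of Azzalini (1985), \(f(y) = \frac{2}{\omega}\,\phi(z)\,\Phi(\alpha z)\) with \(z = (y-\xi)/\omega\), and computes the posterior component probabilities that model-based clustering uses to assign an observation. It is not the multivariate restricted or unrestricted formulation studied in the paper; the function names and the parameter values are assumptions chosen only for this example.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def skew_normal_pdf(y, loc, scale, shape):
    """Univariate skew normal density; shape = 0 gives the normal density."""
    z = (y - loc) / scale
    return 2.0 / scale * normal_pdf(z) * normal_cdf(shape * z)


def mixture_pdf(y, weights, components):
    """Finite mixture density: weighted sum of the component densities."""
    return sum(w * skew_normal_pdf(y, *c) for w, c in zip(weights, components))


def posterior_probs(y, weights, components):
    """Posterior probability of each component given y (Bayes' rule)."""
    joint = [w * skew_normal_pdf(y, *c) for w, c in zip(weights, components)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical parameters: (location, scale, shape) for each component.
weights = [0.6, 0.4]
components = [(0.0, 1.0, 4.0), (5.0, 1.5, -3.0)]

# Cluster label for an observation: the component with the largest posterior.
probs = posterior_probs(0.5, weights, components)
label = max(range(len(probs)), key=probs.__getitem__)
```

In an EM-type fit these posterior probabilities would be the E-step quantities, recomputed at each iteration from the current parameter estimates; the parameter updates themselves are omitted here.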

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62E10 Characterization and structure theory of statistical distributions
62G07 Density estimation
62-07 Data analysis (statistics) (MSC2010)
62P05 Applications of statistics to actuarial sciences and financial mathematics
62H35 Image analysis in multivariate analysis

References:

[1] Aghaeepour, N; Finak, G; Consortium, TF; Consortium, TD; Hoos, H; Mosmann, TR; Brinkman, R; Gottardo, R; Scheuermann, RH, Critical assessment of automated flow cytometry data analysis techniques, Nat Methods, 10, 228-238, (2013)
[2] Altman, EI, Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, J Finance, 23, 589-609, (1968)
[3] Arellano-Valle, RB; Azzalini, A, On the unification of families of skew-normal distributions, Scand J Stat, 33, 561-574, (2006) · Zbl 1117.62051
[4] Arellano-Valle, RB; Genton, MG, On fundamental skew distributions, J Multivar Anal, 96, 93-116, (2005) · Zbl 1073.62049
[5] Arellano-Valle, RB; Genton, MG, Multivariate extended skew-\(t\) distributions and related families, Metron—special issue on ‘Skew-symmetric and flexible distributions’, 68, 201-234, (2010) · Zbl 1301.62016
[6] Arellano-Valle, RB; Genton, MG, Multivariate unified skew-elliptical distributions, Chil J Stat, 1, 17-33, (2010) · Zbl 1213.62087
[7] Arellano-Valle, RB; Pino, G; Martin, ES, Definition and probabilistic properties of skew-distributions, Stat Probab Lett, 58, 111-121, (2002) · Zbl 1045.62046
[8] Arellano-Valle, RB; Branco, MD; Genton, MG, A unified view on skewed distributions arising from selections, Can J Stat, 34, 581-601, (2006) · Zbl 1121.60009
[9] Arnold, BC; Beaver, RJ; Meeker, WQ, The nontruncated marginal of a truncated bivariate normal distribution, Psychometrika, 58, 471-488, (1993) · Zbl 0794.62075
[10] Azzalini, A, A class of distributions which includes the normal ones, Scand J Stat, 12, 171-178, (1985) · Zbl 0581.62014
[11] Azzalini, A; Capitanio, A, Statistical applications of the multivariate skew-normal distribution, J R Stat Soc Ser B, 61, 579-602, (1999) · Zbl 0924.62050
[12] Azzalini, A; Capitanio, A, Distribution generated by perturbation of symmetry with emphasis on a multivariate skew t distribution, J R Stat Soc Ser B, 65, 367-389, (2003) · Zbl 1065.62094
[13] Azzalini, A; Dalla Valle, A, The multivariate skew-normal distribution, Biometrika, 83, 715-726, (1996) · Zbl 0885.62062
[14] Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49: 803-821 · Zbl 0794.62034
[15] Barndorff-Nielsen, OE, Exponentially decreasing distributions from the logarithm of particle size, Proc R Soc Lond, A353, 401-419, (1977)
[16] Basso, RM; Lachos, VH; Cabral, CRB; Ghosh, P, Robust mixture modeling based on scale mixtures of skew-normal distributions, Comput Stat Data Anal, 54, 2926-2941, (2010) · Zbl 1284.62193
[17] Böhning D (1999) Computer-assisted analysis of mixtures and applications: meta-analysis, disease mapping and others. Chapman and Hall/CRC Press, London
[18] Branco, MD; Dey, DK, A general class of multivariate skew-elliptical distributions, J Multivar Anal, 79, 99-113, (2001) · Zbl 0992.62047
[19] Browne RP, McNicholas PD (2013) A mixture of generalized hyperbolic distributions. arXiv:13051036 [statME]
[20] Cabral, CS; Lachos, VH; Prates, MO, Multivariate mixture modeling using skew-normal independent distributions, Comput Stat Data Anal, 56, 126-142, (2012) · Zbl 1239.62058
[21] Calò AG, Montanari A, Viroli C (2013) A hierarchical modeling approach for clustering probability density functions. Comput Stat Data Anal. doi:10.1016/j.csda.2013.04.013
[22] Charytanowicz, M; Niewczas, J; Kulczycki, P; Kowalski, P; Lukasik, S; Zak, S; Pietka, E (ed.); Kawa, J (ed.), A complete gradient clustering algorithm for features analysis of x-ray images, 15-24, (2010), Berlin
[23] Choi, P; Min, I, A comparison of conditional and unconditional approaches in value-at-risk estimation, J Jpn Econ Assoc, 62, 99-115, (2011)
[24] Christoffersen, PF, Evaluating interval forecasts, Int Econ Rev, 39, 841-862, (1998)
[25] Contreras-Reyes JE, Arellano-Valle RB (2012) Growth curve based on scale mixtures of skew-normal distributions to model the age-length relationship of cardinalfish (epigonus crassicaudus). arXiv:12125180 [statAP]
[26] Cook RD, Weisberg S (1994) An introduction to regression graphics. Wiley, New York · Zbl 0925.62287
[27] Dempster, AP; Laird, NM; Rubin, DB, Maximum likelihood from incomplete data via the EM algorithm, J R Stat Soc Ser B, 39, 1-38, (1977) · Zbl 0364.62022
[28] Everitt BS, Hand DJ (1981) Finite mixture distributions. Chapman and Hall, London
[29] Fang KT, Kotz S, Ng K (1990) Symmetric multivariate and related distributions. Chapman & Hall, London · Zbl 0699.62048
[30] Fraley, C; Raftery, AE, How many clusters? which clustering methods? answers via model-based cluster analysis, Comput J, 41, 578-588, (1999) · Zbl 0920.68038
[31] Franczak BC, Browne RP, McNicholas PD (2012) Mixtures of shifted asymmetric Laplace distributions. arXiv:12071727 [statME] · Zbl 0794.62075
[32] Frühwirth-Schnatter S (2006) Finite mixture and Markov switching models. Springer, New York · Zbl 1108.62002
[33] Frühwirth-Schnatter, S; Pyne, S, Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-\(t\) distributions, Biostatistics, 11, 317-336, (2010)
[34] Ganesalingam, S; McLachlan, GJ, The efficiency of a linear discriminant function based on unclassified initial samples, Biometrika, 65, 658-662, (1978) · Zbl 0389.62045
[35] González-Farías, G; Domínguez-Molina, JA; Gupta, AK, Additive properties of skew normal random vectors, J Stat Plan Inference, 126, 521-534, (2004) · Zbl 1076.62052
[36] Gupta, AK, Multivariate skew-\(t\) distribution, Statistics, 37, 359-363, (2003) · Zbl 1037.62045
[37] Gupta AK, González-Farías G, Domínguez-Molina JA (2004) A multivariate skew normal distribution. J Multivar Anal 89:181-190 · Zbl 0389.62045
[38] Hubert, L; Arabie, P, Comparing partitions, J Classif, 2, 193-218, (1985)
[39] Jones, PN; McLachlan, GJ, Modelling mass-size particle data by finite mixtures, Commun Stat Theory Methods, 18, 2629-2646, (1989) · Zbl 0696.62379
[40] Jordan, MI; Jacobs, RA; Moody, J (ed.); Hanson, S (ed.); Lippmann, R (ed.), Hierarchies of adaptive experts, 985-993, (1992), California · Zbl 0798.35122
[41] Karlis, D; Santourian, A, Model-based clustering with non-elliptically contoured distributions, Stat Comput, 19, 73-83, (2009)
[42] Karlis, D; Xekalaki, E, Choosing initial values for the EM algorithm for finite mixtures, Comput Stat Data Anal, 41, 577-590, (2003) · Zbl 1429.62082
[43] Kotz S, Kozubowski TJ, Podgórski K (2001) The Laplace distribution and generalizations: a revisit with applications to communications, economics, engineering, and finance. Birkhauser, Boston · Zbl 0977.62003
[44] Kupiec, P, Techniques for verifying the accuracy of risk management models, J Deriv, 3, 73-84, (1995)
[45] Lachos, VH; Ghosh, P; Arellano-Valle, RB, Likelihood based inference for skew normal independent linear mixed models, Statistica Sinica, 20, 303-322, (2010) · Zbl 1186.62071
[46] Lee S, McLachlan GJ (2011) On the fitting of mixtures of multivariate skew \(t\)-distributions via the EM algorithm. arXiv:11094706 [statME] · Zbl 0389.62045
[47] Lee S, McLachlan GJ (2013a) Finite mixtures of multivariate skew \(t\)-distributions: some recent and new results. Stat Comput. doi:10.1007/s11222-012-9362-4 · Zbl 1076.62052
[48] Lee SX, McLachlan GJ (2013b) EMMIX-uskew: an R package for fitting mixtures of multivariate skew \(t\)-distributions via the EM algorithm. J Stat Softw. Preprint arXiv:1211.5290 · Zbl 0364.62022
[49] Lee SX, McLachlan GJ (2013c) On mixtures of skew-normal and skew \(t\)-distributions. Adv Data Anal Classif. doi:10.1007/s11634-013-0132-8 · Zbl 0581.62014
[50] Lin, TI, Maximum likelihood estimation for multivariate skew-normal mixture models, J Multivar Anal, 100, 257-265, (2009) · Zbl 1152.62034
[51] Lin, TI, Robust mixture modeling using multivariate skew \(t\) distribution, Stat Comput, 20, 343-356, (2010)
[52] Lin TI, Ho HJ, Lee CR (2013) Flexible mixture modelling using the multivariate skew-\(t\)-normal distribution. Stat Comput. doi:10.1007/s11222-013-9386-4
[53] Lindsay BG (1995) Mixture models: theory, geometry, and applications. In: NSF-CBMS regional conference series in probability and statistics, vol 5, Institute of Mathematical Statistics and the American Statistical Association, Alexandria, VA · Zbl 1186.62071
[54] Liseo, B; Loperfido, N, A Bayesian interpretation of the multivariate skew-normal distribution, Stat Probab Lett, 61, 395-401, (2003) · Zbl 1101.62342
[55] Lo, K; Brinkman, RR; Gottardo, R, Automated gating of flow cytometry data via robust model-based clustering, Cytom Part A, 73, 312-332, (2008)
[56] Lo, K; Hahne, F; Brinkman, RR; Gottardo, R, Flowclust: a bioconductor package for automated gating of flow cytometry data, BMC Bioinform, 10, 145, (2009)
[57] Martin, D; Fowlkes, C; Tal, D; Malik, J, A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, Proc Int Conf Comput Vis, 2, 416-423, (2001)
[58] McLachlan GJ, Basford KE (1988) Mixture models: inference and applications. Marcel Dekker, New York
[59] McLachlan GJ, Krishnan T (2008) The EM algorithm and extensions, 2nd edn. Wiley-Interscience, Hoboken, NJ · Zbl 1165.62019
[60] McLachlan GJ, Peel D (1998) Robust cluster analysis via mixtures of multivariate \(t\)-distributions. In: Amin A, Dori D, Pudil P, Freeman H (eds) Lecture notes in computer science. Springer, Berlin, pp 658-666
[61] McLachlan GJ, Peel D (2000) Finite mixture models. Wiley series in probability and statistics, New York
[62] McNeil AJ, Frey R, Embrechts P (2005) Quantitative risk management: concepts, techniques and tools. Princeton University Press, USA · Zbl 1089.91037
[63] Meignen, S; Meignen, H, On the modeling of small sample distributions with generalized Gaussian density in a maximum likelihood framework, IEEE Trans Image Process, 15, 1647-1652, (2006)
[64] Meilă M (2005) Comparing clusterings: an axiomatic view. In: ICML ’05: proceedings of the 22nd international conference on machine learning, ACM Press, pp 577-584
[65] Mengersen KL, Robert CP, Titterington DM (2011) Mixtures: estimation and applications. Wiley, New York · Zbl 1218.62003
[66] Nadarajah, S, Skewed distributions generated by the student’s \(t\) kernel, Monte Carlo Methods Appl, 13, 289-404, (2008) · Zbl 1129.62049
[67] Nadarajah, S; Kotz, S, Skewed distributions generated by the normal kernel, Stat Probab Lett, 65, 269-277, (2003) · Zbl 1048.62014
[68] Nguyen, TM; Wu, QMJ, A nonsymmetric mixture model for unsupervised image segmentation, IEEE Trans Cybern, 43, 751-765, (2013)
[69] Nikolic R (2010) flowKoh: self-organizing map for flow cytometry data analysis. http://commons.bcit.ca/radina_nikolic/docs/flowKoh_R_Code.zip
[70] Prates M, Lachos V, Cabral C (2011) mixsmsn: fitting finite mixture of scale mixture of skew-normal distributions. R package version 0.3-2. http://CRAN.R-project.org/package=mixsmsn · Zbl 0581.62014
[71] Pyne, S; Hu, X; Wang, K; Rossin, E; Lin, TI; Maier, LM; Baecher-Allan, C; McLachlan, GJ; Tamayo, P; Hafler, DA; De Jager, PL; Mesirow, JP, Automated high-dimensional flow cytometric data analysis, Proc Natl Acad Sci USA, 106, 8519-8524, (2009)
[72] Pyne S, Hu X, Wang K, Rossin E, Lin TI, Maier LM, Baecher-Allan C, McLachlan GJ, Tamayo P, Hafler DA, De Jager PL, Mesirow JP (2009b) FLAME: flow analysis with automated multivariate estimation. http://www.broadinstitute.org/cancer/software/genepattern/modules/FLAME/published_data
[73] Qian, Y; Wei, C; Lee, F; Campbell, J; Halliley, J; Lee, J; Cai, J; Kong, Y; Sadat, E; Thomson, E, Elucidation of seventeen human peripheral blood b-cell subsets and quantification of the tetanus response using a density-based method for the automated identification of cell populations in multidimensional flow cytometry data, Cytom Part B, 78, s69-s82, (2010)
[74] R Development Team (2011) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/. ISBN 3-900051-07-0
[75] Rand, WM, Objective criteria for the evaluation of clustering methods, J Am Stat Assoc, 66, 846-850, (1971)
[76] Riggi S, Ingrassia S (2013) Modeling high energy cosmic rays mass composition data via mixtures of multivariate skew-\(t\) distributions. arXiv:13011178 [astro-phHE] · Zbl 1284.62193
[77] Rodrigues, J, A Bayesian inference for the extended skew-normal measurement error model, Brazilian J Probab Stat, 20, 179-190, (2006) · Zbl 1272.62042
[78] Sahu, SK; Dey, DK; Branco, MD, A new class of multivariate skew distributions with applications to Bayesian regression models, Can J Stat, 31, 129-150, (2003) · Zbl 1039.62047
[79] Soltyk S, Gupta R (2011) Application of the multivariate skew normal mixture model with the EM algorithm to value-at-risk. In: MODSIM 2011—19th International Congress on Modelling and Simulation, Perth, Australia, 12-16 Dec 2011 · Zbl 1213.62087
[80] Titterington DM, Smith AFM, Makov UE (1985) Statistical analysis of finite mixture distributions. Wiley, New York
[81] Vrbik, I; McNicholas, PD, Analytic calculations for the EM algorithm for multivariate skew \(t\)-mixture models, Stat Probab Lett, 82, 1169-1174, (2012) · Zbl 1244.65012
[82] Wang K, McLachlan GJ, Ng SK, Peel D (2009) EMMIX-skew: EM algorithm for mixture of multivariate skew normal/\(t\) distributions. R package version 1.0-12. http://www.maths.uq.edu.au/ gjm/mix_soft/EMMIX-skew · Zbl 0885.62062
[83] Zhang Y, Brady M, Smith S (2001) Segmentation of brain MR images through a hidden Markov random field model and the expectation maximization algorithm. IEEE Trans Med Imaging 20:45-57
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.