Finite mixtures of canonical fundamental skew \(t\)-distributions. The unification of the restricted and unrestricted skew \(t\)-mixture models. (English) Zbl 1420.60020

Summary: This paper introduces a finite mixture of canonical fundamental skew \(t\) (CFUST) distributions for a model-based approach to clustering where the clusters are asymmetric and possibly long-tailed [S. X. Lee and G. J. McLachlan, "Maximum likelihood estimation for finite mixtures of canonical fundamental skew \(t\)-distributions: the unification of the unrestricted and restricted skew \(t\)-mixture models", Preprint, arXiv:1401.8182]. The family of CFUST distributions includes the restricted multivariate skew \(t\) and unrestricted multivariate skew \(t\) distributions as special cases. In recent years, a few versions of the multivariate skew \(t\) (MST) mixture model have been put forward, together with various EM-type algorithms for parameter estimation. These formulations adopted either a restricted or an unrestricted characterization for their MST densities. In this paper, we examine a natural generalization of these developments, employing the CFUST distribution as the parametric family for the component distributions, and point out that the restricted and unrestricted characterizations can be unified under this general formulation. We show that an exact implementation of the EM algorithm can be achieved for the CFUST distribution and mixtures of this distribution, and present some new analytical results for a conditional expectation involved in the E-step.
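
The unification described in the summary can be illustrated with a small sampling sketch. It is not taken from the paper: it assumes the stochastic representation commonly given for the CFUST family, \(Y = \mu + \Delta |U_0| + U_1\), where \((U_0, U_1)\) are jointly multivariate \(t\) with \(\nu\) degrees of freedom, and the function name `sample_cfust` and its parameters are hypothetical. Under that assumption, a \(p \times 1\) skewness matrix \(\Delta\) corresponds to the restricted form and a diagonal \(p \times p\) matrix to the unrestricted form.

```python
import numpy as np

def sample_cfust(n, mu, sigma, delta, nu, rng=None):
    """Draw n samples via Y = mu + Delta |U0| + U1 (assumed representation)."""
    rng = np.random.default_rng(rng)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    delta = np.asarray(delta, dtype=float)
    p, q = delta.shape  # p = data dimension, q = number of latent skewing variables

    # A shared gamma scale makes (U0, U1) jointly multivariate t with nu d.f.
    w = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=(n, 1))
    u0 = rng.standard_normal((n, q)) / np.sqrt(w)
    u1 = rng.multivariate_normal(np.zeros(p), sigma, size=n) / np.sqrt(w)

    # Half-t latent variables |U0| introduce the skewness through Delta.
    return mu + np.abs(u0) @ delta.T + u1

# Restricted form: one latent skewing variable (Delta is p x 1).
restricted = sample_cfust(1000, [0.0, 0.0], np.eye(2), np.array([[2.0], [1.0]]), nu=10, rng=0)

# Unrestricted form: one skewing variable per coordinate (Delta is diagonal).
unrestricted = sample_cfust(1000, [0.0, 0.0], np.eye(2), np.diag([2.0, 1.0]), nu=10, rng=0)
```

Both calls go through the same code path and differ only in the shape of `delta`, which is the sense in which a single general family can cover the two characterizations.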

MSC:

60E05 Probability distributions: general theory
62F10 Point estimation

Software:

mixsmsn; PGMM; sn

References:

[1] Aas, K; Haff, IH, The generalized hyperbolic skew Student’s \(t\)-distribution, J. Financ. Econom., 4, 275-309, (2005)
[2] Aghaeepour, N., Finak, G., The FLOWCAP Consortium, The DREAM Consortium, Hoos, H., Mosmann, T., Gottardo, R., Brinkman, R.R., Scheuermann, R.H.: Critical assessment of automated flow cytometry analysis techniques. Nat. Methods 10, 228-238 (2013)
[3] Anderson, E, The irises of the Gaspé Peninsula, Bull. Am. Iris Soc., 59, 2-5, (1935)
[4] Arellano-Valle, RB; Azzalini, A, On the unification of families of skew-normal distributions, Scand. J. Stat., 33, 561-574, (2006) · Zbl 1117.62051
[5] Arellano-Valle, RB; Genton, MG, On fundamental skew distributions, J. Multivar. Anal., 96, 93-116, (2005) · Zbl 1073.62049
[6] Arellano-Valle, RB; Branco, MD; Genton, MG, A unified view on skewed distributions arising from selections, Can. J. Stat., 34, 581-601, (2006) · Zbl 1121.60009
[7] Asparouhov, T; Muthén, B, Structural equation models and mixture models with continuous non-normal skewed distributions, Mplus Web Notes, 19, 1-49, (2014)
[8] Azzalini, A, The skew-normal distribution and related multivariate families, Scand. J. Stat., 32, 159-188, (2005) · Zbl 1091.62046
[9] Azzalini, A.: The Skew-Normal and Related Families. Institute of Mathematical Statistics Monographs, Cambridge University Press, Cambridge (2014) · Zbl 1338.62007
[10] Banfield, JD; Raftery, AE, Model-based Gaussian and non-Gaussian clustering, Biometrics, 49, 803-821, (1993) · Zbl 0794.62034
[11] Bernardi, M, Risk measures for skew normal mixtures, Stat. Probab. Lett., 83, 1819-1824, (2013) · Zbl 1284.62105
[12] Böhning, D.: Computer-Assisted Analysis of Mixtures and Applications: Meta-Analysis, Disease Mapping and Others. Chapman and Hall, London (1999) · Zbl 0951.62088
[13] Böhning, D; Dietz, E; Schaub, R; Schlattmann, P; Lindsay, B, The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family, Ann. Inst. Stat. Math., 46, 373-388, (1994) · Zbl 0802.62017
[14] Browne, R.P., McNicholas, P.D.: A mixture of generalized hyperbolic distributions. arXiv:1305.1036 [stat.ME] (2013) · Zbl 1320.62144
[15] Cabral, CS; Lachos, VH; Prates, MO, Multivariate mixture modeling using skew-normal independent distributions, Comput. Stat. Data Anal., 56, 126-142, (2012) · Zbl 1239.62058
[16] Contreras-Reyes, JE; Arellano-Valle, RB, Growth estimates of cardinalfish (Epigonus crassicaudus) based on scale mixtures of skew-normal distributions, Fish. Res., 147, 137-144, (2013)
[17] Cook, R.D., Weisberg, S.: An Introduction to Regression Graphics. Wiley, New York (1994) · Zbl 0925.62287
[18] Everitt, B.S., Hand, D.J.: Finite Mixture Distributions. Chapman and Hall, London (1981) · Zbl 0466.62018
[19] Fisher, RA, The use of multiple measurements in taxonomic problems, Ann. Eugen., 7, 179-188, (1936)
[20] Forbes, F., Wraith, D.: A new family of multivariate heavy-tailed distributions with variable marginal amounts of tailweight: application to robust clustering. Stat. Comput. (2013). doi:10.1007/s11222-013-9414-4 · Zbl 1332.62204
[21] Fraley, C; Raftery, AE, How many clusters? Which clustering methods? Answers via model-based cluster analysis, Comput. J., 41, 578-588, (1999) · Zbl 0920.68038
[22] Franczak, B.C., Browne, R.P., McNicholas, P.D.: Mixtures of shifted asymmetric Laplace distributions. IEEE Trans. Pattern Anal. Mach. Intell. (2013). doi:10.1109/TPAMI.2013.216
[23] Frühwirth-Schnatter, S.: Finite Mixture and Markov Switching Models. Springer, New York (2006) · Zbl 1108.62002
[24] Frühwirth-Schnatter, S; Pyne, S, Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-\(t\) distributions, Biostatistics, 11, 317-336, (2010)
[25] Genton, M.G. (ed.): Skew-Elliptical Distributions and Their Applications: A Journey Beyond Normality. Chapman and Hall, London (2004) · Zbl 1069.62045
[26] Ho, HJ; Lin, TI; Chang, HH; Haase, HB; Huang, S; Pyne, S, Parametric modeling of cellular state transitions as measured with flow cytometry, BMC Bioinform., 13, S5, (2012)
[27] Ho, HJ; Lin, TI; Chen, HY; Wang, WL, Some results on the truncated multivariate \(t\) distribution, J. Stat. Plan. Inference, 142, 25-40, (2012) · Zbl 1229.62068
[28] Hu, X; Kim, H; Brennan, PJ; Han, B; Baecher-Allan, CM; Jager, PL; Brenner, MB; Raychaudhuri, S, Application of user-guided automated cytometric data analysis to large-scale immunoprofiling of invariant natural killer T cells, Proc. Natl. Acad. Sci. USA, 110, 19030-19035, (2013)
[29] Hubert, L; Arabie, P, Comparing partitions, J. Classif., 2, 193-218, (1985) · Zbl 0587.62128
[30] Karlis, D; Santourian, A, Model-based clustering with non-elliptically contoured distributions, Stat. Comput., 19, 73-83, (2009)
[31] Lee, S., McLachlan, G.J.: On the fitting of mixtures of multivariate skew \(t\)-distributions via the EM algorithm. arXiv:1109.4706 [stat.ME] (2011)
[32] Lee, S; McLachlan, GJ, Finite mixtures of multivariate skew \(t\)-distributions: some recent and new results, Stat. Comput., 24, 181-202, (2014) · Zbl 1325.62107
[33] Lee, SX; McLachlan, GJ, Model-based clustering and classification with non-normal mixture distributions, Stat. Methods Appl., 22, 427-454, (2013) · Zbl 1332.62209
[34] Lee, S.X., McLachlan, G.J.: Modelling asset return using multivariate asymmetric mixture models with applications to estimation of value-at-risk. In: Piantadosi, J., Anderssen, R.S., Boland, J. (eds.) MODSIM 2013 (20th International Congress on Modelling and Simulation), pp. 1228-1234. Adelaide (2013)
[35] Lee, SX; McLachlan, GJ, On mixtures of skew-normal and skew \(t\)-distributions, Adv. Data Anal. Classif., 7, 241-266, (2013) · Zbl 1273.62115
[36] Lee, S.X., McLachlan, G.J.: Maximum likelihood estimation for finite mixtures of canonical fundamental skew \(t\)-distributions: the unification of the unrestricted and restricted skew \(t\)-mixture models. arXiv:1401.8182 [stat.ME] (2014)
[37] Lee, Y.W., Poon, S.H.: Systemic and systematic factors for loan portfolio loss distribution. Econometrics and applied economics workshops pp. 1-61. School of Social Science, University of Manchester (2011) · Zbl 1284.62105
[38] Lin, TI, Robust mixture modeling using multivariate skew \(t\) distribution, Stat. Comput., 20, 343-356, (2010)
[39] Lindsay, B.G.: Mixture Models: Theory, Geometry, and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics, vol. 5. Institute of Mathematical Statistics and the American Statistical Association, Alexandria (1995) · Zbl 1163.62326
[40] McLachlan, G.J., Basford, K.E.: Mixture Models: Inference and Applications. Marcel Dekker, New York (1988) · Zbl 0697.62050
[41] McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions. Wiley, New York (1997) · Zbl 0882.62012
[42] McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley Series in Probability and Statistics, New York (2000) · Zbl 0963.62061
[43] McNicholas, P.D., Murphy, T.B., McDaid, A.F., Frost, D.: Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Comput. Stat. Data Anal. 54, 711-723 (2010) · Zbl 1464.62131
[44] Mengersen, K.L., Robert, C.P., Titterington, D.M.: Mixtures: Estimation and Applications. Wiley, New York (2011) · Zbl 1218.62003
[45] Murray, PM; Browne, BP; McNicholas, PD, Mixtures of skew-\(t\) factor analyzers, Comput. Stat. Data Anal., 77, 326-335, (2014) · Zbl 06984029
[46] Pyne, S; Hu, X; Wang, K; Rossin, E; Lin, TI; Maier, LM; Baecher-Allan, C; McLachlan, GJ; Tamayo, P; Hafler, DA; Jager, PL; Mesirov, JP, Automated high-dimensional flow cytometric data analysis, Proc. Natl. Acad. Sci. USA, 106, 8519-8524, (2009)
[47] Pyne, S; Lee, SX; Wang, K; Irish, J; Tamayo, P; Nazaire, MD; Duong, T; Ng, SK; Hafler, D; Levy, R; Nolan, GP; Mesirov, J; McLachlan, G, Joint modeling and registration of cell populations in cohorts of high-dimensional flow cytometric data, PLoS One, 9, 334, (2014)
[48] Riggi, S; Ingrassia, S, A model-based clustering approach for mass composition analysis of high energy cosmic rays, Astropart. Phys., 48, 86-96, (2013)
[49] Rossin, E; Lin, TI; Ho, HJ; Mentzer, SJ; Pyne, S, A framework for analytical characterization of monoclonal antibodies based on reactivity profiles in different tissues, Bioinformatics, 27, 2746-2753, (2011)
[50] Sahu, SK; Dey, DK; Branco, MD, A new class of multivariate skew distributions with applications to Bayesian regression models, Can. J. Stat., 31, 129-150, (2003) · Zbl 1039.62047
[51] Sahu, SK; Dey, DK; Branco, MD, Erratum: a new class of multivariate skew distributions with applications to Bayesian regression models, Can. J. Stat., 37, 301-302, (2009) · Zbl 1175.62049
[52] Schwarz, G, Estimating the dimension of a model, Ann. Stat., 6, 461-464, (1978) · Zbl 0379.62005
[53] Soltyk, S., Gupta, R.: Application of the multivariate skew normal mixture model with the EM algorithm to value-at-risk. In: Chan, F., Marinova, D., Anderssen, R.S. (eds.) MODSIM 2011 (19th International Congress on Modelling and Simulation), pp. 1638-1644. Perth (2011) · Zbl 1175.62049
[54] Titterington, D.M., Smith, A.F.M., Makov, U.E.: Statistical Analysis of Finite Mixture Distributions. Wiley, New York (1985) · Zbl 0646.62013
[55] Tortora, C., Franczak, B.C., Browne, B.P., McNicholas, P.D.: Model-based clustering using mixtures of coalesced generalized hyperbolic distributions. Preprint arXiv:1403.2332 [stat.ME] (2014)
[56] Vrbik, I; McNicholas, PD, Analytic calculations for the EM algorithm for multivariate skew \(t\)-mixture models, Stat. Probab. Lett., 82, 1169-1174, (2012) · Zbl 1244.65012
[57] Wang, K., Ng, S.K., McLachlan, G.J.: Multivariate skew \(t\) mixture models: applications to fluorescence-activated cell sorting data. In: Shi, H., Zhang, Y., Bottema, M.J., Lovell, B.C., Maeder, A.J. (eds.) DICTA 2009 (Conference of Digital Image Computing: Techniques and Applications, Melbourne), pp. 526-531. IEEE Computer Society, Los Alamitos (2009)
[58] Wendel, JG, Note on the gamma function, Am. Math. Mon., 55, 563-564, (1948)
[59] Wraith, D., Forbes, F.: Clustering using skewed multivariate heavy tailed distributions with flexible tail behaviour. Preprint. arXiv:1408.0711 [stat.ME] (2014) · Zbl 1332.62204
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.