
zbMATH — the first resource for mathematics

Finite mixtures of multivariate skew \(t\)-distributions: some recent and new results. (English) Zbl 1325.62107
Summary: Finite mixtures of multivariate skew \(t\) (MST) distributions have proven useful in modelling heterogeneous data with asymmetric and heavy-tailed behaviour. Recently, they have been exploited as an effective tool for modelling flow cytometric data. A number of algorithms for computing the maximum likelihood (ML) estimates of the parameters of mixtures of MST distributions have been put forward in recent years. These implementations use various characterizations of the MST distribution, which are similar but not identical. While the expectation-maximization (EM) algorithm can be implemented exactly for ‘restricted’ characterizations of the component skew \(t\)-distributions, Monte Carlo (MC) methods have been used to fit the ‘unrestricted’ models. In this paper, we review several recent fitting algorithms for finite mixtures of multivariate skew \(t\)-distributions, at the same time clarifying some of the connections between the various existing proposals. In particular, recent results have shown that the EM algorithm can be implemented exactly, allowing faster computation of ML estimates for mixtures with unrestricted MST components. The gain in computational time comes from noting that the semi-infinite integrals in the E-step of the EM algorithm can be put in the form of moments of the truncated multivariate non-central \(t\)-distribution, as in the restricted case; these moments can in turn be expressed in terms of the non-truncated central \(t\)-distribution function, for which fast algorithms are available. We present comparisons to illustrate the relative performance of the restricted and unrestricted models, and demonstrate the usefulness of the recently proposed methodology for the unrestricted MST mixture through applications to three real datasets.
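To make the object being fitted concrete, the following is a minimal sketch, not the authors' code and not the EMMIX-skew API. It assumes the restricted characterization in the Azzalini and Capitanio [8] form, with the skewness vector written as a hypothetical \(\lambda\) that absorbs the scale factors. The component density is then \(f(y) = 2\, t_p(y; \mu, \Sigma, \nu)\, T_1\big(\lambda^\top (y-\mu)\sqrt{(\nu+p)/(\nu+d(y))};\ \nu+p\big)\), where \(d(y)\) is the squared Mahalanobis distance and \(T_1\) is the univariate central \(t\) distribution function. The code evaluates this density and the mixture log-likelihood that ML fitting maximizes. All function names are illustrative.

```python
import numpy as np
from scipy.special import gammaln, logsumexp
from scipy.stats import t as student_t


def _mvt_logpdf_and_maha(y, mu, sigma, nu):
    """Log density of the central p-variate t (location mu, scale sigma, dof nu),
    returned together with the squared Mahalanobis distances d(y)."""
    p = mu.shape[0]
    chol = np.linalg.cholesky(sigma)
    z = np.linalg.solve(chol, (y - mu).T)   # whitened residuals, shape (p, n)
    maha = np.sum(z**2, axis=0)              # d(y) = (y - mu)' sigma^{-1} (y - mu)
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    logpdf = (gammaln((nu + p) / 2.0) - gammaln(nu / 2.0)
              - 0.5 * p * np.log(nu * np.pi) - 0.5 * logdet
              - 0.5 * (nu + p) * np.log1p(maha / nu))
    return logpdf, maha


def skew_t_logpdf(y, mu, sigma, lam, nu):
    """Hypothetical helper: log density of a restricted (Azzalini-Capitanio type)
    skew t component; lam = 0 recovers the symmetric multivariate t."""
    y = np.atleast_2d(y)                     # rows are observations, shape (n, p)
    p = mu.shape[0]
    log_t, maha = _mvt_logpdf_and_maha(y, mu, sigma, nu)
    # Skewing argument: odd in (y - mu), rescaled by the t "radius" factor.
    w = ((y - mu) @ lam) * np.sqrt((nu + p) / (nu + maha))
    return np.log(2.0) + log_t + student_t.logcdf(w, df=nu + p)


def mixture_loglik(y, weights, params):
    """Log-likelihood of a finite mixture; params is a list of (mu, sigma, lam, nu)."""
    comps = np.stack([np.log(pi_k) + skew_t_logpdf(y, *theta)
                      for pi_k, theta in zip(weights, params)])   # shape (K, n)
    return float(np.sum(logsumexp(comps, axis=0)))
```

The sketch covers only density and likelihood evaluation. An EM fit would alternate such evaluations with E- and M-steps, and for unrestricted components the E-step requires the truncated-\(t\) moments mentioned in the summary, which are not implemented here.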

MSC:
62H05 Characterization and structure theory for multivariate probability distributions; copulas
62H30 Classification and discrimination; cluster analysis (statistical aspects)
65C60 Computational problems in statistics (MSC2010)
Software:
mixsmsn; sn; EMMIX-skew
References:
[1] Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19, 716–723 (1974) · Zbl 0314.62039
[2] Arellano-Valle, R., Bolfarine, H., Lachos, V.: Bayesian inference for skew-normal linear mixed models. J. Appl. Stat. 34(6), 663–682 (2007)
[3] Arellano-Valle, R.B., Azzalini, A.: On the unification of families of skew-normal distributions. Scand. J. Stat. 33, 561–574 (2006) · Zbl 1117.62051
[4] Arellano-Valle, R.B., Genton, M.G.: On fundamental skew distributions. J. Multivar. Anal. 96, 93–116 (2005) · Zbl 1073.62049
[5] Arnold, B.C., Beaver, R.J.: Skewed multivariate models related to hidden truncation and/or selective reporting. Test 11, 7–54 (2002) · Zbl 1033.62013
[6] Azzalini, A.: A class of distributions which includes the normal ones. Scand. J. Stat. 12, 171–178 (1985) · Zbl 0581.62014
[7] Azzalini, A.: The skew-normal distribution and related multivariate families. Scand. J. Stat. 32, 159–188 (2005) · Zbl 1091.62046
[8] Azzalini, A., Capitanio, A.: Distribution generated by perturbation of symmetry with emphasis on a multivariate skew t distribution. J. R. Stat. Soc., Ser. B 65, 367–389 (2003) · Zbl 1065.62094
[9] Azzalini, A., Dalla Valle, A.: The multivariate skew-normal distribution. Biometrika 83(4), 715–726 (1996) · Zbl 0885.62062
[10] Banfield, J.D., Raftery, A.: Model-based Gaussian and non-Gaussian clustering. Biometrics 49, 803–821 (1993) · Zbl 0794.62034
[11] Basso, R.M., Lachos, V.H., Cabral, C.R.B., Ghosh, P.: Robust mixture modeling based on scale mixtures of skew-normal distributions. Comput. Stat. Data Anal. 54, 2926–2941 (2010) · Zbl 1284.62193
[12] Böhning, D.: Computer-Assisted Analysis of Mixtures and Applications: Meta-Analysis, Disease Mapping and Others. Chapman and Hall, New York (1999)
[13] Branco, M.D., Dey, D.K.: A general class of multivariate skew-elliptical distributions. J. Multivar. Anal. 79, 99–113 (2001) · Zbl 0992.62047
[14] Brinkman, R., Gasparetto, M., Lee, S.J., Ribickas, A., Perkins, J., Janssen, W., Smiley, R., Smith, C.: High content flow cytometry and temporal data analysis for defining a cellular signature of graft versus host disease. Biol. Blood Marrow Transplant. 13, 691–700 (2007)
[15] Cabral, C., Bolfarine, H., Pereira, J.: Bayesian density estimation using skew student-t-normal mixtures. Comput. Stat. Data Anal. 52, 5075–5090 (2008) · Zbl 1452.62263
[16] Cabral, C., Lachos, V., Prates, M.: Multivariate mixture modeling using skew-normal independent distributions. Comput. Stat. Data Anal. 56, 126–142 (2012) · Zbl 1239.62058
[17] Dempster, A., Laird, N.M., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Ser. B 39, 1–38 (1977) · Zbl 0364.62022
[18] Everitt, B.S., Hand, D.J.: Finite Mixture Distributions. Chapman and Hall, London (1981) · Zbl 0466.62018
[19] Fraley, C., Raftery, A.E.: How many clusters? Which clustering methods? Answers via model-based cluster analysis. Comput. J. 41, 578–588 (1999) · Zbl 0920.68038
[20] Frühwirth-Schnatter, S.: Finite Mixture and Markov Switching Models. Springer, New York (2006) · Zbl 1108.62002
[21] Frühwirth-Schnatter, S., Pyne, S.: Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-t distributions. Biostatistics 11, 317–336 (2010)
[22] Genz, A., Bretz, F.: Methods for the computation of multivariate t-probabilities. J. Comput. Graph. Stat. 11, 950–971 (2002)
[23] Gómez, H., Venegas, O., Bolfarine, H.: Skew-symmetric distributions generated by the distribution function of the normal distribution. Environmetrics 18, 395–407 (2007)
[24] González-Farías, G., Domínguez-Molina, J.A., Gupta, A.K.: Additive properties of skew normal random vectors. J. Stat. Plan. Inference 126, 521–534 (2004) · Zbl 1076.62052
[25] Green, P.J.: On use of the EM algorithm for penalized likelihood estimation. J. R. Stat. Soc., Ser. B 52, 443–452 (1990) · Zbl 0706.62022
[26] Gupta, A.K.: Multivariate skew-t distribution. Statistics 37, 359–363 (2003) · Zbl 1037.62045
[27] Ho, H., Lin, T., Chen, H., Wang, W.: Some results on the truncated multivariate t distribution. J. Stat. Plan. Inference 142, 25–40 (2012a) · Zbl 1229.62068
[28] Ho, H., Pyne, S., Lin, T.: Maximum likelihood inference for mixtures of skew student-t-normal distributions through practical EM-type algorithms. Stat. Comput. 22, 287–299 (2012b) · Zbl 1322.62087
[29] Karlis, D., Santourian, A.: Model-based clustering with non-elliptically contoured distributions. Stat. Comput. 19, 73–83 (2009)
[30] Karlis, D., Xekalaki, E.: Choosing initial values for the EM algorithm for finite mixtures. Comput. Stat. Data Anal. 41, 577–590 (2003) · Zbl 1429.62082
[31] Kotz, S., Nadarajah, S.: Multivariate t Distributions and Their Applications. Cambridge University Press, Cambridge (2004) · Zbl 1100.62059
[32] Lachos, V.H., Ghosh, P., Arellano-Valle, R.B.: Likelihood based inference for skew normal independent linear mixed models. Stat. Sin. 20, 303–322 (2010) · Zbl 1186.62071
[33] Lee, S., McLachlan, G.: On the fitting of mixtures of multivariate skew t-distributions via the EM algorithm (2011). arXiv:1109.4706 [stat.ME]
[34] Lin, T.I.: Maximum likelihood estimation for multivariate skew-normal mixture models. J. Multivar. Anal. 100, 257–265 (2009) · Zbl 1152.62034
[35] Lin, T.I.: Robust mixture modeling using multivariate skew t distribution. Stat. Comput. 20, 343–356 (2010)
[36] Lin, T.I., Lee, J.C., Hsieh, W.J.: Robust mixture modeling using the skew-t distribution. Stat. Comput. 17, 81–92 (2007a)
[37] Lin, T.I., Lee, J.C., Yen, S.Y.: Finite mixture modelling using the skew normal distribution. Stat. Sin. 17, 909–927 (2007b) · Zbl 1133.62012
[38] Lindsay, B.G.: Mixture Models: Theory, Geometry, and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics, vol. 5. Institute of Mathematical Statistics, Hayward (1995) · Zbl 1163.62326
[39] Liseo, B., Loperfido, N.: A Bayesian interpretation of the multivariate skew-normal distribution. Stat. Probab. Lett. 61, 395–401 (2003) · Zbl 1101.62342
[40] Liu, C., Rubin, D.: The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81, 633–648 (1994) · Zbl 0812.62028
[41] Maier, L.M., Anderson, D.E., De Jager, P.L., Wicker, L., Hafler, D.A.: Allelic variant in CTLA4 alters T cell phosphorylation patterns. Proc. Natl. Acad. Sci. USA 104, 18607–18612 (2007)
[42] McLachlan, G., Peel, D.: Robust cluster analysis via mixtures of multivariate t-distributions. In: Amin, A., Dori, D., Pudil, P., Freeman, H. (eds.) Lecture Notes in Computer Science, vol. 1451, pp. 658–666. Springer, Berlin (1998)
[43] McLachlan, G.J., Basford, K.E.: Mixture Models: Inference and Applications. Dekker, New York (1988) · Zbl 0697.62050
[44] McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley Series in Probability and Statistics (2000) · Zbl 0963.62061
[45] O’Hagan, A.: Bayes estimation of a convex quadratic. Biometrika 60, 565–571 (1973) · Zbl 0277.62052
[46] O’Hagan, A.: Moments of the truncated multivariate-t distribution (1976). http://www.tonyohagan.co.uk/academic/pdf/trunc_multi_t.PDF
[47] O’Hagan, A., Murphy, T., Gormley, I.: Computational aspects of fitting mixture models via the expectation-maximization algorithm. Comput. Stat. Data Anal. 56, 3843–3864 (2012) · Zbl 1255.62180
[48] Peel, D., McLachlan, G.: Robust mixture modelling using the t distribution. Stat. Comput. 10, 339–348 (2000)
[49] Pyne, S., Hu, X., Wang, K., Rossin, E., Lin, T.I., Maier, L.M., Baecher-Allan, C., McLachlan, G.J., Tamayo, P., Hafler, D.A., De Jager, P.L., Mesirow, J.P.: Automated high-dimensional flow cytometric data analysis. Proc. Natl. Acad. Sci. USA 106, 8519–8524 (2009)
[50] Sahu, S., Dey, D., Branco, M.: A new class of multivariate skew distributions with applications to Bayesian regression models. Can. J. Stat. 31, 129–150 (2003). Erratum: Can. J. Stat. 37, 301–302 (2009) · Zbl 1039.62047
[51] Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978) · Zbl 0379.62005
[52] Titterington, D.M., Smith, A.F.M., Makov, U.E.: Statistical Analysis of Finite Mixture Distributions. Wiley, New York (1985) · Zbl 0646.62013
[53] Vrbik, I., McNicholas, P.: Analytic calculations for the EM algorithm for multivariate skew t-mixture models. Stat. Probab. Lett. 82, 1169–1174 (2012) · Zbl 1244.65012
[54] Wang, K.: EMMIX-skew: EM algorithm for mixture of multivariate skew normal/t distributions (2009). http://www.maths.uq.edu.au/gjm/mix_soft/EMMIX-skew , R package version 1.0-12
[55] Wang, K., Ng, S.K., McLachlan, G.J.: Multivariate skew t mixture models: applications to fluorescence-activated cell sorting data. In: Shi, H., Zhang, Y., Bottema, M., Lovell, B., Maeder, A. (eds.) DICTA 2009 (Conference of Digital Image Computing: Techniques and Applications, Melbourne), pp. 526–531. IEEE Comput. Soc., Los Alamitos (2009)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.