A mixture of coalesced generalized hyperbolic distributions. (English) Zbl 1433.62172

Summary: A mixture of multiple scaled generalized hyperbolic distributions (MMSGHDs) is introduced. Then, a coalesced generalized hyperbolic distribution (CGHD) is developed by joining a generalized hyperbolic distribution with a multiple scaled generalized hyperbolic distribution. After detailing the development of the MMSGHDs, which arises via implementation of a multi-dimensional weight function, the density of the mixture of CGHDs is developed. A parameter estimation scheme is developed using the ever-expanding class of MM algorithms and the Bayesian information criterion is used for model selection. The issue of cluster convexity is examined and a special case of the MMSGHDs is developed that is guaranteed to have convex clusters. These approaches are illustrated and compared using simulated and real data. The identifiability of the MMSGHDs and of the mixture of CGHDs is discussed in an appendix.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H12 Estimation in multivariate analysis

Software:

MixGHD; R; EMMIXuskew; mixture; QRM

References:

[1] Aitken, A.C. (1926). A series formula for the roots of algebraic and transcendental equations. Proceedings of the Royal Society of Edinburgh, 45, 14-22. · JFM 51.0096.03
[2] Altman, E. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. Journal of Finance, 23(4), 589-609.
[3] Andrews, J.L., & McNicholas, P.D. (2011a). Extending mixtures of multivariate t-factor analyzers. Statistics and Computing, 21(3), 361-373. · Zbl 1255.62171
[4] Andrews, J.L., & McNicholas, P.D. (2011b). Mixtures of modified t-factor analyzers for model-based clustering, classification, and discriminant analysis. Journal of Statistical Planning and Inference, 141(4), 1479-1486. · Zbl 1204.62098
[5] Andrews, J.L., & McNicholas, P. (2012). Model-based clustering, classification, and discriminant analysis via mixtures of multivariate t-distributions. Statistics and Computing, 22(5), 1021-1029. · Zbl 1252.62062
[6] Azzalini, A., Browne, R.P., Genton, M.G., McNicholas, P.D. (2016). On nomenclature for, and the relative merits of, two formulations of skew distributions. Statistics and Probability Letters, 110, 201-206. · Zbl 1376.60024
[7] Baek, J., & McLachlan, G.J. (2011). Mixtures of common t-factor analyzers for clustering high-dimensional microarray data. Bioinformatics, 27, 1269-1276.
[8] Banfield, J.D., & Raftery, A.E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49(3), 803-821. · Zbl 0794.62034
[9] Barndorff-Nielsen, O. (1978). Hyperbolic distributions and distributions on hyperbolae. Scandinavian Journal of Statistics, 5(3), 151-157. · Zbl 0386.60018
[10] Barndorff-Nielsen, O., Kent, J., Sørensen, M. (1982). Normal variance-mean mixtures and z distributions. International Statistical Review / Revue Internationale de Statistique, 50(2), 145-159. · Zbl 0497.62019
[11] Böhning, D., Dietz, E., Schaub, R., Schlattmann, P., Lindsay, B. (1994). The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Annals of the Institute of Statistical Mathematics, 46, 373-388. · Zbl 0802.62017
[12] Browne, R.P., & McNicholas, P.D. (2014). Estimating common principal components in high dimensions. Advances in Data Analysis and Classification, 8(2), 217-226. · Zbl 1474.62183
[13] Browne, R.P., & McNicholas, P.D. (2015). A mixture of generalized hyperbolic distributions. Canadian Journal of Statistics, 43(2), 176-198. · Zbl 1320.62144
[14] Celeux, G., & Govaert, G. (1995). Gaussian parsimonious clustering models. Pattern Recognition, 28(5), 781-793.
[15] Charytanowicz, M., Niewczas, J., Kulczycki, P., Kowalski, P.A., Łukasik, S., Żak, S. (2010). A complete gradient clustering algorithm for features analysis of x-ray images. In Piętka, E., & Kawa, J. (Eds.) Information Technologies in Biomedicine (Vol. 2, pp. 15-24). Berlin: Springer.
[16] Cook, R.D., & Weisberg, S. (1994). An Introduction to Regression Graphics. New York: Wiley. · Zbl 0925.62287
[17] Cormack, R.M. (1971). A review of classification (with discussion). Journal of the Royal Statistical Society: Series A, 134, 321-367.
[18] Debreu, G., & Koopmans, T.C. (1982). Additively decomposed quasiconvex functions. Mathematical Programming, 24(1), 1-38. · Zbl 0495.90063
[19] Demarta, S., & McNeil, A.J. (2005). The t copula and related copulas. International Statistical Review, 73(1), 111-129. · Zbl 1104.62060
[20] Dempster, A.P., Laird, N.M., Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39(1), 1-38. · Zbl 0364.62022
[21] Flury, B., & Riedwyl, H. (1988). Multivariate Statistics: A Practical Approach. London: Chapman & Hall. · Zbl 0495.62057
[22] Forbes, F., & Wraith, D. (2014). A new family of multivariate heavy-tailed distributions with variable marginal amounts of tailweights: Application to robust clustering. Statistics and Computing, 24(6), 971-984. · Zbl 1332.62204
[23] Fraley, C., & Raftery, A.E. (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association, 97(458), 611-631. · Zbl 1073.62545
[24] Franczak, B.C., Browne, R.P., McNicholas, P.D. (2014). Mixtures of shifted asymmetric Laplace distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(6), 1149-1157.
[25] Franczak, B.C., Tortora, C., Browne, R.P., McNicholas, P.D. (2015). Unsupervised learning via mixtures of skewed distributions with hypercube contours. Pattern Recognition Letters, 58(1), 69-76.
[26] Gallaugher, M.P.B., & McNicholas, P.D. (2018). Finite mixtures of skewed matrix variate distributions. Pattern Recognition, 80, 83-93.
[27] Gallaugher, M.P.B., & McNicholas, P.D. (2019a). On fractionally-supervised classification: weight selection and extension to the multivariate t-distribution. Journal of Classification 36. In press.
[28] Gallaugher, M.P.B., & McNicholas, P.D. (2019b). Three skewed matrix variate distributions. Statistics and Probability Letters, 145, 103-109. · Zbl 1414.62173
[29] Ghahramani, Z., & Hinton, G.E. (1997). The EM algorithm for factor analyzers. Technical Report CRG-TR-96-1. Toronto: University of Toronto.
[30] Gneiting, T. (1997). Normal scale mixtures and dual probability densities. Journal of Statistical Computation and Simulation, 59(4), 375-384. · Zbl 0912.62020
[31] Hennig, C. (2015). What are the true clusters? Pattern Recognition Letters, 63, 53-62. · Zbl 1026.62067
[32] Holzmann, H., Munk, A., Gneiting, T. (2006). Identifiability of finite mixtures of elliptical distributions. Scandinavian Journal of Statistics, 33, 753-763. · Zbl 1164.62354
[33] Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193-218. · Zbl 0587.62128
[34] Hunter, D.R., & Lange, K. (2000). Quantile regression via an MM algorithm. Journal of Computational and Graphical Statistics, 9(1), 60-77.
[35] Karlis, D., & Santourian, A. (2009). Model-based clustering with non-elliptically contoured distributions. Statistics and Computing, 19(1), 73-83.
[36] Kent, J.T. (1983). Identifiability of finite mixtures for directional data. The Annals of Statistics, 11, 984-988. · Zbl 0515.62018
[37] Kiers, H.A. (2002). Setting up alternating least squares and iterative majorization algorithms for solving various matrix optimization problems. Computational Statistics and Data Analysis, 41(1), 157-170. · Zbl 1018.65074
[38] Kotz, S., Kozubowski, T.J., Podgorski, K. (2001). The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance (1st ed.). Boston: Birkhäuser. · Zbl 0977.62003
[39] Kotz, S., & Nadarajah, S. (2004). Multivariate t-distributions and their applications. Cambridge: Cambridge University Press. · Zbl 1100.62059
[40] Lee, S.X., & McLachlan, G.J. (2013a). EMMIXuskew: fitting unrestricted multivariate skew t Mixture Models. R package version 0.11-5.
[41] Lee, S.X., & McLachlan, G.J. (2013b). On mixtures of skew normal and skew t-distributions. Advances in Data Analysis and Classification, 7(3), 241-266. · Zbl 1273.62115
[42] Lee, S.X., & McLachlan, G.J. (2014). Finite mixtures of multivariate skew t-distributions: some recent and new results. Statistics and Computing, 24(2), 181-202. · Zbl 1325.62107
[43] Lin, T.I. (2009). Maximum likelihood estimation for multivariate skew normal mixture models. Journal of Multivariate Analysis, 100(2), 257-265. · Zbl 1152.62034
[44] Lin, T.I. (2010). Robust mixture modeling using multivariate skew t distributions. Statistics and Computing, 20(3), 343-356.
[45] Lin, T.-I., McNicholas, P.D., Ho, H.J. (2014). Capturing patterns via parsimonious t mixture models. Statistics and Probability Letters, 88, 80-87. · Zbl 1369.62131
[46] Lindsay, B. (1995). Mixture models: Theory, geometry and applications. In NSF-CBMS Regional Conference Series in Probability and Statistics, Vol. 5. Hayward, California: Institute of Mathematical Statistics. · Zbl 1163.62326
[47] McLachlan, G.J., & Peel, D. (2000). Mixtures of factor analyzers. In Proceedings of the Seventeenth International Conference on Machine Learning (pp. 599-606). San Francisco: Morgan Kaufmann.
[48] McLachlan, G.J., Bean, R.W., Jones, L. B. -T. (2007). Extension of the mixture of factor analyzers model to incorporate the multivariate t-distribution. Computational Statistics and Data Analysis, 51(11), 5327-5338. · Zbl 1445.62053
[49] McLachlan, G.J., & Krishnan, T. (2008). The EM Algorithm and Extensions. New York: Wiley. · Zbl 1165.62019
[50] McNeil, A.J., Frey, R., Embrechts, P. (2005). Quantitative risk management: concepts, techniques and tools. Princeton: Princeton University Press. · Zbl 1089.91037
[51] McNicholas, P.D. (2016a). Mixture Model-Based Classification. Boca Raton: Chapman & Hall/CRC Press. · Zbl 1454.62005
[52] McNicholas, P.D. (2016b). Model-based clustering. Journal of Classification, 33(3), 331-373. · Zbl 1364.62155
[53] McNicholas, P.D., Murphy, T.B., McDaid, A.F., Frost, D. (2010). Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Computational Statistics and Data Analysis, 54(3), 711-723. · Zbl 1464.62131
[54] McNicholas, S.M., McNicholas, P.D., Browne, R.P. (2017). A mixture of variance-gamma factor analyzers. In Ahmed, S. E. (Ed.) Big and Complex Data Analysis: Methodologies and Applications (pp. 369-385). Cham: Springer International Publishing. · Zbl 1381.62187
[55] Murray, P.M., Browne, R.P., McNicholas, P.D. (2014). Mixtures of skew-t factor analyzers. Computational Statistics and Data Analysis, 77, 326-335. · Zbl 06984029
[56] Murray, P.M., Browne, R.P., McNicholas, P.D. (2017). Hidden truncation hyperbolic distributions, finite mixtures thereof, and their application for clustering. Journal of Multivariate Analysis, 161, 141-156. · Zbl 1403.62028
[57] Niculescu, C., & Persson, L. (2006). Convex Functions and Their Applications. New York: Springer. · Zbl 1100.26002
[58] Ortega, J.M., & Rheinboldt, W.C. (1970). Iterative Solutions of Nonlinear Equations in Several Variables. New York: Academic Press. · Zbl 0241.65046
[59] Peel, D., & McLachlan, G.J. (2000). Robust mixture modelling using the t distribution. Statistics and Computing, 10(4), 339-348.
[60] Pesevski, A., Franczak, B.C., McNicholas, P.D. (2018). Subspace clustering with the multivariate-t distribution. Pattern Recognition Letters, 112(1), 297-302.
[61] R Core Team. (2017). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
[62] Rand, W.M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336), 846-850.
[63] Rockafellar, R.T., & Wets, R.J.B. (2009). Variational Analysis. New York: Springer. · Zbl 0888.49001
[64] Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461-464. · Zbl 0379.62005
[65] Steane, M.A., McNicholas, P.D., Yada, R. (2012). Model-based classification via mixtures of multivariate t-factor analyzers. Communications in Statistics - Simulation and Computation, 41(4), 510-523. · Zbl 1294.62142
[66] Steinley, D. (2004). Properties of the Hubert-Arabie adjusted Rand index. Psychological Methods, 9(3), 386.
[67] Tang, Y., Browne, R.P., McNicholas, P.D. (2018). Flexible clustering of high-dimensional data via mixtures of joint generalized hyperbolic distributions. Stat, 7(1), e177.
[68] Tipping, M.E., & Bishop, C.M. (1999). Mixtures of probabilistic principal component analysers. Neural Computation, 11(2), 443-482.
[69] Tortora, C., Franczak, B.C., Browne, R.P., McNicholas, P.D. (2014). Mixtures of multiple scaled generalized hyperbolic distributions. arXiv:1403.2332v1.
[70] Tortora, C., Browne, R.P., Franczak, B.C., McNicholas, P.D. (2017). MixGHD: model based clustering, classification and discriminant analysis using the mixture of generalized hyperbolic distributions. R package version 2.1.
[71] Vrbik, I., & McNicholas, P.D. (2012). Analytic calculations for the EM algorithm for multivariate skew-mixture models. Statistics and Probability Letters, 82(6), 1169-1174. · Zbl 1244.65012
[72] Vrbik, I., & McNicholas, P.D. (2014). Parsimonious skew mixture models for model-based clustering and classification. Computational Statistics and Data Analysis, 71, 196-210. · Zbl 1471.62202
[73] Vrbik, I., & McNicholas, P.D. (2015). Fractionally-supervised classification. Journal of Classification, 32(3), 359-381. · Zbl 1331.62319
[74] Wei, Y., Tang, Y., McNicholas, P.D. (2019). Mixtures of generalized hyperbolic distributions and mixtures of skew-t distributions for model-based clustering with incomplete data. Computational Statistics and Data Analysis, 130, 18-41. · Zbl 1469.62162
[75] Wraith, D., & Forbes, F. (2015). Clustering using skewed multivariate heavy tailed distributions with flexible tail behaviour. arXiv:1408.0711. · Zbl 1468.62210
[76] Yakowitz, S.J., & Spragins, J. (1968). On the identifiability of finite mixtures. Annals of Mathematical Statistics, 39, 209-214. · Zbl 0155.25703
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.