Location and scale mixtures of Gaussians with flexible tail behaviour: properties, inference and application to multivariate clustering. (English) Zbl 1468.62210

Summary: The family of location and scale mixtures of Gaussians can generate a number of flexible distributional forms. The family nests several important asymmetric distributions as particular cases, such as the Generalized Hyperbolic distribution, which in turn nests many other well-known distributions such as the Normal Inverse Gaussian. In a multivariate setting, the standard location and scale mixture concept is extended to a so-called multiple scaled framework, which allows different tail and skewness behaviours in each dimension with arbitrary correlation between dimensions. Parameters are estimated via an EM algorithm, which is extended to cover mixtures of such multiple scaled distributions for application to clustering. Assessments on simulated and real data confirm the gain in degrees of freedom and flexibility in modelling data of varying tail behaviour and directional shape.
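
To make the distinction in the summary concrete, the following is a minimal sampling sketch, not the authors' implementation. It assumes the usual normal variance-mean mixture form X = mu + W Sigma beta + sqrt(W) Sigma^{1/2} Z with a single positive weight W, and, for the multiple scaled case, one independent weight per eigen-direction of Sigma = D A D^T. The function names, the parameters `w_mean` and `w_scale`, and the choice of an inverse Gaussian mixing law (the one associated with the Normal Inverse Gaussian) are illustrative assumptions.

```python
import numpy as np


def sample_single_scaled(n, mu, beta, sigma, w_mean=1.0, w_scale=1.0, seed=0):
    """Draw n points with ONE mixing weight shared by all dimensions.

    Assumed form: X = mu + W * sigma @ beta + sqrt(W) * N(0, sigma),
    with W ~ inverse Gaussian(w_mean, w_scale) (hypothetical parameterisation).
    """
    rng = np.random.default_rng(seed)
    mu, beta, sigma = (np.asarray(a, dtype=float) for a in (mu, beta, sigma))
    d = mu.shape[0]
    w = rng.wald(w_mean, w_scale, size=n)           # one weight per observation
    chol = np.linalg.cholesky(sigma)                # sigma = chol @ chol.T
    gauss = rng.standard_normal((n, d)) @ chol.T    # rows ~ N(0, sigma)
    return mu + w[:, None] * (sigma @ beta) + np.sqrt(w)[:, None] * gauss


def sample_multiple_scaled(n, mu, beta, sigma, w_mean=1.0, w_scale=1.0, seed=0):
    """Draw n points with one INDEPENDENT mixing weight per eigen-direction.

    Assumed form, with sigma = D diag(A) D^T:
    X = mu + D diag(W) diag(A) D^T beta + D diag(sqrt(W * A)) Z.
    """
    rng = np.random.default_rng(seed)
    mu, beta, sigma = (np.asarray(a, dtype=float) for a in (mu, beta, sigma))
    d = mu.shape[0]
    eigvals, eigvecs = np.linalg.eigh(sigma)        # A (vector) and D (columns)
    w = rng.wald(w_mean, w_scale, size=(n, d))      # one weight per dimension
    z = rng.standard_normal((n, d))
    beta_rot = eigvecs.T @ beta                     # D^T beta
    shift = (w * eigvals * beta_rot) @ eigvecs.T    # rows: D diag(W) A D^T beta
    noise = (np.sqrt(w * eigvals) * z) @ eigvecs.T  # rows: D diag(sqrt(W A)) z
    return mu + shift + noise


if __name__ == "__main__":
    mu = [1.0, -1.0]
    beta = [0.5, -0.3]
    sigma = [[1.0, 0.5], [0.5, 2.0]]
    x_single = sample_single_scaled(10_000, mu, beta, sigma)
    x_multi = sample_multiple_scaled(10_000, mu, beta, sigma)
    print(x_single.mean(axis=0), x_multi.mean(axis=0))
```

With a single weight, tail heaviness is tied across all directions. With per-direction weights, each eigen-direction can be given its own mixing parameters, which is the flexibility the summary describes. This sketch reuses the same `w_mean` and `w_scale` in every direction purely for brevity.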

MSC:

62-08 Computational methods for problems pertaining to statistics
62H05 Characterization and structure theory for multivariate probability distributions; copulas
62H10 Multivariate distribution of statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
