Mixtures of modified $$t$$-factor analyzers for model-based clustering, classification, and discriminant analysis. (English) Zbl 1204.62098

Summary: A novel family of mixture models is introduced based on modified $$t$$-factor analyzers. Modified factor analyzers were recently introduced within the Gaussian context and our work presents a more flexible and robust alternative. We introduce a family of mixtures of modified $$t$$-factor analyzers that uses this generalized version of the factor analysis covariance structure. We apply this family within three paradigms: model-based clustering; model-based classification; and model-based discriminant analysis. In addition, we apply a recently published Gaussian analogue to this family [P.D. McNicholas and T.B. Murphy, Stat. Comput. 18, 285–296 (2008)] under model-based classification and discriminant analysis paradigms for the first time. Parameter estimation is carried out within the alternating expectation-conditional maximization framework and the Bayesian information criterion is used for model selection. Two real data sets are used to compare our approach to other popular model-based approaches; in these comparisons, the chosen mixtures of modified $$t$$-factor analyzers models perform favourably. We conclude with a summary and suggestions for future work.
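As a rough illustration of the model-selection step mentioned in the summary, the sketch below scores a few candidate models with the Bayesian information criterion and keeps the best one. The sign convention (2 log L minus k log n, larger is better) is an assumption, since the summary does not state which form is used. The helper names `bic` and `select_model`, the candidate labels, and the log-likelihood and parameter-count values are all hypothetical and are not taken from the paper.

```python
import math


def bic(log_likelihood: float, n_free_params: int, n_obs: int) -> float:
    """BIC in the 'larger is better' form: 2*loglik - k*log(n).

    Assumed convention; the classical Schwarz form is the negative of this.
    """
    return 2.0 * log_likelihood - n_free_params * math.log(n_obs)


def select_model(candidates: dict, n_obs: int):
    """Score every candidate and return (best_name, all_scores).

    `candidates` maps a model label to (maximized log-likelihood,
    number of free parameters).
    """
    scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical fits: G components, q latent factors (made-up numbers).
candidates = {
    "G=2, q=1": (-520.0, 25),
    "G=3, q=1": (-480.0, 38),
    "G=3, q=2": (-470.0, 50),
}

best, scores = select_model(candidates, n_obs=150)
print(best)
for name, score in scores.items():
    print(f"{name}: BIC = {score:.2f}")
```

With these made-up numbers, the extra parameters of the largest model outweigh its higher log-likelihood, which is the trade-off the criterion is meant to capture.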

MSC:

 62H30 Classification and discrimination; cluster analysis (statistical aspects)
 62H25 Factor analysis and principal components; correspondence analysis

Software:

 R; PGMM

References:

 [1] Aitken, A.C., On Bernoulli’s numerical solution of algebraic equations, Proceedings of the royal society of Edinburgh, 46, 289-305, (1926) · JFM 52.0098.05
 [2] Anderson, E., The irises of the gaspe peninsula, Bulletin of the American iris society, 59, 2-5, (1935)
 [3] Andrews, J.L., McNicholas, P.D., to appear. Extending mixtures of multivariate t-factor analyzers. Statistics and Computing, doi:10.1007/s11222-010-9175-2. · Zbl 1255.62171
 [4] Andrews, J.L.; McNicholas, P.D.; Subedi, S., Model-based classification via mixtures of multivariate t-distributions, Computational statistics and data analysis, 55, 1, 520-529, (2011) · Zbl 1247.62151
 [5] Banfield, J.D.; Raftery, A.E., Model-based Gaussian and non-Gaussian clustering, Biometrics, 49, 3, 803-821, (1993) · Zbl 0794.62034
 [6] Besag, J.; Green, P.; Higdon, D.; Mengersen, K., Bayesian computation and stochastic systems, Statistical science, 10, 1, 3-41, (1995) · Zbl 0955.62552
 [7] Böhning, D.; Dietz, E.; Schaub, R.; Schlattmann, P.; Lindsay, B., The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family, Annals of the institute of statistical mathematics, 46, 373-388, (1994) · Zbl 0802.62017
 [8] Bouveyron, C.; Girard, S.; Schmid, C., High-dimensional data clustering, Computational statistics and data analysis, 52, 1, 502-519, (2007) · Zbl 1452.62433
 [9] Celeux, G.; Govaert, G., Gaussian parsimonious clustering models, Pattern recognition, 28, 781-793, (1995)
 [10] Dasgupta, A.; Raftery, A.E., Detecting features in spatial point processes with clutter via model-based clustering, Journal of the American statistical association, 93, 94-302, (1998) · Zbl 0906.62105
 [11] Dean, N.; Murphy, T.B.; Downey, G., Using unlabelled data to update classification rules with applications in food authenticity studies, Journal of the royal statistical society series C, 55, 1, 1-14, (2006) · Zbl 1490.62155
 [12] Dempster, A.P.; Laird, N.M.; Rubin, D.B., Maximum likelihood from incomplete data via the EM algorithm, Journal of the royal statistical society series B, 39, 1, 1-38, (1977) · Zbl 0364.62022
 [13] Fisher, R.A., The use of multiple measurements in taxonomic problems, Annals of eugenics, 7, 2, 179-188, (1936)
 [14] Forina, M.; Armanino, C.; Castino, M.; Ubigli, M., Multivariate data analysis as a discriminating method of the origin of wines, Vitis, 25, 189-201, (1986)
 [15] Fraley, C.; Raftery, A.E., Model-based clustering, discriminant analysis, and density estimation, Journal of the American statistical association, 97, 458, 611-631, (2002) · Zbl 1073.62545
 [16] Fraley, C., Raftery, A.E., 2006. MCLUST version 3 for R: normal mixture modeling and model-based clustering, Technical Report 504, Department of Statistics, University of Washington. Minor revisions January 2007 and November 2007.
 [17] Ghahramani, Z., Hinton, G.E., 1997. The EM algorithm for factor analyzers, Technical Report CRG-TR-96-1, University of Toronto, Toronto.
 [18] Hubert, L.; Arabie, P., Comparing partitions, Journal of classification, 2, 193-218, (1985)
 [19] Karlis, D.; Meligkotsidou, L., Finite mixtures of multivariate Poisson distributions with application, Journal of statistical planning and inference, 137, 6, 1942-1960, (2007) · Zbl 1116.60006
 [20] Karlis, D.; Santourian, A., Model-based clustering with non-elliptically contoured distributions, Statistics and computing, 19, 73-83, (2009)
 [21] Keribin, C., Estimation consistante de l’ordre de modèles de mélange, Comptes rendus de l’académie des sciences Série I mathématique, 326, 2, 243-248, (1998) · Zbl 0954.62023
 [22] Keribin, C., Consistent estimation of the order of mixture models, Sankhyā, the Indian journal of statistics series A, 62, 1, 49-66, (2000) · Zbl 1081.62516
 [23] Lagrange, J.L., Méchanique analitique, (1788), Chez le Veuve Desaint Paris
 [24] Leroux, B.G., Consistent estimation of a mixing distribution, The annals of statistics, 20, 1350-1360, (1992) · Zbl 0763.62015
 [25] Lindsay, B.G., 1995. Mixture models: theory, geometry and applications, in: ‘NSF-CBMS Regional Conference Series in Probability and Statistics’, vol. 5. Institute of Mathematical Statistics, Hayward, California.
 [26] McLachlan, G.J., Discriminant analysis and statistical pattern recognition, (1992), John Wiley & Sons New Jersey
 [27] McLachlan, G.J.; Bean, R.W.; Jones, L.B.-T., Extension of the mixture of factor analyzers model to incorporate the multivariate t-distribution, Computational statistics and data analysis, 51, 11, 5327-5338, (2007) · Zbl 1445.62053
 [28] McLachlan, G.J.; Krishnan, T., The EM algorithm and extensions, (1997), Wiley New York · Zbl 0882.62012
 [29] McLachlan, G.J.; Peel, D., Robust cluster analysis via mixtures of multivariate t-distributions, (), 658-666
 [30] McLachlan, G.J.; Peel, D., Finite mixture models, (2000), John Wiley & Sons New York · Zbl 0963.62061
 [31] McLachlan, G.J., Peel, D., 2000b. Mixtures of factor analyzers. In: Proceedings of the Seventh International Conference on Machine Learning. Morgan Kaufmann, San Francisco, pp. 599-606.
 [32] McNicholas, P.D., Model-based classification using latent Gaussian mixture models, Journal of statistical planning and inference, 140, 5, 1175-1181, (2010) · Zbl 1181.62095
 [33] McNicholas, P.D.; Murphy, T.B., Parsimonious Gaussian mixture models, Statistics and computing, 18, 285-296, (2008)
 [34] McNicholas, P.D.; Murphy, T.B., Model-based clustering of longitudinal data, The Canadian journal of statistics, 38, 1, 153-168, (2010) · Zbl 1190.62120
 [35] McNicholas, P.D., Murphy, T.B., 2010b. Model-based clustering of microarray expression data via latent Gaussian mixture models. Bioinformatics 26 (21), 2705-2712.
 [36] McNicholas, P.D.; Murphy, T.B.; McDaid, A.F.; Frost, D., Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models, Computational statistics and data analysis, 54, 3, 711-723, (2010) · Zbl 1464.62131
 [37] Meng, X.-L.; Rubin, D.B., Maximum likelihood estimation via the ECM algorithm: a general framework, Biometrika, 80, 267-278, (1993) · Zbl 0778.62022
 [38] Meng, X.-L.; van Dyk, D., The EM algorithm: an old folk song sung to a fast new tune (with discussion), Journal of the royal statistical society series B, 59, 511-567, (1997) · Zbl 1090.62518
 [39] R Development Core Team 2010. R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. URL: http://www.R-project.org.
 [40] Rand, W.M., Objective criteria for the evaluation of clustering methods, Journal of the American statistical association, 66, 846-850, (1971)
 [41] Roeder, K.; Wasserman, L., Practical Bayesian density estimation using mixtures of normals, Journal of the American statistical association, 92, 894-902, (1997) · Zbl 0889.62021
 [42] Schwarz, G., Estimating the dimension of a model, The annals of statistics, 6, 31-38, (1978)
 [43] Spearman, C., The proof and measurement of association between two things, American journal of psychology, 15, 72-101, (1904)
 [44] Tipping, T.E.; Bishop, C.M., Mixtures of probabilistic principal component analysers, Neural computation, 11, 2, 443-482, (1999)
 [45] Tipping, T.E.; Bishop, C.M., Probabilistic principal component analysers, Journal of the royal statistical society series B, 61, 611-622, (1999) · Zbl 0924.62068
 [46] Titterington, D.M.; Smith, A.F.M.; Makov, U.E., Statistical analysis of finite mixture distributions, (1985), John Wiley & Sons Chichester · Zbl 0646.62013
 [47] Woodbury, M.A., Inverting modified matrices, statistical research group, memographic report no. 42, (1950), Princeton University Princeton, New Jersey

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.