Modelling high-dimensional data by mixtures of factor analyzers. (English) Zbl 1256.62036

Summary: We focus on mixtures of factor analyzers as a method for model-based density estimation from high-dimensional data, and hence for the clustering of such data. This approach enables a normal mixture model to be fitted to a sample of $$n$$ data points of dimension $$p$$, where $$p$$ is large relative to $$n$$. The number of free parameters is controlled through the dimension of the latent factor space. Working in this reduced space allows each component-covariance matrix to be modelled with a complexity lying between that of the isotropic and the full covariance structure models. We illustrate the use of mixtures of factor analyzers in a practical example that considers the clustering of cell lines on the basis of gene expression data from microarray experiments.
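As a rough illustration of how the latent dimension $$q$$ controls model complexity, the sketch below counts the free parameters in one component-covariance matrix under the three structures the summary compares. It assumes the standard factor-analytic form $$\Sigma_i = B_i B_i^T + D_i$$, with $$B_i$$ a $$p \times q$$ loading matrix and $$D_i$$ diagonal; this parametrization and the function names `cov_params` and `mixture_params` are assumptions of the sketch, not taken from the summary.

```python
def cov_params(p: int, structure: str, q: int = 0) -> int:
    """Free parameters in one p x p component-covariance matrix (sketch)."""
    if structure == "isotropic":
        # sigma^2 * I_p: a single variance parameter
        return 1
    if structure == "full":
        # unrestricted symmetric p x p matrix
        return p * (p + 1) // 2
    if structure == "factor":
        # assumed form B B^T + D, with B of size p x q and D diagonal
        if not 1 <= q < p:
            raise ValueError("need 1 <= q < p")
        # pq loadings + p diagonal entries of D, minus q(q-1)/2 because
        # B and B H give the same covariance for any orthogonal q x q H
        return p * q + p - q * (q - 1) // 2
    raise ValueError(f"unknown structure: {structure}")


def mixture_params(g: int, p: int, structure: str, q: int = 0) -> int:
    """Total free parameters of a g-component normal mixture (sketch)."""
    # (g - 1) mixing proportions + g mean vectors + g covariance matrices
    return (g - 1) + g * p + g * cov_params(p, structure, q)


if __name__ == "__main__":
    p = 50
    for label, s, q in [("isotropic", "isotropic", 0),
                        ("factor, q=2", "factor", 2),
                        ("full", "full", 0)]:
        print(f"{label:>12}: {cov_params(p, s, q)} per component")
```

With $$p = 50$$ this gives 1, 149 and 1275 parameters per component for the isotropic, $$q = 2$$ factor and full structures respectively. The ordering holds only for small $$q$$: the factor count grows roughly like $$pq$$, and for $$q$$ close to $$p$$ it can exceed the full count.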

MSC:

62H25 Factor analysis and principal components; correspondence analysis
62G07 Density estimation
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62P10 Applications of statistics to biology and medical sciences; meta analysis
92C40 Biochemistry, molecular biology
