Simultaneous model-based clustering and visualization in the Fisher discriminative subspace. (English) Zbl 1322.62162

Summary: Clustering in high-dimensional spaces is nowadays a recurrent problem in many scientific domains but remains a difficult task from both the clustering accuracy and the result understanding points of view. This paper presents a discriminative latent mixture (DLM) model which fits the data in a latent orthonormal discriminative subspace with an intrinsic dimension lower than the dimension of the original space. By constraining model parameters within and between groups, a family of 12 parsimonious DLM models is exhibited which allows to fit onto various situations. An estimation algorithm, called the Fisher-EM algorithm, is also proposed for estimating both the mixture parameters and the discriminative subspace. Experiments on simulated and real datasets highlight the good performance of the proposed approach as compared to existing clustering methods while providing a useful representation of the clustered data. The method is as well applied to the clustering of mass spectrometry data.


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62P10 Applications of statistics to biology and medical sciences; meta analysis
62A09 Graphical methods in statistics


mclust; PGMM
Full Text: DOI arXiv


[1] Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high-dimensional data for data mining application. In: ACM SIGMOD International Conference on Management of Data, pp. 94–105 (1998)
[2] Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19(6), 716–723 (1974) · Zbl 0314.62039
[3] Alexandrov, T., Decker, J., Mertens, B., Deelder, A., Tollenaar, R., Maass, P., Thiele, H.: Biomarker discovery in MALDI-TOF serum protein profiles using discrete wavelet transformation. Bioinformatics 25(5), 643–649 (2009) · Zbl 05743792
[4] Anderson, E.: The irises of the Gaspé Peninsula. Bull. Am. Iris Soc. 59, 2–5 (1935)
[5] Baek, J., McLachlan, G., Flack, L.: Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualisation of high-dimensional data. IEEE Trans. Pattern Anal. Mach. Intell. 32(7), 1298–1309 (2010)
[6] Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957) · Zbl 0077.13605
[7] Biernacki, C., Celeux, G., Govaert, G.: Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans. Pattern Anal. Mach. Intell. 22(7), 719–725 (2000)
[8] Biernacki, C., Celeux, G., Govaert, G.: Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput. Stat. Data Anal. 41, 561–575 (2003) · Zbl 1429.62235
[9] Bishop, C., Svensen, M.: The generative topographic mapping. Neural Comput. 10(1), 215–234 (1998)
[10] Boutemedjet, S., Bouguila, N., Ziou, D.: A hybrid feature extraction selection approach for high-dimensional non-Gaussian data clustering. IEEE Trans. PAMI 31(8), 1429–1443 (2009)
[11] Bouveyron, C., Girard, S., Schmid, C.: High-dimensional data clustering. Comput. Stat. Data Anal. 52(1), 502–519 (2007) · Zbl 1452.62433
[12] Campbell, N.: Canonical variate analysis: a general model formulation. Aust. J. Stat. 28, 86–96 (1984) · Zbl 0558.62003
[13] Celeux, G., Diebolt, J.: The SEM algorithm: a probabilistic teacher algorithm from the EM algorithm for the mixture problem. Comput. Stat. Q. 2(1), 73–92 (1985)
[14] Celeux, G., Govaert, G.: A classification E.M. algorithm for clustering and two stochastic versions. Comput. Stat. Data Anal. 14, 315–332 (1992) · Zbl 0937.62605
[15] Clausi, D.A.: K-means Iterative Fisher (KIF) unsupervised clustering algorithm applied to image texture segmentation. Pattern Recognit. 35, 1959–1972 (2002) · Zbl 1006.68867
[16] Ding, C., Li, T.: Adaptative dimension reduction using discriminant analysis and k-means clustering. In: ICML (2007)
[17] Duda, R., Hart, P., Stork, D.: Pattern Classification. Wiley, New York (2000)
[18] Fisher, R.: The use of multiple measurements in taxonomic problems. Ann. Eugen. 7, 179–188 (1936)
[19] Foley, D., Sammon, J.: An optimal set of discriminant vectors. IEEE Trans. Comput. 24, 281–289 (1975) · Zbl 0296.68106
[20] Fraley, C., Raftery, A.: MCLUST: software for model-based cluster analysis. J. Classif. 16, 297–306 (1999) · Zbl 0951.91500
[21] Fraley, C., Raftery, A.: Model-based clustering, discriminant analysis, and density estimation. J. Am. Stat. Assoc. 97(458) (2002) · Zbl 1073.62545
[22] Friedman, J.: Regularized discriminant analysis. J. Am. Stat. Assoc. 84, 165–175 (1989)
[23] Fukunaga, K.: Introduction to Statistical Pattern Recognition. Academic Press, San Diego (1990) · Zbl 0711.62052
[24] Golub, G., Van Loan, C.: Matrix Computations, 2nd edn. Hopkins University Press, Baltimore (1991) · Zbl 1118.65316
[25] Guo, Y.F., Li, S.J., Yang, J.Y., Shu, T.T., Wu, L.D.: A generalized Foley-Sammon transform based on generalized Fisher discriminant criterion and its application to face recognition. Pattern Recognit. Lett. 24, 147–158 (2003) · Zbl 1055.68092
[26] Hamamoto, Y., Matsuura, Y., Kanaoka, T., Tomita, S.: A note on the orthonormal discriminant vector method for feature extraction. Pattern Recognit. 24(7), 681–684 (1991)
[27] Hastie, T., Buja, A., Tibshirani, R.: Penalized discriminant analysis. Ann. Stat. 23, 73–102 (1995) · Zbl 0821.62031
[28] Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning, 2nd edn. Springer, New York (2009) · Zbl 1273.62005
[29] Howland, P., Park, H.: Generalizing discriminant analysis using the generalized singular value decomposition. IEEE Trans. Pattern Anal. Mach. Intell. 26(8), 995–1006 (2004) · Zbl 05112007
[30] Jain, A., Marty, M., Flynn, P.: Data clustering: a review. ACM Comput. Surv. 31(3), 264–323 (1999)
[31] Jin, Z., Yang, J., Hu, Z., Lou, Z.: Face recognition based on the uncorrelated optimal discriminant vectors. Pattern Recognit. 10(34), 2041–2047 (2001) · Zbl 0999.68189
[32] Jolliffe, I.: Principal Component Analysis. Springer, New York (1986) · Zbl 0584.62009
[33] Kimeldorf, G., Wahba, G.: Some results on Tchebycheffian spline functions. J. Math. Anal. Appl. 33(1), 82–95 (1971) · Zbl 0201.39702
[34] Krzanowski, W.: Principles of Multivariate Analysis. Oxford University Press, Oxford (2003) · Zbl 1116.62058
[35] la Torre Frade, F.D., Kanade, T.: Discriminative cluster analysis. In: ICML, pp. 241–248 (2006)
[36] Law, M., Figueiredo, M., Jain, A.: Simultaneous feature selection and clustering using mixture models. IEEE Trans. PAMI 26(9), 1154–1166 (2004) · Zbl 05112235
[37] Liu, K., Cheng, Y.Q., Yang, J.Y.: A generalized optimal set of discriminant vectors. Pattern Recognit. 25(7), 731–739 (1992)
[38] Maugis, C., Celeux, G., Martin-Magniette, M.L.: Variable selection for clustering with Gaussian mixture models. Biometrics 65(3), 701–709 (2009) · Zbl 1172.62021
[39] McLachlan, G., Krishnan, T.: The EM algorithm and extensions. Wiley, New York (1997) · Zbl 0882.62012
[40] McLachlan, G., Peel, D.: Finite Mixture Models. Wiley, New York (2000) · Zbl 0963.62061
[41] McLachlan, G., Peel, D., Bean, R.: Modelling high-dimensional data by mixtures of factor analyzers. Comput Stat. Data Anal. 41, 379 (2003) · Zbl 1256.62036
[42] McNicholas, P., Murphy, B.: Parsimonious Gaussian mixture models. Stat. Comput. 18(3), 285–296 (2008)
[43] Montanari, A., Viroli, C.: Heteroscedastic factor mixture analysis. Stat. Model. 10(4), 441–460 (2010) · Zbl 1283.62125
[44] Parsons, L., Haque, E., Liu, H.: Subspace clustering for high dimensional data: a review. SIGKDD Explor. Newsl. 6(1), 69–76 (1998)
[45] Raftery, A., Dean, N.: Variable selection for model-based clustering. J. Am. Stat. Assoc. 101(473), 168–178 (2006) · Zbl 1118.62339
[46] Rubin, D., Thayer, D.: EM algorithms for ML factor analysis. Psychometrika 47(1), 69–76 (1982) · Zbl 0483.62046
[47] Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978) · Zbl 0379.62005
[48] Scott, D., Thompson, J.: Probability density estimation in higher dimensions. In: Fifteenth Symposium in the Interface, pp. 173–179. (1983)
[49] Tipping, E., Bishop, C.: Mixtures of probabilistic principal component analysers. Neural Comput. 11(2), 443–482 (1999)
[50] Trendafilov, N., Jolliffe, I.T.: DALASS: variable selection in discriminant analysis via the LASSO. Comput. Stat. Data Anal. 51, 3718–3736 (2007) · Zbl 1161.62379
[51] Verleysen, M., François, D.: The curse of dimensionality in data mining and time series prediction. In: IWANN (2005)
[52] Ye, J.: Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems. J. Mach. Learn. Res. 6, 483–502 (2005) · Zbl 1222.62081
[53] Ye, J., Zhao, Z., Wu, M.: Discriminative k-means for clustering. Adv. Neural Inf. Process. Syst. 20, 1649–1656 (2007)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.