
Sparse optimal discriminant clustering. (English) Zbl 1505.62420

Summary: In this manuscript, we reinvestigate an existing clustering procedure, optimal discriminant clustering (ODC) [Z. Zhang and G. Dai, “Optimal scoring for unsupervised learning”, in: Proceedings of the 22nd international conference on neural information processing systems, NIPS 2009. New York, NY: Association for Computing Machinery (ACM). 2241–2249 (2009; doi:10.5555/2984093.2984344)], and propose to use cross-validation to select the tuning parameter. Furthermore, because in high-dimensional data many of the features may be non-informative for clustering, we develop a variation of ODC, sparse optimal discriminant clustering (SODC), by adding a group-lasso type of penalty to ODC. We also demonstrate that both ODC and SODC can be used as a dimension reduction tool for data visualization in cluster analysis.
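The following is a minimal, hypothetical sketch of the SODC idea described above, not the authors' algorithm or code: it alternates between k-means on the projected data XW and a row-sparse (group-lasso) regression of centered cluster-indicator scores on the features, with scikit-learn's MultiTaskLasso supplying the ℓ2,1 penalty that zeroes out non-informative features. The penalty level `alpha` is fixed here for illustration, whereas the paper selects the tuning parameter by cross-validation; the function name `sodc_sketch` and all parameter values are assumptions made for this sketch.

```python
# Illustrative sketch of an SODC-style procedure (assumed implementation, not the paper's).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import MultiTaskLasso


def sodc_sketch(X, n_clusters=3, alpha=0.05, n_iter=10, seed=0):
    """Alternate k-means clustering with a group-lasso (row-sparse) projection."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    labels = rng.integers(n_clusters, size=n)        # random initial cluster labels
    W = np.zeros((p, n_clusters))
    for _ in range(n_iter):
        # Score matrix: centered cluster-indicator columns, one per cluster.
        Y = np.eye(n_clusters)[labels]
        Y = Y - Y.mean(axis=0)
        # Group-lasso regression of scores on features: whole rows of W are
        # shrunk to zero jointly, dropping non-informative features.
        W = MultiTaskLasso(alpha=alpha, max_iter=5000).fit(X, Y).coef_.T  # p x K
        Z = X @ W                                    # low-dimensional projection
        labels = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=seed).fit_predict(Z)
    return labels, W


# Toy example: 2 informative features out of 10, three clusters.
rng = np.random.default_rng(1)
centers = rng.normal(scale=4.0, size=(3, 2))
X_inf = centers[rng.integers(3, size=150)] + rng.normal(size=(150, 2))
X = np.hstack([X_inf, rng.normal(size=(150, 8))])    # append 8 noise features
labels, W = sodc_sketch(X, n_clusters=3, alpha=0.1)
print("features kept:", np.flatnonzero(np.abs(W).sum(axis=1) > 1e-8))
```

In a full implementation the value of `alpha` would be chosen over a grid by the cross-validation scheme the summary refers to, and the projection Z could be plotted directly for the visualization use of ODC/SODC mentioned above.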

MSC:

62-08 Computational methods for problems pertaining to statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H25 Factor analysis and principal components; correspondence analysis

References:

[1] Ben-David, S., Von Luxburg, U., Pal, D.: A sober look at clustering stability. 19th Annual Conference on Learning Theory (COLT 2006) 4005, 5-19 (2006) · Zbl 1143.68520
[2] Ben-Hur, A., Elisseeff, A., Guyon, I.: A stability based method for discovering structure in clustered data. Pac. Symp. Biocomput. 7, 6-17 (2002)
[3] Bouveyron, C., Brunet, C.: Simultaneous model-based clustering and visualization in the Fisher discriminative subspace. Stat. Comput. 22(1), 301-324 (2012) · Zbl 1322.62162 · doi:10.1007/s11222-011-9249-9
[4] Caliński, T., Harabasz, J.: A dendrite method for cluster analysis. Commun. Stat. Simul. Comput. 3(1), 1-27 (1974) · Zbl 0273.62010 · doi:10.1080/03610917408548446
[5] Cattell, R.B.: The scree test for the number of factors. Multivar. Behav. Res. 1(2), 245-276 (1966) · doi:10.1207/s15327906mbr0102_10
[6] Chang, W.: On using principal components before separating a mixture of two multivariate normal distributions. Appl. Stat. 32(3), 267-275 (1983) · Zbl 0538.62050 · doi:10.2307/2347949
[7] Clemmensen, L., Hastie, T., Witten, D.M., Ersboll, B.: Sparse discriminant analysis. Technometrics 53(4), 406-413 (2011) · doi:10.1198/TECH.2011.08118
[8] Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20(1), 37-46 (1960) · doi:10.1177/001316446002000104
[9] De la Torre, F., Kanade, T.: Discriminative cluster analysis. In: The 23rd International Conference on Machine Learning, pp. 241-248 (2006)
[10] Fang, Y., Wang, J.: Selection of the number of clusters via the bootstrap method. Comput. Stat. Data Anal. 56(3), 468-477 (2012) · Zbl 1239.62076 · doi:10.1016/j.csda.2011.09.003
[11] Fowlkes, E.B., Mallows, C.L.: A method for comparing two hierarchical clusterings. J. Am. Stat. Assoc. 78(383), 553-584 (1983) · Zbl 0545.62042 · doi:10.1080/01621459.1983.10478008
[12] Friedman, J.H., Meulman, J.J.: Clustering objects on subsets of attributes (with discussion). J. R. Stat. Soc. Ser. B 66(4), 815-849 (2004) · Zbl 1060.62064 · doi:10.1111/j.1467-9868.2004.02059.x
[13] Friedman, J.H., Tukey, J.W.: A projection pursuit algorithm for exploratory data analysis. IEEE Trans. Comput. C-23(9), 881-890 (1974) · Zbl 0284.68079 · doi:10.1109/T-C.1974.224051
[14] Gnanadesikan, R.: Methods for Statistical Data Analysis of Multivariate Observations, 2nd edn. Wiley, New York (1997) · Zbl 0403.62034 · doi:10.1002/9781118032671
[15] Hartigan, J.A.: Clustering Algorithms. Wiley, New York (1975) · Zbl 0372.62040
[16] Hastie, T., Tibshirani, R., Buja, A.: Flexible discriminant analysis by optimal scoring. J. Am. Stat. Assoc. 89, 1255-1270 (1994) · Zbl 0812.62067 · doi:10.1080/01621459.1994.10476866
[17] Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics, 2nd edn. Springer, New York (2009) · Zbl 1273.62005 · doi:10.1007/978-0-387-84858-7
[18] Johnson, S.C.: Hierarchical clustering schemes. Psychometrika 32(3), 241-254 (1967) · Zbl 1367.62191 · doi:10.1007/BF02289588
[19] Jones, M.C., Sibson, R.: What is projection pursuit? J. R. Stat. Soc. Ser. A 150(1), 1-37 (1987) · Zbl 0632.62059 · doi:10.2307/2981662
[20] Kaufman, L., Rousseeuw, P.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York (1990) · Zbl 1345.62009 · doi:10.1002/9780470316801
[21] Krzanowski, W.J., Lai, Y.T.: A criterion for determining the number of groups in a data set using sum-of-squares clustering. Biometrics 44(1), 23-34 (1988) · Zbl 0707.62122 · doi:10.2307/2531893
[22] Lange, T., Roth, V., Braun, M., Buhmann, J.: Stability-based validation of clustering solutions. Neural Comput. 16(6), 1299-1323 (2004) · Zbl 1089.68100 · doi:10.1162/089976604773717621
[23] MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability 1, 281-297 (1967) · Zbl 0214.46201
[24] Maugis, C., Celeux, G., Martin-Magniette, M.L.: Variable selection in model-based clustering: a general variable role modeling. Comput. Stat. Data Anal. 53(11), 3872-3882 (2009) · Zbl 1453.62154 · doi:10.1016/j.csda.2009.04.013
[25] Melnykov, V., Chen, W.-C., Maitra, R.: MixSim: an R package for simulating data to study performance of clustering algorithms. J. Stat. Softw. 51(12), 1-25 (2012) · doi:10.18637/jss.v051.i12
[26] Ng, A., Jordan, M., Weiss, Y.: On spectral clustering: analysis and an algorithm. Adv. Neural Inf. Process. Syst. 14, 849-856 (2001)
[27] Raftery, A.E., Dean, N.: Variable selection for model-based clustering. J. Am. Stat. Assoc. 101(473), 168-178 (2006) · Zbl 1118.62339 · doi:10.1198/016214506000000113
[28] Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846-850 (1971) · doi:10.1080/01621459.1971.10482356
[29] Rocci, R., Gattone, S.F., Vichi, M.: A new dimension reduction method: factor discriminant K-means. J. Classif. 28, 210-226 (2011) · Zbl 1226.62062 · doi:10.1007/s00357-011-9085-9
[30] Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888-905 (2000) · doi:10.1109/34.868688
[31] Steinley, D., Brusco, M.J.: A new variable weighting and selection procedure for K-means cluster analysis. Multivar. Behav. Res. 43(1), 77-108 (2008) · doi:10.1080/00273170701836695
[32] Sugar, C., James, G.: Finding the number of clusters in a data set: an information theoretic approach. J. Am. Stat. Assoc. 98(463), 750-763 (2003) · Zbl 1046.62064 · doi:10.1198/016214503000000666
[33] Sun, L., Ji, S., Ye, J.: A least squares formulation for canonical correlation analysis. In: The 25th International Conference Machine Learning, pp. 1024-1031 (2008)
[34] Sun, W., Wang, J., Fang, Y.: Regularized k-means clustering of high-dimensional data and its asymptotic consistency. Electron. J. Stat. 6, 148-167 (2012) · Zbl 1335.62109 · doi:10.1214/12-EJS668
[35] Sun, W., Wang, J., Fang, Y.: Consistent selection of tuning parameters via variable selection stability. J. Mach. Learn. Res. 14, 3419-3440 (2013) · Zbl 1318.62241
[36] Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a data set via the gap statistic. J. R. Stat. Soc. Ser. B 63(2), 411-423 (2001) · Zbl 0979.62046 · doi:10.1111/1467-9868.00293
[37] Tyler, D.E., Critchley, F., Dümbgen, L., Oja, H.: Invariant co-ordinate selection (with discussion). J. R. Stat. Soc. Ser. B 71(3), 549-592 (2009) · Zbl 1250.62032 · doi:10.1111/j.1467-9868.2009.00706.x
[38] Wang, J.: Consistent selection of the number of clusters via crossvalidation. Biometrika 97(4), 893-904 (2010) · Zbl 1204.62104 · doi:10.1093/biomet/asq061
[39] Witten, D.M., Tibshirani, R.: A framework for feature selection in clustering. J. Am. Stat. Assoc. 105(490), 713-726 (2010) · Zbl 1392.62194 · doi:10.1198/jasa.2010.tm09415
[40] Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B 68(1), 49-67 (2006) · Zbl 1141.62030 · doi:10.1111/j.1467-9868.2005.00532.x
[41] Zhang, Z., Dai, G.: Optimal scoring for unsupervised learning. Adv. Neural Inf. Process. Syst. 22, 2241-2249 (2009)
[42] Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 67(2), 301-320 (2005) · Zbl 1069.62054 · doi:10.1111/j.1467-9868.2005.00503.x
[43] Zou, H., Hastie, T., Tibshirani, R.: Sparse principal component analysis. J. Comput. Graph. Stat. 15(2), 265-286 (2006) · doi:10.1198/106186006X113430