zbMATH — the first resource for mathematics

Cross-entropy clustering. (English) Zbl 1342.68279
Summary: We build a general and easily applicable clustering theory, which we call cross-entropy clustering (CEC for short). It joins the advantages of classical k-means (ease of implementation and speed) with those of EM (affine invariance and the ability to adapt to clusters of desired shapes). Moreover, contrary to k-means and EM, CEC finds the optimal number of clusters by automatically removing groups which have negative information cost.
Although CEC, like EM, can be built on an arbitrary family of densities, in the most important case of Gaussian CEC the division into clusters is affine invariant.
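The removal mechanism in the summary can be illustrated with a rough Python sketch of the Gaussian CEC cost. This is not the authors' implementation: the function name and the degenerate-cluster handling are my own, and the per-cluster term (the cross-entropy of a cluster against its maximum-likelihood Gaussian, weighted by the cluster's share of the data) is my reading of the cost described in the paper.

```python
import numpy as np

def gaussian_cec_cost(X, labels):
    """Total cost of a Gaussian CEC clustering (hypothetical sketch).

    Cluster i with weight p_i and sample covariance S_i contributes
        p_i * ( -ln p_i + d/2 * ln(2*pi*e) + 1/2 * ln det S_i ).
    A cluster whose removal lowers the total cost carries "negative
    information cost" in the sense of the summary and can be dissolved.
    """
    n, d = X.shape
    total = 0.0
    for k in np.unique(labels):
        pts = X[labels == k]
        if len(pts) <= d:                # too few points for a full-rank covariance
            continue
        p = len(pts) / n                 # cluster weight p_i
        cov = np.cov(pts, rowvar=False)  # sample covariance S_i
        _, logdet = np.linalg.slogdet(cov)
        total += p * (-np.log(p)
                      + 0.5 * d * np.log(2 * np.pi * np.e)
                      + 0.5 * logdet)
    return total
```

In the full algorithm a Hartigan-style iteration reassigns points to minimize this cost; the entropy term `-ln p` penalizes each extra cluster, so a split is kept only when the drop in `ln det S_i` outweighs it, which is what lets the method choose the number of clusters on its own.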

68T10 Pattern recognition, speech recognition
62H30 Classification and discrimination; cluster analysis (statistical aspects)
k-means++; EDA
Full Text: DOI arXiv
[1] Hartigan, J., Clustering algorithms, (1975), John Wiley and Sons, New York, NY, USA · Zbl 0321.62069
[2] Jain, A.; Dubes, R., Algorithms for clustering data, (1988), Prentice-Hall, Inc., Upper Saddle River, NJ, USA · Zbl 0665.62061
[3] Jain, A.; Murty, M.; Flynn, P., Data clustering: a review, ACM Comput. Surv., 31, 264-323, (1999)
[4] Jain, A., Data clustering: 50 years beyond K-means, Pattern Recognit. Lett., 31, 651-666, (2010)
[5] Xu, R.; Wunsch, D., Clustering, (2009), Wiley-IEEE Press
[6] H. Bock, Clustering Methods: A History of K-Means Algorithms, Selected Contributions in Data Analysis and Classification, 2007, pp. 161-172. · Zbl 1181.68229
[7] Bock, Hans-Hermann, Origins and extensions of the k-means algorithm in cluster analysis, Electron. J. Hist. Probab. Stat., 4, (2008) · Zbl 1175.01030
[8] Tibshirani, R.; Walther, G.; Hastie, T., Estimating the number of clusters in a data set via the gap statistic, J. R. Stat. Soc.: Ser. B (Stat. Methodol.), 63, 411-423, (2001) · Zbl 0979.62046
[9] Mirkin, B., Choosing the number of clusters, Wiley Interdiscip. Rev.: Data Min. Knowl. Discov., 1, 252-260, (2011)
[10] V. Estivill-Castro, J. Yang, Fast and robust general purpose clustering algorithms, in: PRICAI 2000 Topics in Artificial Intelligence, vol. 20, 2000, pp. 208-218.
[11] Massa, S.; Paolucci, M.; Puliafito, P., A new modeling technique based on Markov chains to mine behavioral patterns in event based time series, Data Wareh. Knowl. Discov., 802, (1999)
[12] McLachlan, G.; Krishnan, T., The EM Algorithm and Extensions, vol. 274, (1997), Wiley New York · Zbl 0882.62012
[13] Samé, A.; Ambroise, C.; Govaert, G., An online classification EM algorithm based on the mixture model, Stat. Comput., 17, 209-218, (2007)
[14] Cover, T.; Thomas, J., Elements of Information Theory, (1991), Wiley, Hoboken, New Jersey
[15] MacKay, D., Information theory, inference, and learning algorithms, (2003), Cambridge University Press, New York, NY, USA · Zbl 1055.94001
[16] Kullback, S., Information theory and statistics, (1997), Dover Publications, New York, NY · Zbl 0149.37901
[17] Shannon, C., A mathematical theory of communication, ACM SIGMOBILE Mob. Comput. Commun. Rev., 5, 3-55, (2001)
[18] Grünwald, P., The minimum description length principle, (2007), The MIT Press, Cambridge, MA, USA
[19] Grünwald, P.; Myung, I.; Pitt, M., Advances in minimum description length: theory and applications, (2005), The MIT Press
[20] Ma, Y.; Derksen, H.; Hong, W.; Wright, J., Segmentation of multivariate mixed data via lossy data coding and compression, IEEE Trans. Pattern Anal. Mach. Intell., 29, 1546-1562, (2007)
[21] Yang, A.; Wright, J.; Ma, Y.; Sastry, S., Unsupervised segmentation of natural images via lossy data compression, Comput. Vis. Image Underst., 110, 212-225, (2008)
[22] B. Kulis, M.I. Jordan, Revisiting k-means: new algorithms via Bayesian nonparametrics, in: Proceedings of the 29th International Conference on Machine Learning (ICML), Edinburgh, UK, 2012, pp. 513-520.
[23] Kurihara, K.; Welling, M., Bayesian k-means as a maximization-expectation algorithm, Neural Comput., 21, 1145-1172, (2009) · Zbl 1178.68425
[24] Kriegel, H.-P.; Kröger, P.; Sander, J.; Zimek, A., Density-based clustering, Wiley Interdiscip. Rev.: Data Min. Knowl. Discov., 1, 231-240, (2011)
[25] Rose, K.; Gurewitz, E.; Fox, G., A deterministic annealing approach to clustering, Pattern Recognit. Lett., 11, 589-594, (1990) · Zbl 0800.68817
[26] Halkidi, M.; Batistakis, Y.; Vazirgiannis, M., On clustering validation techniques, J. Intell. Inf. Syst., 17, 107-145, (2001) · Zbl 0998.68154
[27] J. Goldberger, S.T. Roweis, Hierarchical clustering of a mixture model, in: Advances in Neural Information Processing Systems, pp. 505-512.
[28] Zhang, B.; Zhang, C.; Yi, X., Competitive EM algorithm for finite mixture models, Pattern Recognit., 37, 131-144, (2004) · Zbl 1067.68603
[29] Figueiredo, M. A.T.; Jain, A. K., Unsupervised learning of finite mixture models, IEEE Trans. Pattern Anal. Mach. Intell., 24, 381-396, (2002)
[30] R.S. Wallace, T. Kanade, Finding natural clusters having minimum description length, in: Proceedings of 10th International Conference on Pattern Recognition, Atlantic City, NJ, vol. 1, 1990, IEEE, pp. 438-442.
[31] Ben-Hur, A.; Horn, D.; Siegelmann, H. T.; Vapnik, V., Support vector clustering, J. Mach. Learn. Res., 2, 125-137, (2002) · Zbl 1002.68598
[32] Härdle, W.; Simar, L., Applied multivariate statistical analysis, (2012), Springer-Verlag, Berlin, Heidelberg · Zbl 1266.62032
[33] S. Dasgupta, Learning mixtures of Gaussians, in: 40th Annual Symposium on Foundations of Computer Science, 1999, IEEE, pp. 634-644.
[34] D. Arthur, S. Vassilvitskii, k-means++: the advantages of careful seeding, in: Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, Society for Industrial and Applied Mathematics Philadelphia, PA, USA, pp. 1027-1035. · Zbl 1302.68273
[35] Otsu, N., A threshold selection method from gray-level histogram, IEEE Trans. Syst. Man Cybern., 9, 62-66, (1979)
[36] B. Gatos, K. Ntirogiannis, I. Pratikakis, ICDAR 2009 document image binarization contest (DIBCO 2009): [https://mail.google.com/mail/u/0/images/cleardot.gif], Barcelona, in: 10th International Conference on Document Analysis and Recognition, ICDAR׳09, 2009, IEEE, pp. 1375-1382.
[37] M. Śmieja, J. Tabor, Image segmentation with use of cross-entropy clustering, in: Proceedings of the 8th International Conference on Computer Recognition Systems CORES 2013, Springer-Verlag, Berlin, Heidelberg, pp. 403-409.
[38] P. Spurek, J. Tabor, E. Zając, Detection of disk-like particles in electron microscopy images, in: Proceedings of the 8th International Conference on Computer Recognition Systems CORES 2013, Springer-Verlag, Berlin, Heidelberg, pp. 411-417.
[39] J. Tabor, K. Misztal, Detection of elliptical shapes via cross-entropy clustering, in: Pattern Recognition and Image Analysis, Springer-Verlag, Berlin, Heidelberg, 2013, pp. 656-663.
[40] Banfield, J. D.; Raftery, A. E., Model-based Gaussian and non-Gaussian clustering, Biometrics, 803-821, (1993) · Zbl 0794.62034
[41] Martinez, W. L.; Martinez, A. R., Exploratory data analysis with MATLAB, (2005), CRC Press, Chapman & Hall/CRC · Zbl 1067.62005
[42] Davis-Stober, C.; Broomell, S.; Lorenz, F., Exploratory data analysis with MATLAB, Psychometrika, 72, 107-108, (2007)
[43] De Maesschalck, R.; Jouan-Rimbaud, D.; Massart, D., The Mahalanobis distance, Chemometr. Intell. Lab. Syst., 50, 1-18, (2000)
[44] Aloise, D.; Deshpande, A.; Hansen, P.; Popat, P., NP-hardness of Euclidean sum-of-squares clustering, Mach. Learn., 75, 245-248, (2009) · Zbl 1378.68047
[45] Lloyd, S., Least squares quantization in PCM, IEEE Trans. Inf. Theory, 28, 129-137, (1982) · Zbl 0504.94015
[46] J. MacQueen, et al., Some methods for classification and analysis of multivariate observations, in: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, CA, USA, p. 14. · Zbl 0214.46201
[47] Telgarsky, M.; Vattani, A., Hartigan's method: k-means clustering without Voronoi, J. Mach. Learn. Res.—Proc. Track, 9, 820-827, (2010)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.