Semi-supervised clustering via multi-level random walk. (English) Zbl 1326.68224

Summary: A key issue of semi-supervised clustering is how to utilize the limited but informative pairwise constraints. In this paper, we propose a new graph-based constrained clustering algorithm, named SCRAWL. It is composed of two random walks with different granularities. In the lower-level random walk, SCRAWL partitions the vertices (i.e., data points) into constrained and unconstrained ones, according to whether they are in the pairwise constraints. For every constrained vertex, its influence range, or the degrees of influence it exerts on the unconstrained vertices, is encapsulated in an intermediate structure called component. The edge set between each pair of components determines the affecting scope of the pairwise constraints. In the higher-level random walk, SCRAWL enforces the pairwise constraints on the components, so that the constraint influence can be propagated to the unconstrained edges. At last, we combine the cluster membership of all the components to obtain the cluster assignment for each vertex. The promising experimental results on both synthetic and real-world data sets demonstrate the effectiveness of our method.


68T05 Learning and adaptive systems in artificial intelligence
05C81 Random walks on graphs
62H30 Classification and discrimination; cluster analysis (statistical aspects)


BSDS; BoostCluster
Full Text: DOI


[1] Ben-Hur, A.; Horn, D.; Siegelmann, H. T.; Vapnik, V., Support vector clustering, Journal of Machine Learning Research, 2, 125-137, (2001) · Zbl 1002.68598
[2] S. Basu, A. Banerjee, R.J. Mooney, Semi-supervised clustering by seeding, in: ICML, 2002, pp. 19-26.
[3] K. Wagsta, C. Cardie, Clustering with instance-level constraints, in: ICML, 2000, pp. 1103-1110.
[4] E.P. Xing, A.Y. Ng, M.I. Jordan, S. Russell, Distance metric learning with application to clustering with side-information, in: NIPS, 2002, pp. 505-512.
[5] J.V. Davis, B. Kulis, P. Jain, S. Sra, I.S. Dhillon, Information-theoretic metric learning, in: ICML, 2007, pp. 209-216.
[6] Wu, L.; Hoi, S.; Jin, R.; Zhu, J.; Yu, N., Learning Bregman distance functions for semi-supervised clustering, IEEE Transactions on Knowledge and Data Engineering, 24, 3, 478-491, (2012)
[7] Zeng, H.; Cheung, Y.-M., Semi-supervised maximum margin clustering with pairwise constraints, IEEE Transactions on Knowledge and Data Engineering, 24, 5, 926-939, (2012)
[8] M. Bilenko, S. Basu, R. Mooney, Integrating constraints and metric learning in semi-supervised clustering, in: ICML, 2004, pp. 81-88.
[9] K. Wagsta, C. Cardie, S. Rogers, S. Schroedl, Constrained k-means clustering with background knowledge, in: ICML, 2001, pp. 577-584.
[10] D. Klein, S.D. Kamvar, C.D. Manning, From instance-level constraints to space-level constraints: making the most of prior knowledge in data clustering, in: ICML, 2002, pp. 307-314.
[11] N. Shental, A. Bar-Hillel, T. Hertz, D. Weinshall, Computing Gaussian mixture models with em using equivalence constraints, in: NIPS, 2003, pp. 465-472. · Zbl 1161.68775
[12] Y. Hong, S. Kwong, H. Xiong, Q. Ren, Genetic-guided semi-supervised clustering algorithm with instance-level constraints, in: GECCO, 2008, pp. 1381-1388.
[13] Xu, X.; Lu, L.; He, P.; Pan, Z.; Chen, L., Improving constrained clustering via swarm intelligence, Neurocomputing, 116, 317-325, (2013)
[14] Lu, Z., Semi-supervised clustering with pairwise constraintsa discriminative approach, Journal of Machine Learning Research, 2, 299-306, (2007)
[15] Yin, X.; Chen, S.; Hu, E.; Zhang, D., Semi-supervised clustering with metric learningan adaptive kernel method, Pattern Recognition, 43, 4, 1320-1333, (2010) · Zbl 1192.68623
[16] Y. Liu, R. Jin, A.K. Jain, Boostcluster: boosting clustering by pairwise constraints, in: ACM SIGKDD, 2007, pp. 450-459.
[17] Maraziotis, I. A., A semi-supervised fuzzy clustering algorithm applied to gene expression data, Pattern Recognition, 45, 1, 637-648, (2012) · Zbl 1225.68200
[18] Filippone, M.; Camastra, F.; Masulli, F.; Rovetta, S., A survey of kernel and spectral methods for clustering, Pattern Recognition, 41, 1, 176-190, (2008) · Zbl 1122.68530
[19] S. Kamvar, D. Klein, C. Manning, Spectral learning, in: IJCAI, 2003, pp. 561-566.
[20] Kulis, B.; Basu, S.; Dhillon, I.; Mooney, R., Semi-supervised graph clusteringa kernel approach, Machine Learning, 74, 1-22, (2009)
[21] Z. Li, J. Liu, X. Tang, Constrained clustering via spectral regularization, in: CVPR, 2009, pp. 421-428.
[22] Yu, S. X.; Shi, J., Segmentation given partial grouping constraints, IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 2, 173-180, (2004)
[23] T.D. Bie, J.A.K. Suykens, B.D. Moor, Learning from general label constraints, in: SSPR & SPR, 2004, pp. 671-679. · Zbl 1104.68602
[24] Z. Lu, M. Carreira-Perpinan, Constrained spectral clustering through affinity propagation, in: CVPR, 2008, pp. 1-8.
[25] X. Wang, I. Davidson, Flexible constrained spectral clustering, in: ACM SIGKDD, 2010, pp. 563-572.
[26] Lu, Z.; Peng, Y., Exhaustive and efficient constraint propagationa graph-based learning approach and its applications, International Journal of Computer Vision, 103, 3, 306-325, (2013) · Zbl 1270.68351
[27] X. Zhu, Semi-Supervised Learning Literature Survey (Technical Report 1530), Technical Report, Computer Science, University of Wisconsin-Madison, 2008.
[28] X. Zhu, Z. Ghahramani, J. Lafferty, Semi-supervised learning using Gaussian fields and harmonic functions, in: ICML, 2003, pp. 912-919.
[29] Scheinerman, E. R.; Ullman, D. H., Fractional graph theory, (1997), Wiley-Interscience New York
[30] A. Azran, The rendezvous algorithm: multiclass semi-supervised learning with Markov random walks, in: ICML, 2007, pp. 49-56.
[31] D. Zhou, O. Bousquet, T. Lal, J. Weston, B. Schöelkopf, Learning with local and global consistency, in: NIPS, 2004, pp. 321-328.
[32] M. Meila, J. Shi, A random walks view of spectral segmentation, in: AISTATS, 2001, pp. 873-879.
[33] Shi, J.; Malik, J., Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 888-905, (2000)
[34] A. Ng, M. Jordan, Y. Weiss, On spectral clustering: analysis and an algorithm, in: NIPS, 2001, pp. 849-856.
[35] A. Strehl, J. Ghosh, R. Mooney, Impact of similarity measures on web-page clustering, in: AAAI, 2000, pp. 58-64.
[36] D. Martin, C. Fowlkes, D. Tal, J. Malik, A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, in: ICCV, 2001, pp. 416-423.
[37] Zhou, H.; Zheng, J.; Wei, L., Texture aware image segmentation using graph cuts and active contours, Pattern Recognition, 46, 6, 1719-1733, (2013) · Zbl 1264.68209
[38] Peng, B.; Zhang, L.; Zhang, D., A survey of graph theoretical approaches to image segmentation, Pattern Recognition, 46, 3, 1020-1038, (2013)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.