zbMATH — the first resource for mathematics

Generalized adjusted Rand indices for cluster ensembles. (English) Zbl 1234.68358
Summary: In this paper, the adjusted Rand index (ARI) is generalized to two new measures based on matrix comparison: (i) the adjusted Rand index between a similarity matrix and a cluster partition (ARImp), which evaluates the consistency of a set of clustering solutions with their corresponding consensus matrix in a cluster ensemble, and (ii) the adjusted Rand index between similarity matrices (ARImm), which evaluates the consistency between two similarity matrices. Desirable properties of the ARI are preserved in the two new measures, and new properties are discussed. These properties include: (i) detection of uncorrelatedness; (ii) computation of ARImp/ARImm in a distributed environment; and (iii) characterization of the degree of uncertainty of a consensus matrix. All of these properties are investigated both by theoretical analysis and by experimental validation. A number of experiments are also reported to show the usefulness and effectiveness of the two proposed measures in practical applications.
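
For orientation, the two building blocks named in the summary can be sketched in Python: the standard ARI between two partitions (in the Hubert and Arabie form) and a consensus matrix built from a set of clusterings. This is a minimal sketch and not the paper's ARImp or ARImm, whose definitions are not given in this record. The function names `adjusted_rand_index` and `consensus_matrix` are hypothetical, and the consensus matrix is assumed here to be the fraction of clusterings in which two items share a cluster.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Standard (Hubert-Arabie) ARI between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have equal length")
    n = len(labels_a)
    if n < 2:
        # No item pairs to compare; treated as perfect agreement (assumption).
        return 1.0

    # Contingency counts n_ij and the marginal cluster sizes a_i, b_j.
    pair_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())

    expected = sum_a * sum_b / comb(n, 2)   # chance-level agreement
    max_index = 0.5 * (sum_a + sum_b)       # upper bound used for normalization
    if max_index == expected:
        # Degenerate case, e.g. both partitions put all items in one cluster.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def consensus_matrix(partitions):
    """Assumed definition: entry (i, j) is the fraction of partitions
    in which items i and j fall into the same cluster."""
    n = len(partitions[0])
    m = len(partitions)
    return [
        [sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
        for i in range(n)
    ]
```

With these definitions, `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` evaluates to 1.0, because the ARI ignores how clusters are labelled, and entries of the consensus matrix strictly between 0 and 1 mark item pairs on which the ensemble disagrees.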

MSC:
68T10 Pattern recognition, speech recognition
68T05 Learning and adaptive systems in artificial intelligence
Software:
sedaR