From cluster ensemble to structure ensemble. (English) Zbl 1248.68425
Summary: This paper investigates the problem of integrating multiple structures, each extracted from a different set of data points, into a single unified structure. We first propose a new generalized concept, called structure ensemble, for the fusion of multiple structures. Unlike traditional cluster ensemble approaches, whose main objective is to align the individual labels obtained from different clustering solutions, the structure ensemble approach focuses on how to unify the structures obtained from different data sources. Based on this framework, we design a new structure ensemble approach, the probabilistic bagging based structure ensemble approach (BSEA), which integrates the bagging technique, the force based self-organizing map (FBSOM) and the normalized cut algorithm. BSEA views the structures obtained from the different datasets generated by bagging as nodes in a graph, and adopts graph theory to find the most representative structure. FBSOM, a generalized form of SOM, is proposed to serve as the basic clustering algorithm in the structure ensemble framework. Finally, a new external index called the correlation index (CI), which considers the correlation of both the similarity and the dissimilarity between the predicted solution and the true solution, is proposed to evaluate the performance of BSEA. The experiments show that (i) BSEA outperforms most of the state-of-the-art clustering approaches, and (ii) BSEA performs well on datasets from the UCI repository and on real cancer gene expression profiles.
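
The pipeline described in the summary (bag the data, extract one structure per bag, then pick the most representative structure) can be illustrated with a minimal sketch. This is not the authors' BSEA: the summary does not specify FBSOM or how the graph is built, so the sketch assumes plain k-means in place of FBSOM, a set of cluster centers as the "structure", and medoid selection over pairwise structure distances in place of the normalized cut step. All function names and parameters below are hypothetical.

```python
import random

def bootstrap(points, rng):
    """Bagging sample: len(points) draws with replacement."""
    return [rng.choice(points) for _ in points]

def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, rng, iters=20):
    """Plain k-means (assumption: stands in for FBSOM). Returns k centers."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sqdist(p, centers[i]))
            groups[nearest].append(p)
        # Recompute each center as its group mean; keep the old center if empty.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers

def structure_distance(s, t):
    """Symmetric nearest-center distance between two sets of centers."""
    s_to_t = sum(min(sqdist(a, b) for b in t) for a in s)
    t_to_s = sum(min(sqdist(a, b) for a in s) for b in t)
    return s_to_t + t_to_s

def most_representative(structures):
    """Medoid of the structures (assumption: replaces the normalized cut step)."""
    return min(
        structures,
        key=lambda s: sum(structure_distance(s, t) for t in structures),
    )

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda i: sqdist(p, centers[i]))
        for p in points
    ]

def demo(seed=0, n_bags=10):
    """Two synthetic, well-separated 2-D blobs; one structure per bag."""
    rng = random.Random(seed)
    blob_a = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(30)]
    blob_b = [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(30)]
    data = blob_a + blob_b
    structures = [kmeans(bootstrap(data, rng), 2, rng) for _ in range(n_bags)]
    best = most_representative(structures)
    return data, best, assign(data, best)

if __name__ == "__main__":
    data, best, labels = demo()
    print("representative centers:", best)
    print("cluster sizes:", [labels.count(i) for i in range(len(best))])
```

Choosing the medoid keeps the sketch short while preserving the idea of treating each bagged structure as a node and comparing nodes pairwise; a closer approximation would build a similarity graph over the structures and partition it with a spectral method.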

68T05 Learning and adaptive systems in artificial intelligence