zbMATH — the first resource for mathematics

Novel meta-heuristic algorithms for clustering web documents. (English) Zbl 1154.68492
Summary: Clustering web documents is one of the most important approaches for mining and extracting knowledge from the web. Recently, one of the most attractive trends in clustering high-dimensional web pages has been a tilt toward learning and optimization approaches. In this paper, we propose novel hybrid harmony search-based algorithms for clustering web documents that find a globally optimal partition of them into a specified number of clusters. By modeling clustering as an optimization problem, we first propose a pure harmony search-based clustering algorithm that finds near-globally-optimal clusters within a reasonable time. Then, we hybridize \(K\)-means and harmony clustering in two ways to achieve better clustering. Experimental results reveal that the proposed algorithms can find better clusters when compared to similar methods and also illustrate the robustness of the hybrid clustering algorithms.
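
The idea in the summary (treat a set of cluster centroids as one candidate solution, improve it with harmony search, then refine it with \(K\)-means) can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not specify the document representation, the objective function, or the two hybridization schemes. The sketch assumes dense numeric vectors, a sum-of-squared-distances objective, and a simple "harmony search first, \(K\)-means afterwards" hybrid. All function names and the parameters `hms`, `hmcr`, `par`, and `bw` are illustrative choices.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def cost(points, centroids):
    """Assumed objective: sum of squared distances to the nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def harmony_search(points, k, hms=10, hmcr=0.9, par=0.3, bw=0.1,
                   iters=300, seed=0):
    """Search for k centroids; each harmony is a list of k centroid vectors."""
    rng = random.Random(seed)
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]

    def random_value(d):
        return rng.uniform(lo[d], hi[d])

    # Harmony memory: hms random candidate solutions and their costs.
    memory = [[[random_value(d) for d in range(dim)] for _ in range(k)]
              for _ in range(hms)]
    costs = [cost(points, h) for h in memory]

    for _ in range(iters):
        new = []
        for c in range(k):
            centroid = []
            for d in range(dim):
                if rng.random() < hmcr:
                    # Memory consideration: reuse a stored value.
                    v = rng.choice(memory)[c][d]
                    if rng.random() < par:
                        # Pitch adjustment: small bounded perturbation.
                        v += rng.uniform(-bw, bw) * (hi[d] - lo[d])
                        v = min(max(v, lo[d]), hi[d])
                else:
                    # Random selection from the allowed range.
                    v = random_value(d)
                centroid.append(v)
            new.append(centroid)
        # Replace the worst stored harmony if the new one is better.
        new_cost = cost(points, new)
        worst = max(range(hms), key=costs.__getitem__)
        if new_cost < costs[worst]:
            memory[worst], costs[worst] = new, new_cost

    best = min(range(hms), key=costs.__getitem__)
    return memory[best], costs[best]

def kmeans_refine(points, centroids, steps=10):
    """A few K-means (Lloyd) iterations started from the given centroids."""
    for _ in range(steps):
        labels = assign(points, centroids)
        updated = []
        for j, c in enumerate(centroids):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(c)  # keep the centroid of an empty cluster
        centroids = updated
    return centroids

if __name__ == "__main__":
    data = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
            [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
    found, found_cost = harmony_search(data, k=2, seed=1)
    refined = kmeans_refine(data, found)
    print(found_cost, cost(data, refined))
```

In this arrangement the harmony search supplies a starting point that does not depend on a single random initialization, and the \(K\)-means steps can only keep or lower the objective from there. For real web documents one would more likely use sparse term-weight vectors and a cosine-based criterion, which this sketch does not attempt.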

MSC:
68T10 Pattern recognition, speech recognition
68T05 Learning and adaptive systems in artificial intelligence
68M10 Network design and communication in computer systems
68W05 Nonnumerical algorithms
Software:
AntClust; C4.5; KAON
References:
[1] Anderberg, M.R., Cluster analysis for applications, (1973), Academic Press, Inc. New York, NY · Zbl 0299.62029
[2] Bozsak, E.; Ehrig, M.; Handschuh, S.; Hotho, A.; Maedche, A.; Motik, B.; Oberle, D.; Schmitz, C.; Staab, S.; Stojanovic, L.; Stojanovic, N.; Studer, R.; Stumme, G.; Sure, Y.; Tane, J.; Volz, R.; Zacharias, V., KAON – towards a large scale semantic web, (), 304-313 · Zbl 1020.68554
[3] Cios, K.; Pedrycz, W.; Swiniarski, R., Data mining methods for knowledge discovery, (1998), Kluwer Academic Publishers
[4] Coello, C.A.C., Constraint-handling using an evolutionary multiobjective optimization technique, Civil eng. environ. syst., 17, 319-346, (2000)
[5] Cui, X.; Potok, T.E.; Palathingal, P., Document clustering using particle swarm optimization, IEEE swarm intell. symp., 185-191, (2005)
[6] Deb, K., An efficient constraint handling method for genetic algorithms, Comput. meth. appl. mech. eng., 186, 311-338, (2000) · Zbl 1028.90533
[7] Everitt, B., Cluster analysis, (1980), Halsted Press New York · Zbl 0507.62060
[8] Geem, Z.W.; Tseng, C.; Park, Y., Harmony search for generalized orienteering problem: best touring in China, Springer lect. notes comput. sci., 3412, 741-750, (2005)
[9] N. Grira, M. Crucianu, N. Boujemaa, Unsupervised and semi-supervised clustering: a brief survey, in: 7th ACM SIGMM International Workshop on Multimedia Information Retrieval, 2005, pp. 9-16. · Zbl 1140.68461
[10] J. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1967, pp. 281-297.
[11] Jain, A.K.; Murty, M.N.; Flynn, P.J., Data clustering: a review, ACM comput. surv. (CSUR), 31, 3, 264-323, (1999)
[12] Jones, G.; Robertson, A.M.; Santimetvirul, C.; Willett, P., Non-hierarchic document clustering using a genetic algorithm, Informat. res., 1, 1, (1995)
[13] Kennedy, J.; Eberhart, R.C.; Shi, Y., Swarm intelligence, (2001), Morgan Kaufmann New York
[14] B. Larsen, C. Aone, Fast and effective text mining using linear-time document clustering, in: Proceedings of SIGKDD’99, CA, 1999, pp. 16-22.
[15] Labroche, N.; Monmarché, N.; Venturini, G., AntClust: ant clustering and web usage mining, Genet. evolut. comput. conf., 25-36, (2003) · Zbl 1028.68819
[16] Lee, K.S.; Geem, Z.W., A new meta-heuristic algorithm for continuous engineering optimization: harmony search theory and practice, Comput. meth. appl. mech. eng., 194, 3902-3933, (2004) · Zbl 1096.74042
[17] Mahdavi, M.; Fesanghary, M.; Damangir, E., An improved harmony search algorithm for solving optimization problems, Appl. math. comput., 188, 2, 1567-1579, (2007) · Zbl 1119.65053
[18] D.W. van der Merwe, A.P. Engelbrecht, Data clustering using particle swarm optimization, in: Proceedings of IEEE Congress on Evolutionary Computation 2003 (CEC 2003), Australia, 2003, pp. 215-220.
[19] Omran, M.G.H.; Mahdavi, M., Global-best harmony search, Appl. math. comput., 198, 643-656, (2008) · Zbl 1146.90091
[20] M. Omran, A. Salman, A.P. Engelbrecht, Image classification using particle swarm optimization, in: Proceedings of the 4th Asia-Pacific Conference on Simulated Evolution and Learning 2002 (SEAL 2002), Singapore, 2002, pp. 370-374.
[21] Quinlan, J.R., C4.5: programs for machine learning, (1993), Morgan Kaufmann
[22] V.V. Raghavan, K. Birchard, A clustering strategy based on a formalism of the reproductive process in a natural system, in: Proceedings of the Second International Conference on Information Storage and Retrieval, 1979, pp. 10-22.
[23] Salton, G., Automatic text processing, (1989), Addison-Wesley · Zbl 0251.68060
[24] Salton, G.; Buckley, C., Term-weighting approaches in automatic text retrieval, Informat. process. manage., 24, 5, 513-523, (1988)
[25] Selim, S.Z.; Ismail, M.A., K-means type algorithms: a generalized convergence theorem and characterization of local optimality, IEEE trans. pattern anal. mach. intell., 6, 81-87, (1984) · Zbl 0546.62037
[26] M. Steinbach, G. Karypis, V. Kumar, A comparison of document clustering techniques, KDD’2000, Technical Report of University of Minnesota, 2000.
[27] Stumme, G.; Hotho, A.; Berendt, B., Semantic web mining: state of the art and future directions, J. web semantics: sci. serv. agents world wide web, 4, 2, 124-143, (2006)
[28] G. Stumme, A. Hotho, B. Berendt, Semantic web mining, Freiburg, September 3rd, in: 12th European Conference on Machine Learning (ECML’01)/5th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD’01), 2001. · Zbl 1048.68679
[29] Vasebi, A.; Fesanghary, M.; Bathaee, S.M.T., Combined heat and power economic dispatch by harmony search algorithm, Int. J. elec. power, 29, 713-719, (2007)
[30] Zhao, Y.; Karypis, G., Empirical and theoretical comparisons of selected criterion functions for document clustering, Mach. learn., 55, 3, 311-331, (2004) · Zbl 1089.68615
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.