×

An adaptive genetic clustering method for exploratory mining of feature vector and time series data. (English) Zbl 1128.62373

Summary: This paper presents an adaptive genetic clustering method for exploratory mining of feature vector or time series. Our method basically implements the \(k\)-medoids algorithm with distance computed based on dynamic time warping for time series of unequal length and Euclidean for feature vectors. Each chromosome encodes objects serving as the \(k\)-medoids in binary. Both mutation and crossover rates are adapted during evolution. Six fitness functions are defined based on existing validity indices. An application in clustering grinding signals is first shown to illustrate its use. The method is then further evaluated using benchmark data available in the public domain to test the performance of the fitness functions employed. The results indicate that: (i) the proposed method produces comparable or better clustering accuracies than \(k\)-means if there is a priori knowledge about the number of clusters; and (ii) all six validity indices fail to correctly identify the actual number of clusters based on the \(k\)-means results for most data sets. A GA-based procedure is proposed to devise a new validity index with two tuning parameters and is shown not only identifying the actual number of clusters for all data sets but also producing better clustering accuracy than \(k\)-means and some indices.

MSC:

62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62P99 Applications of statistics
90C59 Approximation methods and heuristics in mathematical programming
PDF BibTeX XML Cite
Full Text: DOI

References:

[1] DOI: 10.1109/5326.923275
[2] DOI: 10.1016/S0031-3203(01)00108-X · Zbl 1001.68926
[3] Baragona R, Quaderni di Statistica 3 pp 1– (2001)
[4] Bezdek JC, Pattern Recognition with Fuzzy Objective Function Algorithms (1987)
[5] DOI: 10.1109/3477.678624
[6] DOI: 10.1016/S0377-2217(00)00320-9 · Zbl 1051.90527
[7] DOI: 10.1109/TPAMI.1979.4766909
[8] Estivill-Castro, V and Murray, AT. 1998. Spatial clustering for data mining with genetic algorithms. Proceedings of the International ICSC Symposium on Engineering of Intelligent Systems. February 11–131998, Tenerife, Spain. EIS-98
[9] Fu, T-C, Chung, F-L, Ng, V and Luk, R. 2001. Pattern discovery from stock time series using self-organizing maps. KDD 2001 Workshop on Temporal Data Mining. August 26–292001, San Francisco, CA. pp.27–37.
[10] DOI: 10.1002/mrm.1910400211
[11] Goutte C, Neuroimage 9 pp 298– (1999)
[12] Kalpakis, K, Gada, D and Puttagunta, V. 2001. Distance measures for effective clustering of ARIMA time-series. Proceedings of 2001 IEEE International Conference on Data Mining. November 29–December 22001, San Jose, CA. pp.273–280.
[13] Kaufman L, Finding Groups in Data: An Introduction to Cluster Analysis (1990) · Zbl 1345.62009
[14] Kim DJ, IEICE Transactions on Information Systems 84 pp 281– (2001)
[15] DOI: 10.1109/3477.764879
[16] DOI: 10.1109/91.940971
[17] Li C, LNCS 164 pp 245– (1999)
[18] DOI: 10.1016/j.patcog.2005.01.025 · Zbl 1077.68803
[19] Lorena LAN, Evolutionary Computation 9 pp 309– (2001) · Zbl 05412747
[20] MacQueen J, 5th Berkley Symposium on Mathematical Statistics and Probability 1 pp 281– (1967)
[21] DOI: 10.1016/S0031-3203(99)00137-5 · Zbl 02181322
[22] DOI: 10.1109/TPAMI.2002.1114856 · Zbl 05112847
[23] DOI: 10.1007/BF02294245
[24] Mitchell M, An Introduction to Genetic Algorithms (1996)
[25] DOI: 10.1016/j.patcog.2003.06.005 · Zbl 1068.68135
[26] Pham DT, in Proceedings of the Institute of Mechanical Engineers 212 pp 115– (1998)
[27] Policker S, IEEE Transactions on Systems, Man, and Cybernetics–Part B: Cybernetics 30 pp 339– (2000)
[28] DOI: 10.1109/21.286385
[29] DOI: 10.1016/S0031-3203(99)00105-3 · Zbl 05466530
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.