Rough set-based clustering with refinement using Shannon’s entropy theory.

*(English)* Zbl 1134.94337

Summary: Many clustering algorithms have been developed, but most cannot process objects in a hybrid numerical/nominal attribute space or objects with missing values. Most also require the number of clusters to be specified manually, and their results are sensitive to the input order of the objects to be clustered. These shortcomings limit the applicability of clustering and reduce its quality. To address them, an improved clustering algorithm based on rough set (RS) theory and entropy theory is presented. It avoids the need to prespecify the number of clusters, and it clusters in both numerical and nominal attribute spaces by using a similarity measure in place of a distance index. The RS theory also lets the algorithm deal with vagueness and uncertainty in data analysis. Shannon’s entropy is used to refine the clustering results by assigning relative weights to the attributes according to their mutual entropy values. A novel measure of clustering quality is also presented to evaluate the clusters. The algorithm is analyzed and then applied to cluster a data set for an industrial product. The experimental results confirm that the efficiency and clustering quality of the algorithm are improved.
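The summary does not give the paper's formulas, so the following is only a minimal sketch of the two ideas it names: a similarity measure that works across numerical and nominal attributes with missing values, and attribute weights derived from Shannon entropy. It is not the authors' algorithm. The similarity is a Gower-style measure swapped in for illustration, the weighting uses plain per-attribute entropy rather than the mutual entropy mentioned in the summary, and the names `shannon_entropy`, `entropy_weights`, and `similarity` are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the non-missing values in one attribute column."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    n = len(present)
    counts = Counter(present)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_weights(rows, n_attrs):
    """Assumed weighting scheme: weight each attribute by its entropy, normalised to sum to 1.

    Falls back to uniform weights when every attribute has zero entropy.
    """
    entropies = [shannon_entropy([row[j] for row in rows]) for j in range(n_attrs)]
    total = sum(entropies)
    if total == 0:
        return [1.0 / n_attrs] * n_attrs
    return [e / total for e in entropies]


def similarity(a, b, weights, ranges):
    """Gower-style weighted similarity in [0, 1] between two objects.

    ranges[j] is the value range of numerical attribute j, or None if the
    attribute is nominal. Attributes missing (None) in either object are skipped.
    """
    num = 0.0
    den = 0.0
    for x, y, w, rng in zip(a, b, weights, ranges):
        if x is None or y is None:
            continue  # missing value: this attribute does not contribute
        if rng is None:
            s = 1.0 if x == y else 0.0  # nominal: exact match or not
        else:
            s = 1.0 - abs(x - y) / rng if rng > 0 else 1.0  # numerical: scaled difference
        num += w * s
        den += w
    return num / den if den > 0 else 0.0


# Illustrative data only: (numerical attribute, nominal attribute), None = missing.
rows = [(1.0, "red"), (3.0, "red"), (5.0, "blue"), (None, "blue")]
ranges = [4.0, None]  # range of the numerical column; None marks the nominal column
weights = entropy_weights(rows, 2)
pairwise = similarity(rows[0], rows[1], weights, ranges)
```

Because the similarity skips missing attributes and renormalises by the weights that were actually used, objects with gaps can still be compared, which is the property the summary highlights. A clustering step built on such a measure would still need a threshold or a rough-set approximation scheme, which this sketch leaves out.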

##### MSC:

| Code | Description |
|---|---|
| 94A17 | Measures of information, entropy |
| 62H30 | Classification and discrimination; cluster analysis (statistical aspects) |

*C.-B. Chen* and *L.-Y. Wang*, Comput. Math. Appl. 52, No. 10-11, 1563-1576 (2006; Zbl 1134.94337)
