A new representation of interval symbolic data and its application in dynamic clustering. (English) Zbl 1347.62114

Summary: In this study, we consider interval data that summarize original samples (individuals) made up of classical point data. Such data are termed interval symbolic data in symbolic data analysis, a new research domain. Most existing approaches, such as the (centre, radius) and [lower boundary, upper boundary] representations, represent an interval using only its boundaries. However, these representations are valid only under the assumption that the individuals contained in the interval follow a uniform distribution. In practice, they may be inconsistent with the facts, since in many applications the individuals are not uniformly distributed, and they also lose information, because the point data within the intervals are ignored during the calculation. We therefore propose a new representation of interval symbolic data that takes into account the point data contained in the intervals. We then apply the city-block distance metric to the new representation and propose a dynamic clustering approach for interval symbolic data. A simulation experiment is conducted to evaluate the performance of our method. The results show that, when the individuals contained in the interval do not follow a uniform distribution, the proposed method significantly outperforms the Hausdorff and city-block distances based on the traditional representations in the context of dynamic clustering. Finally, we give an application example on the automobile data set.
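The information loss described in the summary can be illustrated with a minimal sketch. It covers only the traditional boundary-based representations and the standard city-block and Hausdorff distances between intervals. It is not the new representation proposed in the paper, which the summary does not specify. All function names and sample values below are hypothetical.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (lower boundary, upper boundary)


def to_interval(points: Sequence[float]) -> Interval:
    """Summarize point data by its boundaries only (traditional representation)."""
    return (min(points), max(points))


def to_centre_radius(interval: Interval) -> Tuple[float, float]:
    """Convert [lower, upper] to the equivalent (centre, radius) form."""
    lower, upper = interval
    return ((lower + upper) / 2, (upper - lower) / 2)


def city_block(a: Interval, b: Interval) -> float:
    """City-block distance on the boundaries of two intervals."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def hausdorff(a: Interval, b: Interval) -> float:
    """Hausdorff distance between two closed intervals on the real line."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


# Hypothetical samples: the first two share boundaries but not distribution.
spread_evenly = [0.0, 2.5, 5.0, 7.5, 10.0]
piled_up_low = [0.0, 0.5, 1.0, 1.5, 10.0]
shifted = [2.0, 4.0, 6.0, 8.0, 13.0]

a = to_interval(spread_evenly)
b = to_interval(piled_up_low)
c = to_interval(shifted)

# Both distances are zero for a and b, although the underlying
# point data are distributed very differently inside the interval.
print(city_block(a, b), hausdorff(a, b))
print(city_block(a, c), hausdorff(a, c))
```

Because `a` and `b` have identical boundaries, any distance computed from the boundaries alone treats them as the same object, which is the limitation the summary attributes to the traditional representations.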

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

SODAS

References:

[1] BERTOLUZZA, C., CORRAL, N., and SALAS, A. (2008), “On a New Class of Distances between Fuzzy Numbers,” Mathware & Soft Computing, 2(2), 71-84. · Zbl 0887.04003
[2] BILLARD, L., and DIDAY, E. (2006), Symbolic Data Analysis: Conceptual Statistics and Data Mining, UK: John Wiley & Sons Ltd. · Zbl 1117.62002
[3] BLANCO-FERNÁNDEZ, A., CORRAL, N., and GONZÁLEZ-RODRÍGUEZ, G. (2011), “Estimation of a Flexible Simple Linear Model for Interval Data Based on Set Arithmetic,” Computational Statistics & Data Analysis, 55(9), 2568-2578. · Zbl 1464.62030
[4] BRITO, P. (2014), “Symbolic Data Analysis: Another Look at the Interaction of Data Mining and Statistics,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 4(4), 281-295.
[5] CHA, S.-H., and SRIHARI, S.N. (2002), “On Measuring the Distance between Histograms,” Pattern Recognition, 35(6), 1355-1370. · Zbl 0997.68123
[6] CHAVENT, M., and LECHEVALLIER, Y. (2002), “Dynamical Clustering of Interval Data: Optimization of an Adequacy Criterion Based on Hausdorff Distance,” in Classification, Clustering, and Data Analysis, eds. K. Jajuga, A. Sokolowski, and H-H. Bock, Springer, pp. 53-60. · Zbl 1032.62058
[7] CHEN, Q., LI, G., and PHOEBE CHEN, Y.-P. (2011), “Interval-Based Distance Function for Identifying RNA Structure Candidates,” Journal of Theoretical Biology, 269(1), 280-286. · Zbl 1307.92306
[8] DE CARVALHO, F.D.A., DE SOUZA, R.M., CHAVENT, M., and LECHEVALLIER, Y. (2006), “Adaptive Hausdorff Distances and Dynamic Clustering of Symbolic Interval Data,” Pattern Recognition Letters, 27(3), 167-179.
[9] DE SOUZA, R.M., and DE CARVALHO, F.D.A. (2004), “Clustering of Interval Data Based on City-Block Distances,” Pattern Recognition Letters, 25(3), 353-365.
[10] DIDAY, E. (1989), “Introduction à l'Analyse des Données Symboliques,” RR-1074, <inria-00075484>. · Zbl 0673.62003
[11] DIDAY, E. (1995), “Probabilist, Possibilist and Belief Objects for Knowledge Analysis,” Annals of Operations Research, 55(2), 225-276. · Zbl 0844.68024
[12] DIDAY, E., and NOIRHOMME-FRAITURE, M. (2008), Symbolic Data Analysis and the SODAS Software, Wiley Online Library. · Zbl 1275.62029
[13] GUO, J., LI, W., LI, C., and GAO, S. (2012), “Standardization of Interval Symbolic Data Based on the Empirical Descriptive Statistics,” Computational Statistics & Data Analysis, 56(3), 602-610. · Zbl 1239.62003
[14] HEDJAZI, L., AGUILAR-MARTIN, J., and LE LANN, M.-V. (2011), “Similarity-Margin Based Feature Selection for Symbolic Interval Data,” Pattern Recognition Letters, 32(4), 578-585.
[15] HUBERT, L., and ARABIE, P. (1985), “Comparing Partitions,” Journal of Classification, 2(1), 193-218. · Zbl 0587.62128
[16] IRPINO, A., and VERDE, R. (2008), “Dynamic Clustering of Interval Data Using a Wasserstein-Based Distance,” Pattern Recognition Letters, 29(11), 1648-1658. · Zbl 1147.62054
[17] MALI, K., and MITRA, S. (2002), “Clustering of Symbolic Data and Its Validation,” in Advances in Soft Computing-AFSS 2002, eds. N.R. Pal and M. Sugeno, Berlin Heidelberg: Springer, pp. 339-344. · Zbl 1053.68626
[18] MALI, K., and MITRA, S. (2003), “Clustering and Its Validation in a Symbolic Framework,” Pattern Recognition Letters, 24(14), 2367-2376. · Zbl 1047.68132
[19] SINOVA, B., COLUBI, A., and GIL, M. (2012), “Interval Arithmetic-Based Simple Linear Regression between Interval Data: Discussion and Sensitivity Analysis on the Choice of the Metric,” Information Sciences, 199, 109-124. · Zbl 06094584
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.