
Standardization of interval symbolic data based on the empirical descriptive statistics. (English) Zbl 1239.62003

Summary: In many statistical analysis methods, standardizing the sample data is recommended so that the results are not strongly affected by the measurement scales of the variables. This paper focuses on the standardization of interval data arising in symbolic data analysis (SDA), a data analysis technique that captures the value of a variable with a symbolic representation. The empirical descriptive statistics of the interval symbolic variable are studied first. We then propose a standardization method for interval symbolic data and conduct a simulation study that evaluates the method through cluster analysis. An application example on e-shops of several major cities in China is given at the end of the paper. Unlike previous research, we do not require the assumption that data are uniformly distributed within each interval. Our method makes the best use of the original sample information.
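
To make the idea of standardizing interval data concrete, here is a minimal Python sketch. It is not the method proposed in the paper, whose formulas are not given in this summary. As an assumption made purely for illustration, each variable is centred on the mean of its pooled interval endpoints and scaled by their standard deviation; the function name `standardize_intervals` and the sample values are hypothetical.

```python
import numpy as np


def standardize_intervals(lower, upper):
    """Standardize interval-valued data variable by variable.

    lower, upper: arrays of shape (n_samples, n_variables) holding the
    interval endpoints. Illustrative assumption (not the paper's method):
    location and scale are estimated from the pooled endpoints.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    if lower.shape != upper.shape:
        raise ValueError("lower and upper must have the same shape")
    if np.any(lower > upper):
        raise ValueError("each lower bound must not exceed its upper bound")

    # Pool both endpoints of every interval, per variable.
    endpoints = np.concatenate([lower, upper], axis=0)
    mean = endpoints.mean(axis=0)
    std = endpoints.std(axis=0)
    if np.any(std == 0):
        raise ValueError("a variable with zero spread cannot be standardized")

    # The same positive scale is applied to both endpoints, so every
    # standardized interval keeps lower <= upper.
    return (lower - mean) / std, (upper - mean) / std


# Hypothetical data: 3 objects, 2 interval variables on very different scales.
lower = np.array([[1.0, 100.0], [2.0, 150.0], [4.0, 300.0]])
upper = np.array([[3.0, 200.0], [5.0, 250.0], [6.0, 500.0]])
lo_std, hi_std = standardize_intervals(lower, upper)
```

After this transformation the pooled endpoints of each variable have mean 0 and standard deviation 1, so a distance-based cluster analysis is no longer dominated by the variable with the larger scale.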

MSC:

62-07 Data analysis (statistics) (MSC2010)
65C60 Computational problems in statistics (MSC2010)
62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

SODAS
