Principal components of sample estimates: an approach through symbolic data analysis. (English) Zbl 1405.62073

Summary: This paper deals with the analysis of datasets in which each subject is described by the estimated mean of a \(p\)-dimensional variable. Classical methods of data analysis do not account for measurements affected by intrinsic variability, as estimates are, so the heterogeneity that this variability induces among subjects is ignored. The paper proposes a solution within symbolic data analysis, which is designed to handle data tables where single-valued measurements are replaced by complex data structures such as frequency distributions, intervals, and sets of values. A principal component analysis is carried out according to this proposal, with a significant improvement in the treatment of information.
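To make the idea concrete, the following is a minimal sketch of one way estimated means could be treated as interval-valued symbolic data and passed to a principal component analysis. It is not the procedure of the paper, whose details the summary does not give. It assumes that each estimate is turned into an interval of the form mean ± z × standard error, and it swaps in a simple centres-style interval PCA: the components are computed from the interval midpoints, and each subject's hyperrectangle is then projected onto them. The function name `interval_pca_centres` and the parameters `z` and `n_components` are hypothetical.

```python
import numpy as np


def interval_pca_centres(means, std_errs, z=1.96, n_components=2):
    """Centres-style PCA of interval data built from estimates (illustrative).

    means, std_errs: arrays of shape (n_subjects, p).
    Assumption: each estimate becomes the interval mean +/- z * std_err.
    Returns the lower and upper bounds of each subject's projected interval
    on every retained component, plus the corresponding eigenvalues.
    """
    means = np.asarray(means, dtype=float)
    std_errs = np.asarray(std_errs, dtype=float)

    # Build the interval-valued data table.
    lower = means - z * std_errs
    upper = means + z * std_errs
    centres = (lower + upper) / 2.0
    radii = (upper - lower) / 2.0

    # Ordinary PCA on the interval centres (covariance eigendecomposition).
    centred = centres - centres.mean(axis=0)
    cov = centred.T @ centred / (centred.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]

    # Projecting a hyperrectangle onto a direction u gives an interval
    # centred at c.u with half-width sum_j r_j * |u_j|.
    scores = centred @ components
    half_widths = radii @ np.abs(components)
    return scores - half_widths, scores + half_widths, eigvals[order]


# Synthetic example: 6 subjects, 3 variables, unequal precision per subject.
rng = np.random.default_rng(0)
example_means = rng.normal(size=(6, 3))
example_se = rng.uniform(0.05, 0.5, size=(6, 3))
proj_lower, proj_upper, variances = interval_pca_centres(example_means, example_se)
```

In this sketch, subjects with less precise estimates appear as wider intervals on the principal axes, so the heterogeneity in precision stays visible instead of being collapsed into a single point per subject.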


62H25 Factor analysis and principal components; correspondence analysis



