zbMATH — the first resource for mathematics

Far beyond the classical data models: symbolic data analysis. (English) Zbl 07260275
Summary: This paper introduces symbolic data analysis, explaining how it extends the classical data models to take into account more complete and complex information. Several examples motivate the approach, after which the modeling of variables that take new types of realizations is formally presented. Some methods for the (multivariate) analysis of symbolic data are then presented and discussed. This survey is, however, far from exhaustive, given the current dynamic development of this new field of research.
MSC:
62 Statistics
68 Computer science
Software:
SODAS
References:
[1] E. Diday, The symbolic approach in clustering and related methods of data analysis: the basic choices, In Classification and Related Methods of Data Analysis, Proceedings of IFCS’87, H.-H. Bock, ed., Aachen, July 1987, North Holland, Amsterdam, 1988, 673-684.
[2] E. Diday, Introduction à l’approche symbolique en analyse des données, RAIRO Recherche Opérationnelle 23(2) (1989), 193-236. · Zbl 0673.62003
[3] H.-H. Bock and E. Diday, eds., Analysis of Symbolic Data, Exploratory Methods for Extracting Statistical Information from Complex Data, Berlin-Heidelberg, Springer-Verlag, 2000. · Zbl 1039.62501
[4] E. Diday and M. Noirhomme-Fraiture, eds., Symbolic Data Analysis and the Sodas Software, Chichester, Wiley, 2008. · Zbl 1275.62029
[5] L. Billard and E. Diday, Symbolic Data Analysis: Conceptual Statistics and Data Mining, Chichester, Wiley, 2006. · Zbl 1117.62002
[6] L. Billard, Brief overview of symbolic data and analytic issues, Stat Anal Data Min, this issue, (2011).
[7] E. Diday and M. Vrac, Mixture decomposition of distributions by copulas in the symbolic data analysis framework, Discrete Appl Math 147(1) (2005), 27-41. · Zbl 1058.62004
[8] E. Cuvelier, QAMML: Probability Distributions for Functional Data. Ph.D. Thesis, University of Namur, Belgium, 2009.
[9] M. Noirhomme-Fraiture, Asymptotic behaviour in symbolic Markov chain, In Classification as a Tool for Research, Proceedings of the 11th IFCS Conference, Dresden, H. Locarek-Junge and C. Weihs, eds., Heidelberg, Springer, 2010.
[10] G. Choquet, Theory of capacities, Ann Inst Fourier 5 (1954), 131-295. · Zbl 0064.35101
[11] D. Dubois and H. Prade, Properties of measures of information in evidence and possibility theories, Fuzzy Sets and Systems 100 Supplement, 1999, 35-49.
[12] P. Walley, Towards a unified theory of imprecise probability, International Journal of Approximate Reasoning 24(2-3) (2000), 125-148. · Zbl 1007.28015
[13] R. Vignes, Caractérisation Automatique de Groupes Biologiques, Ph.D. Thesis, University Paris VI, 1991.
[14] F. A. T. De Carvalho, Proximity coefficients between boolean symbolic objects, In New Approaches in Classification and Data Analysis, E. Diday, Y. Lechevallier, M. Schader, P. Bertrand, and B. Burtschy, eds., Berlin-Heidelberg, Springer-Verlag, 1994, 387-394.
[15] M. Csernel and F. A. T. De Carvalho, Usual operations with symbolic data under Normal Symbolic Form, Applied Stochastic Models in Business and Industry 15 (1999), 241-257. · Zbl 0960.62004
[16] F. A. T. De Carvalho, P. Brito, and H.-H. Bock, Dynamic clustering for interval data based on L2 distance, Comput Stat 21(2) (2006), 231-250. · Zbl 1114.62070
[17] A. P. Duarte Silva and P. Brito, Linear discriminant analysis for interval data, Comput Stat 21(2) (2006), 289-308. · Zbl 1113.62080
[18] P. Bertrand and F. Goupil, Descriptive statistics for symbolic data, In Analysis of Symbolic Data, Exploratory Methods · Zbl 0978.62005
[19] L. Billard and E. Diday, From the statistics of data to the statistics of knowledge: symbolic data analysis, J Am Stat Assoc 98(462) (2003), 470-487.
[20] L. Billard, Dependencies and variation components of symbolic interval-valued data, In Selected Contributions in Data Analysis and Classification, P. Brito, P. Bertrand, C. Cucumel, and F. De Carvalho, eds., Heidelberg, Springer, 2007, 3-12. · Zbl 05486137
[21] L. Billard, Sample covariance functions for complex quantitative data, In Proceedings of IASC2008, Joint Meeting of 4th World Conference of the IASC and 6th Conference of the Asian Regional Section of the IASC on Computational Statistics & Data Analysis, Yokohama, Japan, 2008.
[22] L. Billard, Dependencies in bivariate interval-valued symbolic data, In Classification, Clustering and Data Mining Applications, D. Banks, L. House, F. R. McMorris, P. Arabie, W. Gaul, eds., Proceedings of the Meeting of the International Federation of Classification Societies (IFCS 2004), Berlin-Heidelberg, Springer, 2004, 319-324.
[23] L. Billard and E. Diday, Descriptive statistics for interval-valued observations in the presence of rules, Comput Stat 21(2) (2006), 187-210. · Zbl 1114.62003
[24] A. Chouakria, P. Cazes, and E. Diday, Symbolic principal component analysis, In Analysis of Symbolic Data, Exploratory Methods for Extracting Statistical Information from Complex Data, H.-H. Bock and E. Diday, eds., Heidelberg, Springer, 2000, 200-212. · Zbl 0977.62063
[25] P. Cazes, A. Chouakria, E. Diday, and Y. Schektman, Extensions de l’analyse en composantes principales à des données de type intervalle, Rev Stat Appl 24 (1997), 5-24.
[26] C. Lauro and F. Palumbo, Principal component analysis for non-precise data, In New Developments in Classification and Data Analysis, M. Vichi, P. Monari, S. Mignani, and A. Montanari, eds., Berlin-Heidelberg, Springer, 2005, 173-184. · Zbl 1341.62163
[27] P. Giordani and H. A. L. Kiers, A comparison of three methods for principal component analysis of fuzzy interval data, Comput Stat & Data Anal, special issue The Fuzzy Approach to Statistical Analysis 51(1) (2006), 379-397. · Zbl 1157.62426
[28] O. Rodriguez, E. Diday, and S. Winsberg, Generalization of the principal components analysis to histogram data, In Proceedings 4th European Conference on Principles and Practice of Knowledge Discovery in Data Bases; Workshop on Symbolic Data Analysis, Lyon, 14, 2000.
[29] O. Rodriguez and A. Pacheco, Applications of histogram principal components analysis, In The 15th European Conference on Machine Learning (ECML) and the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), Pisa, 2004.
[30] M. Ichino, Symbolic PCA for histogram-valued data, In Proceedings of IASC2008, Joint Meeting of 4th World Conference of the IASC and 6th Conference of the Asian Regional Section of the IASC on Computational Statistics & Data Analysis, Yokohama, Japan, 2008.
[31] C. Lauro, R. Verde, and A. Irpino, Generalized canonical analysis, In Symbolic Data Analysis and the Sodas Software, E. Diday and M. Noirhomme-Fraiture, eds., Chichester, Wiley, 2008, 313-330.
[32] P. Brito, On the analysis of symbolic data, In: Selected Contributions in Data Analysis and Classification, P. Brito, · Zbl 05486138
[33] M. Chavent, Normalized k-means clustering of hyperrectangles, In Proceedings of the XIth International Symposium of Applied Stochastic Models and Data Analysis (ASMDA 2005), Brest, France, 2005, 670-677.
[34] F. Esposito, D. Malerba, and A. Appice, Dissimilarity and matching, In Symbolic Data Analysis and the Sodas Software, E. Diday and M. Noirhomme-Fraiture, eds., Chichester, Wiley, 2008, 123-148.
[35] E. Diday and F. Esposito, An introduction to symbolic data analysis and the SODAS software, Intelligent Data Analysis 7 (2003), 583-602.
[36] R. M. C. R. de Souza and F. A. T. De Carvalho, Clustering of interval data based on City-Block distances, Pattern Recogn Lett 25(3) (2004), 353-365.
[37] R. M. C. R. de Souza, F. A. T. De Carvalho, and C. P. Tenorio, Two partitional methods for interval-valued data using Mahalanobis distances, IBERAMIA, 2004, 454-463.
[38] M. Chavent, F. A. T. De Carvalho, Y. Lechevallier, and R. Verde, New clustering methods for interval data, Comput Stat 21(2) (2006), 211-229. · Zbl 1114.62069
[39] F. A. T. De Carvalho and R. M. C. R. de Souza, Unsupervised pattern recognition models for mixed feature-type symbolic data, Pattern Recogn Lett 31(5) (2010), 430-443.
[40] F. A. T. De Carvalho, M. Csernel, and Y. Lechevallier, Clustering constrained symbolic data, Pattern Recogn Lett 30(11) (2009), 1037-1045.
[41] F. A. T. De Carvalho, Fuzzy c-means clustering methods for symbolic interval data, Pattern Recogn Lett 28 (2007), 423-437.
[42] R. M. C. R. de Souza, F. A. T. De Carvalho, and F. C. D. Silva, Clustering of interval-valued data using adaptive squared Euclidean distances, In Proceedings ICONIP, 2004, 775-780.
[43] R. M. C. R. de Souza and F. A. T. De Carvalho, Dynamic clustering of interval data based on adaptive Chebyshev distances, Electron Lett 40(11) (2004), 658-659.
[44] F. A. T. De Carvalho, R. M. C. R. de Souza, M. Chavent, and Y. Lechevallier, Adaptive Hausdorff distances and dynamic clustering of symbolic interval data, Pattern Recogn Lett 27(3) (2006), 167-179.
[45] F. A. T. De Carvalho and Y. Lechevallier, Partitional clustering algorithms for symbolic interval data based on single adaptive distances, Pattern Recogn 42(7) (2009), 1223-1236. · Zbl 1183.68527
[46] F. A. T. De Carvalho and C. P. Tenorio, Fuzzy k-means clustering algorithms for interval-valued data based on adaptive quadratic distances, Fuzzy Sets and Systems 161(23) (2010), 2978-2999. · Zbl 1204.62106
[47] A. Hardy and N. Kasaro, A new clustering method for interval data, Mathématiques et Sciences Humaines 187 (2009), 79-91.
[48] A. Hardy and J. Baune, Clustering and validation of interval data, In Selected Contributions in Data Analysis and Classification, P. Brito, P. Bertrand, C. Cucumel, and F. De Carvalho, eds., Heidelberg, Springer, 2007, 69-82. · Zbl 1181.68236
[49] P. Brito, Analyse de Données Symboliques. Pyramides d’Héritage. Ph.D. Thesis, University Paris-IX Dauphine, 1991.
[50] P. Brito, Use of pyramids in Symbolic Data Analysis, In New Approaches in Classification and Data Analysis, E. Diday, Y. Lechevallier, M. Schader, P. Bertrand, and
[51] P. Brito, Symbolic objects: order structure and pyramidal clustering, Ann Oper Res 55 (1995), 277-297. · Zbl 0844.68025
[52] P. Brito, Symbolic clustering of probabilistic data, In Advances in Data Science and Classification, A. Rizzi, M. Vichi, and H.-H. Bock, eds., Berlin-Heidelberg, Springer-Verlag, 1998, 385-390. · Zbl 1051.91525
[53] P. Brito and F. A. T. De Carvalho, Symbolic clustering in the presence of hierarchical rules, In Studies and Research Proceedings of the Conference on Knowledge Extraction and Symbolic Data Analysis (KESDA’98), Luxembourg, Office for Official Publications of the European Communities, 1999, 119-128.
[54] P. Brito and F. A. T. De Carvalho, Symbolic clustering of constrained probabilistic data, In Exploratory Data Analysis in Empirical Research, O. Opitz, M. Schwaiger, eds., Heidelberg, Springer Verlag, 2002, 12-21.
[55] P. Brito and F. A. T. De Carvalho, Hierarchical and pyramidal clustering, In Symbolic Data Analysis and the Sodas Software, E. Diday and M. Noirhomme-Fraiture, eds., Chichester, Wiley, 2008, 181-203.
[56] M. Chavent, A monothetic clustering method, Pattern Recognition Letters 19(11) (1998), 989-996. · Zbl 0915.68148
[57] H.-H. Bock, Visualizing symbolic data by Kohonen maps, In Symbolic Data Analysis and the Sodas Software, E. Diday and M. Noirhomme-Fraiture, eds., Chichester, Wiley, 2008, 205-234.
[58] A. Irpino and R. Verde, A new Wasserstein based distance for the hierarchical clustering of histogram symbolic data, Data Science and Classification, Proceedings of the Conference of the International Federation of Classification Societies (IFCS06), Berlin, Springer, 2006, 185-192.
[59] P. Brito and M. Ichino, Symbolic clustering based on quantile representation, presented at COMPSTAT’2010, Paris, 2010.
[60] N. C. Lauro, R. Verde, and F. Palumbo, Factorial discriminant analysis on symbolic objects, In Analysis of Symbolic Data, Exploratory Methods for Extracting Statistical Information from Complex Data, H.-H. Bock and E. Diday, eds., Heidelberg, Springer, 2000, 212-233. · Zbl 0977.62070
[61] N. C. Lauro, R. Verde, and A. Irpino, Factorial discriminant analysis, In Symbolic Data Analysis and the Sodas Software, E. Diday and M. Noirhomme-Fraiture, eds., Chichester, Wiley, 2008, 341-358.
[62] J. P. Rasson and S. Lissoir, Symbolic kernel discriminant analysis, In Analysis of Symbolic Data, Exploratory Methods for Extracting Statistical Information from Complex Data, H.-H. Bock and E. Diday, eds., Heidelberg, Springer, 2000, 240-244. · Zbl 0977.62072
[63] J. P. Rasson, J.-Y. Pirçon, P. Lallemand, and S. Adans, Unsupervised divisive classification, In Symbolic Data Analysis and the Sodas Software, E. Diday and M. Noirhomme-Fraiture, eds., Chichester, Wiley, 2008, 149-156.
[64] E. Périnel and Y. Lechevallier, Symbolic discrimination rules, In Analysis of Symbolic Data, Exploratory Methods for Extracting Statistical Information from Complex Data, H.-H. Bock and E. Diday, eds., Heidelberg, Springer, 2000, 244-265. · Zbl 0976.62061
[65] A. Ciampi, E. Diday, J. Lebbe, E. Périnel, and R. Vignes, Growing a tree classifier with imprecise data, Pattern Recogn Lett 21(9) (2000), 787-803. · Zbl 0902.62006
[66] M. C. Bravo Llatas and J. M. Santesmases, Segmentation Trees for Stratified Data, In Analysis of Symbolic Data, · Zbl 0977.62067
[67] T.-N. Do and F. Poulet, Kernel methods and visualization for interval data mining, In Proceedings of the Conference on Applied Stochastic Models and Data Analysis, ASMDA 2005, J. Janssen and P. Lenca, eds., ENST Bretagne, 2005.
[68] E. Carrizosa, J. Gordillo, and F. Plastria, Classification problems with imprecise data through separating hyperplanes [Online]. Available at http://www.optimizationonline.org/DB_FILE/2007/09/1781.pdf, 2007.
[69] J. Šíma, Neural expert systems, Neural Netw 8(2) (1995), 261-271.
[70] S. J. Simoff, Handling uncertainty in neural networks: an interval approach, In Proceedings of the IEEE International Conference on Neural Networks, IEEE, Washington DC, 1996, 606-610.
[71] M. Beheshti, A. Berrached, A. de Korvin, C. Hu, and O. Sirisaengtaksin, On interval weighted freelayer neural networks, In Proceedings of the 31st Annual Simulation Symposium, IEEE Computer Society Press, 1998, 188-194.
[72] F. Rossi and B. Conan Guez, Multilayer perceptron on interval data, In Classification, Clustering and Data Analysis, K. Jajuga, A. Sokolowski, and H.-H. Bock, eds., Berlin, Heidelberg, New York, Springer, 2002, 427-434.
[73] L. Billard and E. Diday, Regression analysis for interval-valued data, In Data Analysis, Classification, and Related Methods, Proceedings of the Seventh Conference of the International Federation of Classification Societies (IFCS00), Springer, 2000, 369-374. · Zbl 1026.62073
[74] L. Billard and E. Diday, Symbolic regression analysis, In Classification, Clustering and Data Analysis, Proceedings of the Conference of the International Federation of Classification Societies (IFCS02), Springer, 2002, 281-288. · Zbl 1185.62129
[75] E. A. L. Neto and F. A. T. De Carvalho, Centre and range method for fitting a linear regression model to symbolic interval data, Comput Stat Data Anal 52(3) (2008), 1500-1515. · Zbl 1452.62493
[76] E. A. L. Neto and F. A. T. De Carvalho, Constrained linear regression models for symbolic interval-valued variables, Computational Statistics & Data Analysis 54(2) (2010), 333-347. · Zbl 05689593
[77] P. Teles and P. Brito, Modelling interval time series data, In Proceedings of the 3rd IASC World Conference on Computational Statistics and Data Analysis, Limassol, Cyprus, 2005. · Zbl 1342.37076
[78] A. L. S. Maia, F. A. T. De Carvalho, and T. D. Ludermir, Forecasting models for interval-valued time series, Neurocomputing 71(16-18) (2008), 3344-3352.
[79] J. Arroyo, Métodos de Predicción para Series Temporales de Intervalos e Histogramas. Ph.D. Thesis, Universidad Pontificia Comillas, Madrid, Spain, 2008.
[80] C. García-Ascanio and C. Maté, Electric power demand forecasting using interval time series: a comparison between VAR and iMLP, Energy Policy 38 (2009), 715-725.
[81] J. Arroyo, G. González-Rivera, and C. Maté, Forecasting with interval and histogram data. Some financial applications, In Handbook of Empirical Economics and Finance, A. Ullah, D. Giles, N. Balakrishnan, W. Schucany, and E. Schilling, eds., Chapman and Hall/CRC, New York, 2010.
[82] G. González-Rivera and J. Arroyo, Time series modeling of histogram-valued data: The daily histogram time series of S&P500 intradaily returns, Int J Forecast (in press).
[83] A. Han, Y. Hong, K. Lai, and S. Wang, Interval time series analysis with an application to the Sterling-Dollar exchange rate, J Syst Sci Complex 21(4) (2008), 558-573. · Zbl 1177.91113
[84] J. Arroyo and C. Maté, Forecasting histogram time series with k-nearest neighbours methods, Int J Forecast 25 (2009), 182-207.
[85] G. Birkhoff, Lattice Theory, Vol. XXV (3rd ed.), American Mathematical Society Colloquium Publications, Providence, 1967.
[86] M. Barbut and B. Monjardet, Ordre et Classification, Algèbre et Combinatoire, Tomes I et II, Hachette, Paris, 1970. · Zbl 0267.06001
[87] R. Wille, Restructuring lattice theory: an approach based on hierarchies of concepts, In Proceedings of the Symposium on Ordered Sets, I. Rival, ed., Dordrecht-Boston, Reidel, 1982, 445-470. · Zbl 0491.06008
[88] B. Ganter and R. Wille, Formal Concept Analysis: Mathematical Foundations, Berlin, Springer Verlag, 1999. · Zbl 0909.06001
[89] V. Duquenne and J. L. Guigues, Familles minimales d’implication informatives résultant d’un tableau de données binaires, Math Sci Hum 95 (1986), 5-18.
[90] G. Polaillon, Organisation et Interprétation par les Treillis de Galois de Données de Type Multivaluée, Interval ou Histogramme, Ph.D. Thesis, Université Paris IX Dauphine, 1998.
[91] G. Polaillon, Interpretation and reduction of Galois lattices of complex data, In Advances in Data Science and Classification, A. Rizzi, M. Vichi, and H.-H. Bock, eds., Springer-Verlag, Berlin, 1998, 433-440. · Zbl 1052.68617
[92] G. Polaillon and E. Diday, Reduction of symbolic Galois lattices via hierarchies, In Proceedings of the Conference on Knowledge Extraction and Symbolic Data Analysis (KESDA’98), Office for Official Publications of the European Communities, Luxembourg, 1999, 137-143.
[93] P. Brito and G. Polaillon, Structuring probabilistic data by Galois lattices, Mathématiques et Sciences Humaines / Mathematics and Social Sciences (43ème année) 169(1) (2005), 77-104.
[94] H.-H. Bock, Probabilistic modeling for symbolic data, In COMPSTAT - Proceedings in Computational Statistics, P. Brito, ed., Heidelberg, Springer, 2008, 55-65. · Zbl 1147.62001
[95] P. Brito and A. P. Duarte Silva, Modeling interval-data with Normal and Skew-Normal distributions, In Proceedings of IASC2008, Joint Meeting of 4th World Conference of the IASC and 6th Conference of the Asian Regional Section of the IASC on Computational Statistics & Data Analysis, Yokohama, Japan, 2008.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.