## Brief overview of symbolic data and analytic issues
*(English)*
Zbl 07260274

Summary: With the advent of contemporary computers, datasets can be massive, too large for direct analysis. One of the many approaches to this problem of size is to aggregate the data according to some appropriate scientific question of interest; the resulting dataset then necessarily consists of symbolic-valued observations such as lists, intervals, histograms, and the like. Other datasets, small or large, are naturally symbolic. One aim here is to provide a brief nontechnical overview of symbolic data and to discuss how they arise. We also provide brief insights into some of the issues that arise in their analysis. These include the need to take into account the internal variation inherent in symbolic data but absent from classical data. Another issue is that, by the nature of the aggregation, the resulting datasets can contain "holes", i.e., regions that are not possible; these must be accommodated when, e.g., seemingly interval-valued data are actually some other form of symbolic data (such as histogram data). We also show how other forms of complex data differ from symbolic data; e.g., fuzzy data belong to a different domain from symbolic data. Finally, we look at further research needs for the subject. A more technical introduction to symbolic data and the available analytic methodology is given by Noirhomme and Brito [1].
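The aggregation, internal-variation, and "holes" points above can be made concrete with a small sketch. This is not the author's code: the records, the category labels "A" and "B", the bin edges, and all function names are hypothetical. The variance assumes values are spread uniformly within each interval (the form used for interval-valued data by Bertrand and Goupil [8]). The sketch aggregates classical values into intervals and histograms, then compares the symbolic variance with the variance of the interval midpoints alone.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical classical records (category, value); illustrative only.
RECORDS = [
    ("A", 1.0), ("A", 2.0), ("A", 9.0), ("A", 10.0),
    ("B", 4.0), ("B", 5.0), ("B", 6.0),
]


def group_values(rows):
    """Collect the classical values belonging to each category."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return groups


def aggregate_to_intervals(rows):
    """Summarize each category by the interval [min, max] of its values."""
    return {k: (min(v), max(v)) for k, v in group_values(rows).items()}


def aggregate_to_histogram(rows, edges):
    """Summarize each category by relative frequencies over fixed bins.

    Bins are [edges[i], edges[i+1]); the last bin also includes edges[-1].
    """
    n_bins = len(edges) - 1
    result = {}
    for key, values in group_values(rows).items():
        counts = [0] * n_bins
        for v in values:
            if not edges[0] <= v <= edges[-1]:
                raise ValueError(f"value {v} outside bin range")
            idx = min(bisect_right(edges, v) - 1, n_bins - 1)
            counts[idx] += 1
        result[key] = [c / len(values) for c in counts]
    return result


def interval_mean(intervals):
    """Symbolic sample mean, assuming values are uniform within each interval."""
    return sum(a + b for a, b in intervals) / (2 * len(intervals))


def interval_variance(intervals):
    """Symbolic sample variance (divisor m) under the same uniformity assumption.

    Captures both the spread between intervals and the internal variation
    within each interval.
    """
    m = len(intervals)
    second_moment = sum(a * a + a * b + b * b for a, b in intervals) / (3 * m)
    return second_moment - interval_mean(intervals) ** 2


def midpoint_variance(intervals):
    """Classical variance (divisor m) of the midpoints; ignores internal variation."""
    mids = [(a + b) / 2 for a, b in intervals]
    center = sum(mids) / len(mids)
    return sum((x - center) ** 2 for x in mids) / len(mids)


if __name__ == "__main__":
    intervals = aggregate_to_intervals(RECORDS)
    print(intervals)  # {'A': (1.0, 10.0), 'B': (4.0, 6.0)}
    iv = list(intervals.values())
    print(round(interval_variance(iv), 4))  # 3.6042
    print(midpoint_variance(iv))  # 0.0625
    # Category A has no values between 2 and 9: its interval [1, 10] hides
    # this "hole", whereas the histogram shows an empty middle bin.
    print(aggregate_to_histogram(RECORDS, [0.0, 4.0, 8.0, 12.0]))
    # {'A': [0.5, 0.0, 0.5], 'B': [0.0, 1.0, 0.0]}
```

Both variances use divisor m, so the gap between 3.6042 and 0.0625 comes entirely from the internal variation within the intervals (the average of (b - a)^2/12), which a midpoint summary discards.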

### Keywords:

aggregated data; large datasets; interval data; histogram data; multi-modal data; symbolic data analysis; internal variation; rules; complex data

### Software:

SODAS

*L. Billard*, Stat. Anal. Data Min. 4, No. 2, 149-156 (2011; Zbl 07260274)


### References:

[1] M. Noirhomme and M. P. Brito, Far beyond the classical data models: symbolic data analysis, Stat Anal Data Mining 4 (2011), 157-170.

[2] J. W. Tukey, Exploratory Data Analysis, Reading, MA: Addison-Wesley, 1977. · Zbl 0409.62003

[3] L. Billard and E. Diday, Symbolic Data Analysis: Conceptual Statistics and Data Mining, Chichester: Wiley, 2006. · Zbl 1117.62002

[4] H.-H. Bock and E. Diday, Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data, Berlin: Springer-Verlag, 2000. · Zbl 1039.62501

[5] L. Billard and E. Diday, From the statistics of data to the statistics of knowledge: Symbolic data analysis, J Am Stat Assoc 98 (2003), 470-487.

[6] E. Diday and M. Noirhomme, eds., Symbolic Data Analysis and the SODAS Software, Chichester: Wiley, 2008. · Zbl 1275.62029

[7] L. Billard, Sample covariance functions for complex quantitative data, In World Congress, International Association of Computational Statistics, Yokohama, Japan, 2008.

[8] P. Bertrand and F. Goupil, Descriptive statistics for symbolic data, In Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data, H.-H. Bock and E. Diday, eds., Berlin: Springer-Verlag, 2000, 103-124. · Zbl 0978.62005

[9] A. Douzal-Chouakria, L. Billard, and E. Diday, Principal component analysis for interval-valued observations, Stat Anal Data Mining 4 (2011), 229-246.

[10] L. Billard and E. Diday, Descriptive statistics for interval-valued observations in the presence of rules, Comput Stat 21 (2006), 187-210. · Zbl 1114.62003

[11] J. Le-Rademacher and L. Billard, Likelihood functions and some maximum likelihood estimators for symbolic data, J Stat Plann Infer 141 (2011), 1593-1602. · Zbl 1204.62026

[12] L. A. Zadeh, Fuzzy sets, Inform Control 8 (1965), 338-353. · Zbl 0139.24606

[13] E. Diday, Probabilist, possibilist and belief objects for knowledge analysis, Ann Oper Res 55 (1995), 227-276. · Zbl 0844.68024

[14] E. Diday, R. Emilion, and Y. Hillali, Symbolic data analysis of probabilistic objects by capacities and credibilities, Atti della XXXVIII Riunione Società Italiana di Statistica, Rimini, 1996, 5-22.

[15] R. Emilion, Différentiation des capacités, C R Acad Sci I - Mathematics 324 (1997), 389-392. · Zbl 0885.28002

[16] E. Diday and R. Emilion, Lattices and capacities in analysis of probabilist objects, In Studies in Classification, E. Diday, Y. Lechevallier, and O. Opitz, eds., 1996, 13-30. · Zbl 0904.62002

[17] E. Diday and R. Emilion, Capacities and credibilities in analysis of probabilistic objects by histograms and lattices, In Data Science, Classification, and Related Methods, C. Hayashi, N. Ohsumi, K. Yajima, Y. Tanaka, H.-H. Bock, and Y. Baba, eds., 1998, 353-357. · Zbl 0894.62007

[18] E. Diday and R. Emilion, Maximal and stochastic Galois lattices, Discrete Appl Math 127 (2003), 271-284. · Zbl 1026.06009
