Symbolic data analysis: conceptual statistics and data mining. (English) Zbl 1117.62002

Wiley Series in Computational Statistics. Hoboken, NJ: John Wiley & Sons (ISBN 978-0-470-09016-9/hbk; 978-0-470-09018-3/ebook). vii, 321 p. (2006).
Standard statistical methods lack the power or flexibility to analyse the very large datasets that have become routine and to extract the required knowledge from them. An alternative approach is to summarize a large dataset so that the resulting summary dataset is of manageable size yet retains as much of the knowledge in the original dataset as possible. One consequence is that the data may no longer be formatted as single values but may instead be represented by lists, intervals, distributions, etc. The summarized data have their own internal structure, which must be taken into account in any analysis.
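As an illustration of the summarization step described above, here is a minimal sketch, not taken from the book, that collapses hypothetical classical records (one numeric value per individual, tagged with a group label) into one interval-valued observation per group. It then computes a sample mean for the interval data under the assumption that values are spread uniformly within each interval, which reduces to averaging the interval midpoints. The data and the helper names `to_intervals` and `interval_mean` are hypothetical.

```python
from collections import defaultdict

# Hypothetical classical records: (group label, measured value).
records = [
    ("A", 3.0), ("A", 5.0), ("A", 4.0),
    ("B", 10.0), ("B", 12.0),
    ("C", 7.0),
]

def to_intervals(rows):
    """Summarize classical (group, value) rows into one interval [min, max] per group."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: (min(vals), max(vals)) for key, vals in groups.items()}

def interval_mean(intervals):
    """Sample mean of interval-valued data, assuming values are uniformly
    spread within each interval: (1 / 2m) * sum of (a_u + b_u)."""
    m = len(intervals)
    return sum(a + b for a, b in intervals.values()) / (2 * m)

symbolic = to_intervals(records)
print(symbolic)  # {'A': (3.0, 5.0), 'B': (10.0, 12.0), 'C': (7.0, 7.0)}
print(interval_mean(symbolic))  # mean of midpoints 4, 11 and 7
```

Group C shows that a single classical value becomes a degenerate interval. The within-interval spread is the kind of internal structure a symbolic analysis must account for; the uniformity assumption used here is only one possible choice.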
This monograph presents a unified account of symbolic data: how they arise and how they are structured. The reader is introduced to symbolic data analytic methods, described within the consistent statistical framework required to carry out such a summary and the subsequent analysis. The book gives a detailed overview of the methods and applications of symbolic data analysis. It includes numerous real examples taken from a variety of application areas, ranging from health and social sciences to economics and computing. Exercises at the end of each chapter enable the reader to develop their understanding of the theory.
Contents: 1. Introduction; 2. Symbolic Data; 3. Basic Descriptive Statistics: One Variate; 4. Descriptive Statistics: Two or More Variates; 5. Principal Components Analysis; 6. Regression Analysis; 7. Cluster Analysis. Primarily aimed at statisticians and data analysts, Symbolic Data Analysis is also well suited to scientists in a range of disciplines, including computer science and the health and social sciences, who work on problems involving large volumes of data. Graduate students in statistical data analysis courses will also find much of use.


62-07 Data analysis (statistics) (MSC2010)
65C60 Computational problems in statistics (MSC2010)
62-02 Research exposition (monographs, survey articles) pertaining to statistics
62H25 Factor analysis and principal components; correspondence analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62Pxx Applications of statistics

