Exploratory data analysis with Matlab.

*(English)* Zbl 1067.62005
Chapman & Hall/CRC Computer Science & Data Analysis Series. Boca Raton, FL: Chapman & Hall/CRC (ISBN 1-58488-366-9/hbk). xv, 405 p. (2005).

The purpose of this book is to promote the key concepts and methods of computational statistics, focusing on exploratory data analysis (EDA), and to make EDA techniques available to a wide range of users: statisticians, engineers, data miners, and scientists in computing, biology, linguistics, the social sciences, and other fields. The book is also suitable for classroom use at the senior undergraduate and graduate level, since each chapter includes exercises. It is organized into three parts comprising ten chapters, followed by five appendices.

Part I consists solely of the introductory EDA chapter (Chapter 1). Part II is dedicated to EDA as Pattern Discovery (Chapters 2–7). It discusses linear and nonlinear dimensionality reduction, covering classical techniques such as principal component analysis, factor analysis, and multidimensional scaling, as well as computational methods such as self-organizing maps, locally linear embedding, isometric feature mapping, and generative topographic maps. EDA methods that ‘tour’ the data, looking for interesting structures (such as holes, outliers, and clusters), including multidimensional ones, are described. Clustering (unsupervised learning) methods and scatterplot smoothing techniques are also analyzed and illustrated.

Part III (Graphical Methods for EDA) discusses many of the standard visualization techniques for EDA. Classical and novel ways of visualizing the results of a clustering process are presented: dendrograms, treemaps, rectangle plots, and ReClus. These visualization procedures can be used to assess the output of the various clustering algorithms covered in Part II of the book. Distribution shapes, examined through boxplots, bagplots, \(q\)-\(q\) plots, histograms, etc., can reveal important information about the phenomena that produced the data. Finally, methods for visualizing multivariate data are presented, including parallel coordinate plots, scatterplot matrices, glyph plots, coplots, dot charts, and Andrews curves. Standard methods for interacting with a plot to uncover structure or patterns, e.g., linking and brushing, are also explained.
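As a brief illustration of one of the multivariate displays named above, the following is a minimal sketch of how an Andrews curve can be evaluated. It is written in Python rather than MATLAB and is not taken from the book or its toolbox. The function names `andrews_curve` and `andrews_curve_samples` are hypothetical, and the sketch assumes the usual definition \(f_x(t) = x_1/\sqrt{2} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \dots\) for \(t \in [-\pi, \pi]\).

```python
import math


def andrews_curve(x, t):
    """Evaluate the Andrews curve of one observation x at angle t.

    Assumed definition (illustrative, not taken from the reviewed book):
    f_x(t) = x[0]/sqrt(2) + x[1]*sin(t) + x[2]*cos(t)
             + x[3]*sin(2t) + x[4]*cos(2t) + ...
    """
    value = x[0] / math.sqrt(2.0)
    for j, coord in enumerate(x[1:], start=1):
        k = (j + 1) // 2          # frequency: 1, 1, 2, 2, 3, 3, ...
        if j % 2 == 1:            # odd positions take the sine term
            value += coord * math.sin(k * t)
        else:                     # even positions take the cosine term
            value += coord * math.cos(k * t)
    return value


def andrews_curve_samples(x, n_points=101):
    """Sample the curve on an evenly spaced grid over [-pi, pi]."""
    step = 2.0 * math.pi / (n_points - 1)
    ts = [-math.pi + i * step for i in range(n_points)]
    return ts, [andrews_curve(x, t) for t in ts]
```

Plotting the sampled curve of every observation on common axes gives the display. Because the map from an observation to its curve is linear, observations that lie close together produce curves that stay close together, which is what makes clusters and outliers visible.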

MATLAB Versions 6.5 and 7 and MATLAB Statistics Toolbox Versions 4 and 5 were used to implement the ideas in concrete programs. Much of the code used in the examples and to create the figures is freely available, either as part of the downloadable toolbox included with the book or on other Internet sites. Further helpful information is provided in Appendices A–E, especially Appendix B, which covers EDA software resources.

Reviewer: Neculai Curteanu (Iaşi)

##### MSC:

62-07 | Data analysis (statistics) (MSC2010) |

62-02 | Research exposition (monographs, survey articles) pertaining to statistics |

65C60 | Computational problems in statistics (MSC2010) |

62-09 | Graphical methods in statistics (MSC2010) |

93C57 | Sampled-data control/observation systems |

62-04 | Software, source code, etc. for problems pertaining to statistics |