Graphics of large datasets. Visualizing a million.

*(English)* Zbl 1118.62003
Statistics and Computing (Cham). New York, NY: Springer (ISBN 0-387-32906-4/hbk). xiii, 271 p. (2006).

The book examines ways of visualising large datasets, where "large" may refer to the number of cases, the number of variables, or both. Data visualisation is useful for data cleaning, exploring data, identifying trends and clusters, spotting local patterns, evaluating modelling output, and presenting results. It is also essential for exploratory data analysis and data mining. Given its subject matter, the book is addressed to data analysts, statisticians, computer scientists, and anyone who has to explore a large dataset.

The book consists of 11 chapters. The first chapter is introductory, while the remaining chapters are grouped into two parts. Part I, consisting of chapters 2 to 4, defines and discusses standard statistical graphics. These chapters also look at the issue of upscaling graphics to cope with large datasets. Finally, they explain several developments in interactive methods that improve the capability of graphics for finding information in large datasets. Part II follows with chapters 5 to 11, which focus on particular types of graphics for applications. More specifically, chapter 5 discusses specialist plots for multivariate categorical data, while chapters 6 and 7 consider multivariate continuous data. Chapters 8 and 9 discuss the visualisation of structures such as networks and trees. Up to this point, several applications are presented and discussed throughout the book. Chapter 10, however, is devoted entirely to one major application concerning internet packet data. Finally, chapter 11 uses an example to illustrate how a large dataset can be explored with visualisation.

Reviewer: Christina Diakaki (Chania)