Model-based clustering and classification for data science. With applications in R. (English) Zbl 1436.62006

Cambridge Series in Statistical and Probabilistic Mathematics 50. Cambridge: Cambridge University Press (ISBN 978-1-108-49420-5/hbk; 978-1-108-64418-1/ebook). xvii, 427 p. (2019).
This textbook draws its strength from its many illuminating examples, which are used to explain the clustering and classification models before the rigorous mathematical formulas are given.
The ten main chapters of the book have the following titles: model-based clustering; dealing with difficulties; model-based classification; semi-supervised clustering and classification; discrete data clustering; variable selection; high-dimensional data; non-Gaussian model-based clustering; network data; model-based clustering with covariates. Each chapter ends with bibliographic notes. An extensive reference list is given at the end of the book.
Publisher’s description: “Cluster analysis finds groups in data automatically. Most methods have been heuristic and leave open such central questions as: how many clusters are there? Which method should I use? How should I handle outliers? Classification assigns new observations to groups given previously classified observations, and also has open questions about parameter tuning, robustness and uncertainty assessment. This book frames cluster analysis and classification in terms of statistical models, thus yielding principled estimation, testing and prediction methods, and sound answers to the central questions. It builds the basic ideas in an accessible but rigorous way, with extensive data examples and R code; describes modern approaches to high-dimensional data and networks; and explains such recent advances as Bayesian regularization, non-Gaussian model-based clustering, cluster merging, variable selection, semi-supervised and robust classification, clustering of functional data, text and images, and co-clustering. Written for advanced undergraduates in data science, as well as researchers and practitioners, it assumes basic knowledge of multivariate calculus, linear algebra, probability and statistics.
Extensive use of real-world examples – with data, code and color graphics – builds intuition and understanding; R package MBCbook available on CRAN allows replication of analyses; this up-to-date account by four leading researchers gives access to powerful, state-of-the-art methods.”


62-02 Research exposition (monographs, survey articles) pertaining to statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62R07 Statistical aspects of big data and data science


MBCbook; R