
Applied predictive modeling. (English) Zbl 1306.62014

New York, NY: Springer (ISBN 978-1-4614-6848-6/hbk; 978-1-4614-6849-3/ebook). xiii, 600 p. (2013).
The book under review aims to provide both an introduction to and a practical guide to predictive modelling. It is addressed to non-mathematical readers with some knowledge of basic statistics, including linear regression and hypothesis testing. All concepts and methods presented are illustrated using the computational software R. The book has twenty chapters: an introduction (Chapter 1) followed by four principal parts dealing with general strategies (Chapters 2–4), regression models (Chapters 5–10), classification models (Chapters 11–17) and other considerations (Chapters 18–20).
Chapter 1 provides a very brief introduction to key aspects of predictive modelling and its terminology, and describes a few real data sets from different application areas. After a short description of the concepts of data spending, candidate models and model selection in Chapter 2, Chapter 3 focuses on data pre-processing, including data transformation, missing data and the process of adding and removing predictors. In Chapter 4, strategies to avoid over-fitting are presented.
The second part of the book starts with a short presentation of accuracy measures for models predicting a numeric outcome in Chapter 5. After an introduction to linear regression models in Chapter 6, nonlinear regression models (e.g. neural networks, multivariate adaptive regression splines, support vector machines and K-nearest neighbors) are discussed in Chapter 7. Chapter 8 is devoted to regression trees and Chapter 9 summarizes solubility models. In Chapter 10, a case study on the compressive strength of concrete mixtures is presented.
Methods for building and evaluating models for a categorical response are summarized in Chapter 11. Chapter 12 introduces the reader to linear discriminant analysis and other linear classification models, whereas Chapter 13 deals with nonlinear classification models. After a presentation of the basic concepts of classification trees and rule-based models in Chapter 14, grant application models are summarized in Chapter 15. The impact of class imbalance is discussed in Chapter 16. Part three of the book ends with a detailed description of a case study in Chapter 17.
The last three chapters are devoted to predictor importance scores for numerical and categorical outcomes (Chapter 18), critical topics, typical approaches and common pitfalls of feature selection (Chapter 19) and factors influencing model performance (Chapter 20). The volume ends with an appendix providing a summary table of various models, an introduction to the computational software R and a reference list of interesting web sites.
In summary, this book is strongly recommended as a practical guide for non-mathematical readers with basic statistical knowledge. All concepts are presented within a strong practical context and are illustrated using the statistical software package R. In addition, supportive exercises are provided in each chapter.

MSC:

62-01 Introductory exposition (textbooks, tutorial papers, etc.) pertaining to statistics
62-04 Software, source code, etc. for problems pertaining to statistics
62-07 Data analysis (statistics) (MSC2010)
62Jxx Linear inference, regression
62H30 Classification and discrimination; cluster analysis (statistical aspects)
00A06 Mathematics for nonmathematicians (engineering, social sciences, etc.)