Foundational and applied statistics for biologists using R.

*(English)* Zbl 1306.62003
Boca Raton, FL: CRC Press (ISBN 978-1-4398-7338-0/hbk). xxi, 596 p. (2014).

This book is structured into two sections: (1) foundations and (2) applications. The first section contains seven chapters and presents in detail the theoretical foundations of statistics. The first chapter reviews the history and applicability of statistics, discussing scientific principles and methods as well as ways to propose and validate hypotheses. The second chapter is a gentle introduction to probability, describing the concepts of classical and conditional probability. It also contains elements of combinatorial analysis such as the multiplication principle, permutations and combinations. The third chapter focuses on probability density functions (pdfs). Both discrete and continuous pdfs are discussed, e.g., the Poisson, hypergeometric and negative binomial distributions (discrete pdfs) and the \(\chi^2\), the \(t\)-distribution, the \(F\) distribution, the \(\beta\) distribution and others (continuous pdfs). The chapter concludes with suggestions on how to choose an appropriate pdf for a given problem. The fourth chapter describes parameters such as the expected value and the variance, together with the Chebyshev inequality, and statistics such as measures of location and of scale. Traditional measures of location such as the means (arithmetic, geometric and harmonic) are compared with robust measures such as the median, the trimmed and winsorized means and the M-estimators. For the measures of scale, the author discusses the variance and the coefficient of variation in comparison with robust estimators such as the interquartile range and the median absolute deviation. The chapter concludes with a description of ordinary least squares (OLS) and maximum likelihood (ML) estimators, as well as linear transformations and Bayesian applications. The fifth chapter focuses on interval estimators, discussing sampling, resampling and simulation distributions. The author builds the first part on the sampling distributions of \(\bar{x}, S^2, t^*\) and \(F^*\) and links the results with the central limit theorem. Next, approaches to building confidence intervals and their interpretation are discussed. For resampling distributions, the bootstrap and the jackknife approaches are described in detail. The chapter concludes with Bayesian applications of simulation distributions, including indirect simulation using a Markov chain Monte Carlo approach. The sixth chapter discusses hypothesis testing and commences with parametric frequentist null hypothesis testing. Following a motivation for the procedure, the types of tests (i.e., upper-, lower- and two-tailed tests) and inferences on one and two population means are discussed. Next, type I and type II errors are presented, followed by alternatives to parametric null hypothesis testing. These include permutation tests such as Kolmogorov-Smirnov and Wilcoxon. The chapter concludes with a brief description of alternatives to null hypothesis testing such as Bayesian and likelihood-based approaches. The seventh chapter presents sampling and experimental design. It commences with definitions of the terminology used and discusses the types of questions and the approaches of randomization and replication. Next, sampling design is presented in detail with respect to these two approaches. The adjustment of estimators to account for sampling, time series models and pseudoreplication is also discussed. The chapter concludes with an overview of experimental design, with approaches varying from manipulative experiments and observational studies to regression, ANOVA and tabular designs.

The second part of the book focuses on applications and commences with the eighth chapter, dedicated to the analysis of correlations. Pearson’s correlation is discussed in detail, followed by robust correlations such as Spearman and Kendall. Robust estimator approaches such as the winsorized correlation are also discussed. The ninth chapter presents regression and contains a detailed description of linear and generalised linear models (including parameter estimation and hypothesis testing). It covers multiple regression, for which the ANOVA approach is discussed. The comparative analysis of fitted and predicted values and the identification of confidence and prediction intervals are presented. Next, the assumptions and diagnostics for linear regression are reviewed, followed by approaches to selecting optimal models. The chapter concludes with notions on generalised linear models, nonlinear models and smoother approaches to association and regression. In the tenth chapter the author discusses ANOVA and presents in detail the identification of suitable contrasts, the analysis of random effects, and ANOVA as a tool for diagnostics and assumption testing. Different designs such as the two-way factorial, randomized block, nested and split-plot designs are discussed. The chapter concludes with an overview of robust ANOVA and a description of one-way and multi-way ANOVA. The eleventh chapter is built on tabular analyses and presents in detail the analysis of one-way formats, discussing score, Wald and likelihood tests. The construction of confidence intervals is also presented, followed by an in-depth overview of contingency tables with a focus on two-way tables. The chapter concludes with a GLM analysis in the context of tabular analysis.

The book is written in a style accessible to undergraduate students and follows a lecture-style format. Each chapter commences with a brief description of its contents in the “how to read this chapter” section and finishes with a summary and exercises that help students practise the notions introduced in the chapter. The book also contains an extensive appendix of mathematical concepts and a large collection of references that can be consulted for further study.

Reviewer: Irina Ioana Mohorianu (Norwich)

##### MSC:

62-01 | Introductory exposition (textbooks, tutorial papers, etc.) pertaining to statistics |

00A06 | Mathematics for nonmathematicians (engineering, social sciences, etc.) |

62P10 | Applications of statistics to biology and medical sciences; meta analysis |

92B15 | General biostatistics |