Introduction to high-dimensional statistics.

*(English)* Zbl 1341.62011
Monographs on Statistics and Applied Probability 139. Boca Raton, FL: CRC Press (ISBN 978-1-4822-3794-8/hbk; 978-1-4822-3795-5/ebook). xv, 255 p. (2015).

It is well known to statisticians that traditional statistics provides a very rich theory for analyzing data with a small number of parameters \(p\) and a large number of observations \(n\). In many scientific fields, however, current data have very different characteristics: the number of parameters \(p\) is large and the number of observations \(n\) is small; indeed, \(n\) is sometimes smaller than \(p\). In these situations, the classical asymptotic analysis, with \(p\) fixed and \(n\) going to infinity, no longer works. The book under review is a well-written treatment of the mathematical foundations of high-dimensional statistics. The word “high-dimensional” refers to the situation where the number of unknown parameters to be estimated is one or several orders of magnitude larger than the number of samples in the data. To analyze high-dimensional data, one must overcome two major difficulties: the intrinsic statistical difficulty related to the curse of dimensionality, and the computational difficulty. The main goal of this book is to present the essential concepts of high-dimensional statistics and to delineate, in some fundamental cases, the frontier between what is achievable and what is impossible. It is designed as a text for a master-level course in statistics or applied mathematics.

The book contains nine chapters. Chapter 1 is an introduction to the subject. Chapter 2 is devoted to model selection: it presents risk bounds for model selection, optimality criteria and some computational issues related to model selection. Chapter 3 treats aggregation of estimators, covering Gibbs mixing of estimators, an oracle risk bound and numerical approximation by Metropolis-Hastings. Chapter 4 deals with convex criteria: convex multivariate functions, the lasso estimator and various sparsity patterns. Chapter 5 is concerned with estimator selection; it covers cross-validation techniques, complexity selection techniques and scale-invariant criteria. Chapter 6 addresses multivariate regression, including low-rank estimation and the interplay between low rank and sparsity. Chapter 7 is devoted to graphical models; directed, undirected and Gaussian graphical models are the essential subjects of this chapter. Chapter 8 treats multiple testing: the statistical setting of \(p\)-values, the multiple testing setting and the Bonferroni correction are the subjects of Section 8.2, while in Section 8.3 the author deals with controlling the false discovery rate. Chapter 9 is about supervised classification, covering the Bayes classifier, parametric, semiparametric and nonparametric modeling, and empirical risk minimization. Discussions and exercises related to the topics of each chapter appear as separate sections within the chapters. The book also contains five appendices: the first is devoted to the Gaussian distribution and the second to probability inequalities; the third deals with linear algebra, while the fourth and fifth treat subdifferentials of convex functions and reproducing kernel Hilbert spaces, respectively.
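To give a flavour of the multiple testing procedures discussed in Chapter 8, the following minimal sketch (the reviewer's illustration, not code from the book) contrasts the Bonferroni correction, which controls the family-wise error rate, with the Benjamini-Hochberg step-up procedure, which controls the false discovery rate:

```python
# Reviewer's illustration (not from the book): two multiple testing
# procedures applied to a list of p-values at level alpha.

def bonferroni(pvals, alpha=0.05):
    """Reject hypothesis i if p_i <= alpha/m; controls the family-wise
    error rate at level alpha for m simultaneous tests."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def benjamini_hochberg(pvals, alpha=0.05):
    """Reject the k hypotheses with smallest p-values, where k is the
    largest rank with p_(k) <= k*alpha/m; controls the false discovery
    rate at level alpha under independence of the p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected

pvals = [0.001, 0.008, 0.012, 0.041, 0.20, 0.74]
print(bonferroni(pvals))          # [True, True, False, False, False, False]
print(benjamini_hochberg(pvals))  # [True, True, True, False, False, False]
```

On this example the Benjamini-Hochberg procedure rejects three hypotheses while the more conservative Bonferroni correction rejects only two, illustrating the trade-off between family-wise error control and false discovery rate control.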

Reviewer: Salah Hamza Abid (Baghdad)

##### MSC:

62-01 | Introductory exposition (textbooks, tutorial papers, etc.) pertaining to statistics |

62-07 | Data analysis (statistics) (MSC2010) |

62Hxx | Multivariate analysis |

62H30 | Classification and discrimination; cluster analysis (statistical aspects) |

62J05 | Linear regression; mixed models |

62J15 | Paired and multiple comparisons; multiple testing |

62F10 | Point estimation |

62G05 | Nonparametric estimation |

62-09 | Graphical methods in statistics (MSC2010) |

62Pxx | Applications of statistics |