Linear algebra and optimization for machine learning. A textbook.

*(English)* Zbl 1451.68002
Cham: Springer (ISBN 978-3-030-40343-0/hbk; 978-3-030-40344-7/ebook). xxi, 495 p. (2020).

This textbook introduces linear algebra and optimization in the context of statistical machine learning. Examples are provided throughout, and each of the eleven chapters closes with a large number of exercises.
It provides the mathematical background required for machine learning topics such as deep learning, support vector machines, kernel machines, clustering and dimension reduction, recommender systems, and information retrieval.

The first chapter introduces vectors and matrices, their basic operations, and their relation to machine learning; its second part presents the idea of optimizing cost functions in machine learning. The second chapter gives a basic introduction to linear algebra. Several examples illustrate what a linear transformation is and how it is carried out by matrix multiplication. Machine learning of a linear transformation is demonstrated with the discrete wavelet transform and, later, the discrete cosine transform. The chapter shows how solving linear equations relates to the optimization task and how inconsistent linear systems can be handled with the aid of the Moore-Penrose pseudoinverse; the problem of ill-conditioned systems is also discussed. The discrete Fourier transform serves as an example of a linear transformation based on complex numbers. The third chapter introduces eigenvectors and eigenvalues, the diagonalization of matrices, and their application to matrix factorizations such as the Cholesky factorization; it then treats norm-constrained quadratic programming and numerical algorithms for finding eigenvectors. Chapter four describes basic optimization in machine learning. Gradient descent finds the minima of a cost (loss, error) function. When the loss is a squared error and the model is linear, a closed-form solution exists; in most machine learning problems, however, the model is nonlinear, no closed-form solution exists, and a solution is found by gradient descent. Generalization, an essential property of a learning system, is addressed through regularization. The loss function may be the squared error or the cross entropy; linear regression and support vector machines are worked out as examples.
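The interplay described above between the closed-form least-squares solution (via the Moore-Penrose pseudoinverse) and its iterative counterpart, gradient descent, can be illustrated with a minimal sketch. The toy data and learning-rate value below are hypothetical and not taken from the book:

```python
import numpy as np

# Toy regression data following y = 2x + 1 exactly (hypothetical example).
# The design matrix X carries a column of ones for the intercept term.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
X = np.column_stack([x, np.ones_like(x)])

# Closed-form solution via the Moore-Penrose pseudoinverse: w = X^+ y.
w_closed = np.linalg.pinv(X) @ y

# The same coefficients recovered iteratively by gradient descent on the
# squared-error loss L(w) = ||Xw - y||^2, whose gradient is 2 X^T (Xw - y).
w = np.zeros(2)
lr = 0.05
for _ in range(5000):
    w -= lr * 2.0 * X.T @ (X @ w - y)

print(w_closed)  # the pseudoinverse solution, here [2., 1.]
print(w)         # gradient descent converges to the same coefficients
```

In the linear squared-error case both routes agree; the book's point is that for nonlinear models only the iterative route remains available.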
Chapter five deals with advanced optimization methods that are essential for deep learning, such as momentum-based learning, AdaGrad, RMSProp, and Adam. The Newton method is introduced, and the difficulties it faces at saddle points are discussed, followed by techniques for non-differentiable functions such as the subgradient method. Chapter six introduces constrained optimization and duality, on which the training of support vector machines is based. Chapter seven deals with dimension reduction, including principal component analysis and singular value decomposition. Chapter eight introduces matrix factorization methods, with recommender systems as examples. Chapter nine introduces similarity matrices and kernels. Chapter ten deals with linear algebra on graphs, as used in spectral clustering and the PageRank algorithm. Chapter eleven introduces optimization in computational graphs and its application to artificial neural networks via the backpropagation algorithm. The chapters are followed by a bibliography and an index.
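The PageRank algorithm mentioned for Chapter ten is a direct application of the eigenvector machinery from the earlier chapters: the rank vector is the dominant eigenvector of a stochastic "Google matrix", found by power iteration. A minimal sketch on a hypothetical three-page graph (the link structure and damping factor are illustrative, not from the book):

```python
import numpy as np

# Toy web graph: page -> pages it links to (hypothetical example).
links = {0: [1, 2], 1: [2], 2: [0]}
n = 3

# Column-stochastic transition matrix M: M[j, i] is the probability of
# moving from page i to page j by following a uniformly random outlink.
M = np.zeros((n, n))
for src, dsts in links.items():
    for dst in dsts:
        M[dst, src] = 1.0 / len(dsts)

# Google matrix with damping factor d = 0.85: with probability 1 - d the
# surfer jumps to a uniformly random page.
d = 0.85
G = d * M + (1.0 - d) / n

# Power iteration: repeated multiplication converges to the dominant
# eigenvector of G (eigenvalue 1), i.e., the PageRank vector.
r = np.full(n, 1.0 / n)
for _ in range(100):
    r = G @ r

print(r)  # the ranks sum to 1
```

Since G is column-stochastic, each iterate remains a probability distribution; the fixed point is the stationary distribution of the random surfer.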

The book is recommended to everyone in the field of machine learning. For beginners it introduces the required background in linear algebra and optimization; for professionals it serves as a compact reference. The book is clearly and nicely written, and it is a joy to read.

Reviewer: Andreas Wichert (Lisboa)

##### MSC:

| Code | Classification |
|------|----------------|
| 68-01 | Introductory exposition (textbooks, tutorial papers, etc.) pertaining to computer science |
| 15-01 | Introductory exposition (textbooks, tutorial papers, etc.) pertaining to linear algebra |
| 62-01 | Introductory exposition (textbooks, tutorial papers, etc.) pertaining to statistics |
| 90-01 | Introductory exposition (textbooks, tutorial papers, etc.) pertaining to operations research and mathematical programming |
| 68T05 | Learning and adaptive systems in artificial intelligence |
| 68T07 | Artificial neural networks and deep learning |
| 90C90 | Applications of mathematical programming |