Neural network learning: Theoretical foundations. (English) Zbl 0968.68126

Cambridge: Cambridge University Press. xiv, 389 p. (1999).
The content of this book belongs to computational learning theory. The book discusses statistical and computational problems of neural networks used for pattern recognition and prediction.
The model of learning is derived from the one introduced by L. G. Valiant in “A theory of the learnable”, Commun. ACM 27, 1134-1142 (1984; Zbl 0587.68077), and is called Probably Approximately Correct (PAC) learning. The learner receives a sequence of training examples, called labeled examples. Labeled examples are ordered pairs \((x,y)\), where \(x\) is an input to the neural network \((x\in X)\) and \(y\) is an output \((y\in Y)\). There is some probability distribution \(P\) defined on \(Z= X\times Y\). The set of all functions that the network can compute is denoted by \(H\). The error of \(h\in H\) with respect to \(P\) is \(P\{(x,y)\in Z: h(x)\neq y\}\). The PAC model is defined in the usual way, with accuracy and confidence parameters.
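The error \(P\{(x,y)\in Z: h(x)\neq y\}\) defined above can be approximated by the fraction of misclassified examples in a random sample. The following is a minimal sketch of that idea, not code from the book: the distribution in `draw_examples` (uniform inputs on \([0,1)\), labels flipped with probability 0.1) and the threshold hypothesis are hypothetical choices made only for illustration.

```python
import random

def empirical_error(h, examples):
    """Fraction of labeled examples (x, y) on which h(x) != y."""
    return sum(1 for x, y in examples if h(x) != y) / len(examples)

def draw_examples(m, rng):
    """Draw m labeled examples from a hypothetical distribution P on X x Y.

    Assumption for illustration: x is uniform on [0, 1), the clean label is
    1 iff x >= 0.5, and each label is flipped with probability 0.1.
    """
    examples = []
    for _ in range(m):
        x = rng.random()
        y = 1 if x >= 0.5 else 0
        if rng.random() < 0.1:
            y = 1 - y
        examples.append((x, y))
    return examples

def threshold(t):
    """Binary-valued hypothesis h_t(x) = 1 iff x >= t."""
    return lambda x: 1 if x >= t else 0

rng = random.Random(0)
sample = draw_examples(10_000, rng)
# For this distribution the true error of threshold(0.5) is the flip rate 0.1,
# so the printed estimate should lie near that value.
print(empirical_error(threshold(0.5), sample))
```

In PAC terms, the accuracy parameter bounds how far the error of the learner's output may exceed the best achievable in \(H\), and the confidence parameter bounds the probability that a random training sample leads to a worse result.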
In the first part of the book, the largest part, a binary classification problem is discussed. To be more specific, the problem of predicting a binary value \((Y= \{0,1\})\) using a class \(H\) of binary-valued functions is studied. The authors also discuss an appropriate measure of complexity of the neural net, the Vapnik-Chervonenkis (VC) dimension. Estimates of this dimension for some classes of neural networks are provided.
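To make the notion of VC dimension concrete: a set of points is shattered by \(H\) if the functions in \(H\) realise every possible binary labeling of it, and the VC dimension is the size of the largest shattered set. The sketch below checks this by brute force for a finite list of hypotheses on a finite domain. The helper names, the domain and the two example classes (thresholds and intervals on the line) are illustrative assumptions, and a search restricted to a finite domain only gives a lower bound on the VC dimension of the full class.

```python
from itertools import combinations

def shatters(hypotheses, points):
    """True if the hypotheses realise all 2^k labelings of the k points."""
    labelings = {tuple(h(x) for x in points) for h in hypotheses}
    return len(labelings) == 2 ** len(points)

def vc_dimension(hypotheses, domain):
    """Size of the largest subset of the finite domain that is shattered."""
    d = 0
    for k in range(1, len(domain) + 1):
        if any(shatters(hypotheses, s) for s in combinations(domain, k)):
            d = k
        else:
            # Every subset of a shattered set is shattered, so if no k-subset
            # is shattered, no larger subset can be either.
            break
    return d

domain = [0, 1, 2, 3]
cuts = [-0.5, 0.5, 1.5, 2.5, 3.5]

# Thresholds h_t(x) = 1 iff x >= t; intervals h_{a,b}(x) = 1 iff a <= x <= b.
thresholds = [lambda x, t=t: int(x >= t) for t in cuts]
intervals = [lambda x, a=a, b=b: int(a <= x <= b)
             for a, b in combinations(cuts, 2)]

print(vc_dimension(thresholds, domain))  # 1
print(vc_dimension(intervals, domain))   # 2
```

Thresholds cannot label a smaller point 1 and a larger point 0, and intervals cannot produce the labeling (1, 0, 1) on three ordered points, which is why the search stops at 1 and 2 respectively.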
In the second part of the book, the second largest, the authors discuss the real classification problem, i.e., the problem of predicting a binary value \((Y= \{0,1\})\) using a class \(H\) of real-valued functions. Such problems occur in real-output neural nets used for binary classification. Here the appropriate measure of complexity is the fat-shattering dimension, a scale-sensitive version of the VC dimension.
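For orientation, the fat-shattering dimension is usually defined as follows. The symbols \(\gamma\), \(r_i\), \(b\) and \(\operatorname{fat}_H\) are notation assumed here and do not appear in the review itself.

```latex
% A set {x_1, ..., x_d} \subseteq X is \gamma-shattered by H (\gamma > 0)
% if there exist witnesses r_1, ..., r_d \in \mathbf{R} such that
\[
  \forall b \in \{0,1\}^d \;\; \exists h \in H : \quad
  h(x_i) \ge r_i + \gamma \ \text{ if } b_i = 1,
  \qquad
  h(x_i) \le r_i - \gamma \ \text{ if } b_i = 0
  \qquad (i = 1,\dots,d).
\]
% The fat-shattering dimension at scale \gamma is the largest such d:
\[
  \operatorname{fat}_H(\gamma)
  = \max\bigl\{\, d : \text{some } \{x_1,\dots,x_d\} \subseteq X
      \text{ is } \gamma\text{-shattered by } H \,\bigr\},
\]
% taken to be infinite if arbitrarily large sets are \gamma-shattered.
```

The margin \(\gamma\) is what makes the measure scale-sensitive: the functions must separate the two labels by at least \(2\gamma\) around each witness \(r_i\), so \(\operatorname{fat}_H(\gamma)\) can only decrease or stay the same as \(\gamma\) grows.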
In Part Three of the book, the real prediction problem is presented. The problem is to predict a real-valued quantity \((Y={\mathbf R})\) using a class \(H\) of real-valued functions. For example, this situation arises when neural nets are used to predict the future price of shares on the stock market. Again, the measure of complexity is the fat-shattering dimension.
The last part of the book, Part Four, is devoted to computational limitations on learning with neural nets and to estimating the performance of learning algorithms.
Bibliographic notes at the end of every chapter provide helpful comments. The book is a useful and readable monograph. For beginners it is a nice introduction to the subject; for experts, a valuable reference.


68T05 Learning and adaptive systems in artificial intelligence
92B20 Neural networks for/in biological studies, artificial life and related topics
68-02 Research exposition (monographs, survey articles) pertaining to computer science
68T10 Pattern recognition, speech recognition
68Q32 Computational learning theory


Zbl 0587.68077