Information Science and Statistics. New York, NY: Springer (ISBN 0-387-31073-8/hbk). xx, 738 p. EUR 59.95; £ 46.00; $ 74.95 (2006).

The new textbook (738 pages, 409 references) reflects substantial developments over the past decade in Bayesian, probabilistic, graphical, clustering, and other mathematical models, and presents new methods and algorithms used in computer science (machine learning) and engineering (pattern recognition) applications. The emphasis is on conveying the underlying concepts for a clear understanding of modern pattern recognition and machine learning techniques.
The book is structured into 14 chapters and 5 appendices. 1. Introduction (polynomial curve fitting, probability theory, model selection, decision theory, information theory). 2. Probability distributions (binary variables, multinomial variables, Gaussian distribution, the exponential family, non-parametric methods). 3. Linear models for regression (linear basis function models, the bias-variance decomposition, Bayesian linear regression, Bayesian model comparison, the evidence approximation, limitations of fixed basis functions). 4. Linear models for classification (discriminant functions, probabilistic generative models, probabilistic discriminative models, the Laplace approximation, Bayesian logistic regression). 5. Neural networks (feed-forward network functions, network training, error back-propagation, the Hessian matrix, regularisation, mixture density networks, Bayesian neural networks). 6. Kernel methods (dual representations, constructing kernels, radial basis function networks, Gaussian processes). 7. Sparse kernel machines (maximum margin classifiers, relevance vector machines). 8. Graphical models (Bayesian networks, conditional independence, Markov random fields, inference in graphical models). 9. Mixture models and expectation maximisation (K-means clustering, mixtures of Gaussians, an alternative view of expectation maximisation). 10. Approximate inference (variational inference, variational linear regression, exponential family distributions, variational logistic regression, expectation propagation). 11. Sampling methods (basic sampling algorithms, Markov chain Monte Carlo, Gibbs sampling, slice sampling, the hybrid Monte Carlo algorithm, estimating the partition function). 12. Continuous latent variables (principal component analysis (PCA), probabilistic PCA, kernel PCA, nonlinear latent variable models). 13. Sequential data (Markov models, hidden Markov models, linear dynamical systems). 14. Combining models (Bayesian model averaging, committees, boosting, tree-based models, conditional mixture models). Appendices: data sets; probability distributions; properties of matrices; calculus of variations; Lagrange multipliers.
The book is aimed at PhD students, researchers and practitioners. It is well-suited for courses on machine learning, statistics, computer science, signal processing, computer vision, data mining, and bio-informatics. Extensive support is provided for course instructors, including more than 400 exercises, lecture slides and a great deal of additional material available at the book’s web site \url{http://research.microsoft.com/~cmbishop/PRML}.