Introduction to information retrieval. (English) Zbl 1160.68008

Cambridge: Cambridge University Press (ISBN 978-0-521-86571-5/hbk; 978-0-511-41080-2/ebook). xxi, 482 p. (2008).
This is an extraordinary textbook, voluminous and comprehensive, covering many themes in the area of information retrieval. Nevertheless, the authors regard the text as an introduction: they offer a special list of references to textbooks on ten topics outside the scope of their book, and give pointers to further reading at the end of each chapter. The textbook is the result of a series of courses at Stanford University and at the University of Stuttgart, and is designed primarily for a graduate course in information retrieval. Each chapter contains exercises with an indication of their level of difficulty. The prerequisites are basic knowledge of data structures and algorithms, linear algebra, probability theory, nonlinear optimization, matrices, eigenvalues and eigenvectors. Some rudimentary facts are reviewed in the initial paragraphs of the corresponding chapters.
The book consists of 21 chapters, 8 of which are considered foundational (indexes, Boolean queries, document preprocessing, dictionaries and spelling correction, compression of dictionaries and indexes, the vector space model for scoring, evaluation of information retrieval systems with respect to the relevance of retrieved documents). More advanced problems are then studied (relevance feedback, query reformulation, XML retrieval, probabilistic information retrieval, text classification, repeated (“standing”) queries, machine learning methods, document clustering). The last three chapters are devoted to the very important and topical area of web search (summary of challenges, architecture and requirements, web crawling and indexes, link analysis for ranking web search results).
A detailed characterization of the individual chapters would go beyond the reasonable scope of this review. I have therefore written only this brief survey, but I urge all those interested in information retrieval not to miss this book. Every reader will surely find a great deal of interesting material in the text of more than 400 pages, in the referenced literature (the bibliography occupies 27 pages) and/or on the rich web site accompanying the book (http://informationretrieval.org).


68P20 Information storage and retrieval of data
68-01 Introductory exposition (textbooks, tutorial papers, etc.) pertaining to computer science