Wiley Series on Methods and Applications in Data Mining. Hoboken, NJ: John Wiley & Sons (ISBN 978-0-470-17643-6/hbk; 978-0-470-38286-8/ebook). xxiv, 295 p. EUR 79.60; £ 63.95 (2008).
This textbook presents a hands-on approach to text mining using Perl’s extensive regular expression capabilities. The theoretical prerequisites are kept to a minimum, and only basic text mining techniques, drawn from statistics, data mining, linguistics, and information retrieval, are covered. Several specialised Perl modules are discussed. For statistical calculations, the open-source package R is used. The practical examples are taken from English literary works readily available from Project Gutenberg; a brief excursion into the problems posed by other languages uses the German text of Goethe’s “Die Leiden des jungen Werther” as an example.
The book is clearly written and will be useful to a wide audience ranging from undergraduate students to non-specialist professionals interested in text mining techniques.
The chapter headings are as follows: 1 Introduction; 2 Text patterns; 3 Quantitative text summaries; 4 Probability and text sampling; 5 Applying information retrieval to text mining; 6 Concordance lines and corpus linguistics; 7 Multivariate techniques with text; 8 Text clustering; 9 A sample of additional topics.