Comparative gene finding. Models, algorithms and implementation. 2nd ed.

*(English)*Zbl 1350.92001
Computational Biology 20. London: Springer (ISBN 978-1-4471-6692-4/hbk; 978-1-4471-6693-1/ebook). xx, 382 p. (2015).

The book under review is an updated and revised version of the first edition and contains in particular a new chapter describing tools and approaches based on next generation sequencing (NGS) used for gene prediction. The book is structured into eight chapters and commences with an introduction of the required notions such as the central dogma, the structure of a gene, and common challenges in defining the location of a gene. The author also includes a historical overview of algorithms developed for gene finding. The chapter concludes with a brief description of the four steps towards building a gene finder. The second chapter discusses single species gene finding using hidden Markov models. First, Markov chains and elements of dynamic programming are introduced. Next, the forward, backward and Viterbi algorithms are discussed. As an example, the author presents EasyGene, a gene finder for prokaryotes. In the next sections, generalized hidden Markov models (GHMMs) and interpolated Markov models (IMMs) are presented. As examples, Genscan, based on GHMMs, and GLIMMER, based on IMMs, are discussed. The chapter also describes approaches based on neural networks, with GRAIL as example, decision trees, with MORGAN as example, and on conditional random fields, with Conrad as example. In the third chapter, the author reviews pairwise and multiple sequence alignments. For the former, the nucleotide and amino-acids substitution models are discussed, followed by a presentation of the Needleman-Wunsch and the Smith-Waterman algorithms. To ensure a thorough description of the multiple sequence alignments, the author commences with an overview of scoring schemes and methods for progressive alignments and other iterative approaches. Next, various methods, accompanied by examples, are discussed. As application to HMMs, the author presents SAM – sequence alignment and modelling; for genetic algorithms, the example is SAGA – sequence alignments by genetic algorithms, and for simulated annealing, the example is MSASA, which produces multiple alignments by iterative rearrangements. In the fourth chapter, the author describes comparative gene finding based either on

The structure of the book mirrors the learning steps for understanding how to perform gene finding. Although written in an accessible language, it requires some prior knowledge of the various fields it covers, such as probability theory, statistics, information theory, optimization theory and numerical analysis. Its target audience is mainly post-graduate researchers or established researchers with a background in mathematics or statistics applied in bioinformatics who need a thorough yet concise overview of this field.

- (i)
- similarity searches (GenomeScan, a GHMM-based gene finding approach using homology),
- (ii)
- heuristic cross species method (ROSETTA),
- (iii)
- pair hidden Markov models (DoubleScan),
- (iv)
- generalized pair hidden Markov models (SLAM),
- (v)
- gene mapping (GeneMapper) and
- (vi)
- multiple sequence gene finding (N-scan).

The structure of the book mirrors the learning steps for understanding how to perform gene finding. Although written in an accessible language, it requires some prior knowledge of the various fields it covers, such as probability theory, statistics, information theory, optimization theory and numerical analysis. Its target audience is mainly post-graduate researchers or established researchers with a background in mathematics or statistics applied in bioinformatics who need a thorough yet concise overview of this field.

Reviewer: Irina Ioana Mohorianu (Norwich)

##### MSC:

92-02 | Research exposition (monographs, survey articles) pertaining to biology |

92-08 | Computational methods for problems pertaining to biology |

92C40 | Biochemistry, molecular biology |

92D10 | Genetics and epigenetics |

60J22 | Computational methods in Markov chains |

92B20 | Neural networks for/in biological studies, artificial life and related topics |

90C39 | Dynamic programming |