Bioinformatics and phylogenetics. Seminal contributions of Bernard Moret.

*(English)* Zbl 1429.92003
Computational Biology 29. Cham: Springer (ISBN 978-3-030-10836-6/hbk; 978-3-030-10837-3/ebook). xxv, 410 p. (2019).

This book is a collection of fifteen chapters, each focusing on one aspect of phylogenetic analyses. The style of the book is comprehensive, balancing the thorough explanation of fundamental concepts with suggestions and references that support further research.

The first four chapters focus on likelihood calculations. In Chapter 1, the author reviews approaches for optimising phylogenetic likelihood calculations, focusing on the sequential phylogenetic likelihood function (PLF) optimised via algorithmic and technical means. Also discussed are partial and full terraces in tree space, as well as details related to the parallelisation of the computations. In the second chapter, the author presents matrix-based calculations of the likelihood function and their applications to the pruning algorithm using only vector operations, to inferring edge lengths, and to one-length and all-length optimisations. In the third chapter, the author discusses high-performance likelihood-based calculations and the resilience of the approaches to multi-mode parallelisation. In the next chapter, the author presents yet another angle on maximum likelihood-based estimation, one that takes into account sequence-length requirements. The problems of optimising the branch lengths and of balancing depth versus branching are discussed in detail.

In the fifth chapter, the author analyses gene family evolution from an algorithmic perspective. Following an overview of binary and non-binary tree reconciliation, the inference of a gene tree from a set of trees, using amalgamation and the super-tree method, is also presented. In the next chapter, the author presents a series of approaches based on divide-and-conquer estimation of super-trees: the bipartition-based, quartet-based and distance-based methods. The accuracy and scalability of super-tree methods, and approaches to improve these (e.g. the SuperFine implementation), are also discussed.

In the seventh chapter, the author presents methods for constructing taxonomic super-trees, with examples focusing on incertae sedis taxa. The focus is on rooted splits and split-based semantics, coupled with examples that highlight a pipeline based on subproblem decomposition. In the eighth chapter, the aim is to present how the interpretation of evolutionary rate change is altered by the transformation from additive to ultra-metric trees (Farris transform, non-parametric rate smoothing and penalised likelihood). The examples include fish, Solanaceae and Malvaceae. Continuing the trend, in the ninth chapter, the author presents approaches for the reconstruction of ancestral genomes, focusing on the difficulties in solving the median problem when multiple genomes are involved.

With the tenth chapter, the author approaches yet another direction, the genome rearrangement problem for genes with single and multiple copies. For the former, standard distances such as breakpoint, reversal and translocation, DCJ and ScoJ are presented; for the latter, the concept of polyploidy and the models with indels, duplications and deletions are discussed in detail. All examples focus on cancers. The study of cancer phylogenetics continues in Chapter 11, where the author presents approaches for estimating evolutionary distances based on single nucleotide variants (SNVs) and copy number aberrations (CNAs) and for incorporating these into Steiner tree problems; applications of deconvolution methods to SNVs and CNAs conclude the chapter.

The twelfth chapter also focuses on integrating phylogenies into networks, namely the identification of clusters in rooted phylogenetic networks. Particular examples of the node visibility property, of reticulation-visible networks and of nearly stable networks are also included. In the next chapter, the author overviews the computational difficulties in identifying and describing phylogenetic networks in the presence of hybridisation. Following a discussion of approaches for displaying and summarising networks, generative models based on sequences and inference approaches based on maximum likelihood and Bayesian inference are discussed in detail.

In the fourteenth chapter, the author gives a side-by-side overview of comparative and functional genomic approaches, with examples from seven amniote genomes. In the last chapter, the author puts forward yet another angle, based on integer linear programming (ILP). Standard problems such as the handling of high-density subgraphs and the identification of a maximum clique and of a maximum independent set are discussed from the ILP perspective. The author also embeds the travelling salesman problem into biological problems, e.g. the DNA assembly problem and the marker-ordering problem.

Although written in accessible language, the book is mainly aimed at researchers with pre-existing knowledge of phylogenetics and of the computational approaches related to it (e.g. graph theory). The numerous examples make this collection of chapters comprehensible for undergraduate, graduate and postgraduate researchers from computer science, mathematics and biology; the extensive set of references that accompanies every chapter recommends the book as a reliable starting point for further studies.

Reviewer: Irina Ioana Mohorianu (Oxford)

##### MSC:

92-06 | Proceedings, conferences, collections, etc. pertaining to biology |

00B15 | Collections of articles of miscellaneous specific interest |

92D15 | Problems related to evolution |

92C42 | Systems biology, networks |

90C10 | Integer programming |