On computing the nearest neighbor interchange distance.

*(English)*Zbl 1133.92347
Du, Ding-Zhu (ed.) et al., Discrete mathematical problems with medical applications. DIMACS workshop, DIMACS Center, Princeton, NJ, USA, December 8–10, 1999. Providence, RI: AMS, American Mathematical Society (ISBN 0-8218-2096-6). DIMACS, Ser. Discrete Math. Theor. Comput. Sci. 55, 125-143 (2000).

Summary: In the practice of molecular evolution, different phylogenetic trees for the same group of species are often produced either by procedures that use diverse optimality criteria or from different genes. Comparing these trees to find their similarities (e.g., agreement or consensus) and dissimilarities, i.e., distance, is thus an important issue in computational molecular biology. The nearest neighbor interchange (nni) distance is a natural distance metric that has been extensively studied. Despite its many appealing aspects such as simplicity and sensitivity to tree topologies, computing this distance has remained very challenging, and many algorithmic and complexity issues about computing this distance have remained unresolved.

This paper studies the complexity and efficient approximation algorithms for computing the nni distance and a natural extension of this distance on weighted phylogenies. The following results answer many open questions about the nni distance posed in the literature. 1. Computing the nni distance between two labeled trees is NP-complete. This solves a 25 year old open question appearing again and again. 2. Computing the nni distance between two unlabeled trees is also NP-complete. This answers an open question of K. Culik and D. Wood [Inf. Process. Lett. 15, 39–42 (1982; Zbl 0489.68058)] for which an erroneous proof appeared by M. Krivanek [J. Classif. 3, 55–60 (1986; Zbl 0594.68058)].

3. Biological applications motivate us to extend the nni distance to weighted phylogenies, where edge weights indicate the time-span of evolution along each edge. We present an \(O(n^2)\) time approximation algorithm for computing the nni distance on weighted phylogenies with a performance ratio of 4 log \(n+4\), where \(n\) is the number of leaves in the phylogenies. We also observe that the nni distance is in fact identical to the linear-cost sub tree-transfer distance on unweighted phylogenies discussed by B. DasGupta et. al. [Proc. 8 th ACM-SIAM Symp. Discrete Algorithms, 427–436 (1997); Algorithmica 25, No. 2-3, 176–195 (1999; Zbl 0952.68113)]. Some consequences of this observation are also discussed.

For the entire collection see [Zbl 0956.00051].

This paper studies the complexity and efficient approximation algorithms for computing the nni distance and a natural extension of this distance on weighted phylogenies. The following results answer many open questions about the nni distance posed in the literature. 1. Computing the nni distance between two labeled trees is NP-complete. This solves a 25 year old open question appearing again and again. 2. Computing the nni distance between two unlabeled trees is also NP-complete. This answers an open question of K. Culik and D. Wood [Inf. Process. Lett. 15, 39–42 (1982; Zbl 0489.68058)] for which an erroneous proof appeared by M. Krivanek [J. Classif. 3, 55–60 (1986; Zbl 0594.68058)].

3. Biological applications motivate us to extend the nni distance to weighted phylogenies, where edge weights indicate the time-span of evolution along each edge. We present an \(O(n^2)\) time approximation algorithm for computing the nni distance on weighted phylogenies with a performance ratio of 4 log \(n+4\), where \(n\) is the number of leaves in the phylogenies. We also observe that the nni distance is in fact identical to the linear-cost sub tree-transfer distance on unweighted phylogenies discussed by B. DasGupta et. al. [Proc. 8 th ACM-SIAM Symp. Discrete Algorithms, 427–436 (1997); Algorithmica 25, No. 2-3, 176–195 (1999; Zbl 0952.68113)]. Some consequences of this observation are also discussed.

For the entire collection see [Zbl 0956.00051].