×

zbMATH — the first resource for mathematics

A branch-and-cut algorithm for multiple sequence alignment. (English) Zbl 1085.90060
Summary: We consider a branch-and-cut approach for solving the multiple sequence alignment problem, which is a central problem in computational biology. We propose a general model for this problem in which arbitrary gap costs are allowed. An interesting aspect of our approach is that the three (exponentially large) classes of natural valid inequalities that we consider turn out to be both facet-defining for the convex hull of integer solutions and separable in polynomial time. Both the proofs that these classes of valid inequalities are facet-defining and the description of the separation algorithms are far from trivial. Experimental results on several benchmark instances show that our method outperforms the best tools developed so far, in that it produces alignments that are better from a biological point of view. A noteworthy outcome of the results is the effectiveness of using branch-and-cut with only a carefully-selected subset of the variables as a heuristic.

MSC:
90C57 Polyhedral combinatorics, branch-and-bound, branch-and-cut
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] Achterberg, T.: SCIP - a framework to integrate constraint and mixed integer programming. Technical Report 04-19, Zuse Institute Berlin, 2004. http://www.zib.de/bib/pub/pw
[2] Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)
[3] Bienstock, D.: Potential function methods for approximately solving linear programming problems, Theory and Practice. Kluwer Academic Publishers, Boston, 2002 · Zbl 1088.90001
[4] Carr, R.D., Lancia, G.: Compact vs exponential-size lp relaxations. Operations Research Letters 30, 57–65 (2002) · Zbl 1027.90059 · doi:10.1016/S0167-6377(01)00106-7
[5] Carr, R.D., Lancia, G.: Compact optimization can outperform separation: A case study in structural proteomics. 4OR 2, 221–233 (2004) · Zbl 1061.65049
[6] Carrillo, H., Lipman, D.J.: The multiple sequence alignment problem in biology. SIAM J. Appl. Math. 48 (5), 1073–1082 (1988) · Zbl 0652.92001 · doi:10.1137/0148063
[7] Dayhoff, M., Schwartz, R., Orcut, B.: A model of evolutionary change in proteins. In: M. Dayhoff (ed.) Atlas of Protein Sequence and Structure, vol 5, National Biomedical Research Foundation, Washington, D.C., 1979, pp 345–352
[8] Delcher, A., Kasif, S., Fleischmann, R., J. Peterson, W. O., Salzberg, S.: Alignment of whole genomes. Nucleic Acids Res 27, 2369–2376 (1999) · doi:10.1093/nar/27.11.2369
[9] Eppstein, D.: Sequence comparison with mixed convex and concave costs. J Algorithms (11), 85–101 (1990) · Zbl 0709.68015
[10] Fischetti, M., Toth, P.: A polyhedral approach to the asymmetric traveling salesman problem. Management Sci 43 (11), 1520–1536 (1997) · Zbl 0902.90159 · doi:10.1287/mnsc.43.11.1520
[11] Garey, M., Johnson, D.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman, 1979 · Zbl 0411.68039
[12] Golumbic, M.C.: Algorithmic graph theory and perfect graphs. Academic Press, New York, 1980 · Zbl 0541.05054
[13] Gotoh, O.: An improved algorithm for matching biological sequences. J. Mol. Biol. 162, 705–708 (1982) · doi:10.1016/0022-2836(82)90398-9
[14] Gupta, S., Kececioglu, J., Schaeffer, A.: Improving the practical space and time efficiency of the shortest-paths approach to sum-of-pairs multiple sequence alignment. J. Comput. Biol. 2, 459–472 (1995) · doi:10.1089/cmb.1995.2.459
[15] Gusfield, D.: Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge University Press, Cambridge, 1997 · Zbl 0934.68103
[16] Henikoff, S., Henikoff, J.: Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Science 89, 10915–10919 (1992) · doi:10.1073/pnas.89.22.10915
[17] Kipp Martin, R.: Using separation algorithms to generate mixed integer model reformulations. Oper. Res. Lett. 10, 119–128 (1991) · Zbl 0747.90071 · doi:10.1016/0167-6377(91)90028-N
[18] Larmore, L., Schieber, B.: Online dynamic programming with applications to the prediction of rna secondary structure. In: Proceedings of the First Symposium on Discrete Algorithms 1990, pp 503–512 · Zbl 0785.90095
[19] LEDA (Library of Efficient Data Types and Algorithms), 2004. http://www.algorithmic-solutions.com
[20] Lenhof, H.-P., Morgenstern, B., Reinert, K.: An exact solution for the segment-to-segment multiple sequence alignment problem. Bioinformatics 15 (3), 203–210 (1999) · doi:10.1093/bioinformatics/15.3.203
[21] Lermen, M., Reinert, K.: The practical use of the algorithm for exact multiple sequence alignment. J. Comput. Biol. 7(5), 655–673 (2000) · doi:10.1089/106652701446134
[22] Notredame, C., Higgins, D.G., Heringa, J.: T-coffee : A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205–217 (2000) · doi:10.1006/jmbi.2000.4042
[23] Reinert, K.: A Polyhedral Approach to Sequence Alignment Problems. PhD thesis, Universität des Saarlandes, 1999
[24] Reinert, K., Lenhof, H.-P., Mutzel, P., Mehlhorn, K., Kececioglu, J.: A branch-and-cut algorithm for multiple sequence alignment. In: Proceedings of the First Annual International Conference on Computational Molecular Biology (RECOMB-97), 1997, pp 241–249
[25] K. Reinert, J. Stoye, and T. Will. An iterative methods for faster sum-of-pairs multiple sequence alignment. BIOINFORMATICS 16(9):808–814, 2000.
[26] SCIL–Symbolic Constraints for Integer Linear programming, 2002. http://www.mpi-sb.mpg.de/SCIL
[27] Thompson, J.D., Plewniak, F., Poch, O.: BAliBASE: A benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics 15 (1), 87–88 (1999) http://www-igbmc.u-strasbg.fr/BioInfo/BAliBASE/prog_scores.html · doi:10.1093/bioinformatics/15.1.87
[28] Wang, L., Jiang, T.: On the complexity of multiple sequence alignment. J. Comput. Biol. 1, 337–348 (1994) · doi:10.1089/cmb.1994.1.337
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.