×

DCA based algorithms for multiple sequence alignment (MSA). (English) Zbl 1339.92064

Summary: In the last years many techniques in bioinformatics have been developed for the central and complex problem of optimally aligning biological sequences. In this paper we propose a new optimization approach based on DC (difference of convex functions) programming and DC algorithm (DCA) for the multiple sequence alignment in its equivalent binary linear program, called “maximum weight trace” problem. This problem is beforehand recast as a polyhedral DC program with the help of exact penalty techniques in DC programming. Our customized DCA, requiring solution of a few linear programs, is original because it converges after finitely many iterations to a binary solution while it works in a continuous domain. To scale-up largescale (MSA), a constraint generation technique is introduced in DCA. Preliminary computational experiments on benchmark data show the efficiency of the proposed algorithm DCAMSA, which generally outperforms some standard algorithms.

MSC:

92D20 Protein sequences, DNA sequences
90C90 Applications of mathematical programming
90C10 Integer programming
68W32 Algorithms on strings
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Bradley PS, Mangasarian OL (1998) Feature selection via concave minimization and support vector machines. In: Proceedings of the fifteenth international conference on machine learning (ICML 1998), pp 82-90
[2] Chambolle A, DeVore RA, Lee NY, Lucier BJ (1998) Nonlinear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage. IEEE Trans Image Process 7: 319-335 · Zbl 0993.94507 · doi:10.1109/83.661182
[3] Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 39:1-38 · Zbl 0364.62022
[4] Greenberg HJ (2007) Integer quadratic programming models in computational biology. In: Operations research proceedings, vol 2006. Springer, Berlin, pp 83-95 · Zbl 1209.90271
[5] Gusfield D (1997) Algorithms on strings, trees, and sequences. Cambridge University Press, Cambridge, MA · Zbl 0934.68103 · doi:10.1017/CBO9780511574931
[6] Kececioglu J (1993) The maximum weight trace problem in multiple sequence alignment. In: Proceedings of the 4th symposium on combinatorial pattern matching, pp 106-119 · Zbl 1022.68112
[7] Kececioglu JD (1991) Exact and approximation algorithms for DNA sequence reconstruction, PhD thesis, University of Arizona · Zbl 0757.62053
[8] Kececioglu JD, Lenhof H-P, Mehlhorn K, Mutzel P, Reinert K, Vingron M (2000) A polyhedral approach to sequence alignment problems. Discret Appl Math 104:143-186 · Zbl 0998.92017 · doi:10.1016/S0166-218X(00)00194-3
[9] Le Thi HA. DC programming and DCA. Available on http://lita.sciences.univ-metz.fr/lethi/DCA.html · Zbl 0998.92017
[10] Le Thi HA, Pham Dinh T (1997) Solving a class of linearly constrained indefinite quadratic problems by DC algorithms. J Glob Optim 11(3):253-285 · Zbl 0905.90131
[11] Le Thi HA, Pham Dinh T (2003) Large scale molecular optimization from distances matrices by a DC optimization approach. SIAM J Optim 14(1):77-116 · Zbl 1075.90071
[12] Le Thi HA, Pham Dinh T (2005) The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Ann Oper Res 133:23-46 · Zbl 1116.90122 · doi:10.1007/s10479-004-5022-1
[13] Le Thi HA, Pham Dinh T, Le DM (1999) Exact penalty in DC programming. Vietnam J Math 27(2):169-178 · Zbl 1006.90062
[14] Le Thi HA, Belghiti M, Tao PD (2007) A new efficient algorithm based on DC programming and DCA for clustering. J Glob Optim 37:593-608 · Zbl 1198.90327
[15] Le Thi HA, Pham Dinh T, Huynh VN (2009) Convergence analysis of DC algorithms for DC programming with subanalytic data. Research report, National Institute for Applied Sciences, Rouen · Zbl 1028.92009
[16] Lenhof H-P, Retnert K, Vingron M (1998) A polyhedral approach to RNA sequence structure alignment. J Comput Biol 5(3):517-530 · doi:10.1089/cmb.1998.5.517
[17] McClure MA, Vasi TK, Fitch WM (1994) Comparative analysis of multiple protein-sequence alignment methods. Mol Biol Evol 11:571-592
[18] Myers E, Miller W (1988) Optimal alignments in linear space. Comput Appl Biosci 4(1):11-17
[19] Neumann J, SchnÖrr C, Steidl G (2004) SVM-based feature selection by direct objective minimisation. Pattern recognition. In: Proceedings of 26th DAGM symposium, LNCS, Springer, Aug. 2004
[20] Notredame C (2002) Recent progresses in multiple sequence alignment: a survey. Pharmacogenomics 3(1):131-144 · doi:10.1517/14622416.3.1.131
[21] Notredame C, Higgins DG (1996) SAGA: sequence alignment by genetic algorithm. Nucleic Acids Res 24(8):1515-1524 · doi:10.1093/nar/24.8.1515
[22] Notredame C, Higgins DG, Heringa J (2000) T-COFFEE: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 392:205-217 · doi:10.1006/jmbi.2000.4042
[23] Pham Dinh T, Le Thi HA (1997) Convex analysis approach to DC programming: theory, algorithms and applications. Acta Mathematica Vietnamica 22(1):289-355 · Zbl 0895.90152
[24] Pham Dinh T, Le Thi HA (1998) DC optimization algorithms for solving the trust region subproblem. SIAM J Optim 8:476-505 · Zbl 0913.65054 · doi:10.1137/S1052623494274313
[25] Rajasekaran S, Nick H, Pardalos PM, Sahni S, Shaw G (2001a) Efficient algorithms for local alignment search. J Comb Optim 5(1):117-124 · Zbl 1028.92009 · doi:10.1023/A:1009893719470
[26] Rajasekaran S, Hu Y, Luo J, Nick H, Pardalos PM, Sahni S, Shaw G (2001b) Efficient algorithms for similarity search. J Comb Optim 5(1):125-132 · Zbl 1028.92008 · doi:10.1023/A:1009897903540
[27] Reinert K, Lenhof H, Mutzel P, Mehlhorn K, Kececioglu JD (1997) A branch-and-cut algorithm for multiple sequence alignment. RECOMB, pp 241-250 · Zbl 1028.92008
[28] Rockafellar RT (1970) Convex analysis. Princeton University Press, Princeton, NJ · Zbl 0193.18401
[29] Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice. Nucleic Acids Res 22(22):4673-4680 · doi:10.1093/nar/22.22.4673
[30] Thompson J, Plewniak F, Poch O (1999) BAliBASE: a benchmark alignments database for the evaluation of multiple sequence alignment programs. Bioinformatics 15:87-88 · doi:10.1093/bioinformatics/15.1.87
[31] Yufeng L, Shen X, Doss H (2005) Multicategory \[\psi\] ψ-Learning and support vector machine: computational tools. J Comput Graph Stat 14(1): 219-236
[32] Yuille AL, Rangarajan A (2003) The convex concave procedure (CCCP). Neural Comput 15(4):915-936 · Zbl 1022.68112 · doi:10.1162/08997660360581958
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.