zbMATH — the first resource for mathematics

A time warping approach to multiple sequence alignment. (English) Zbl 1371.92047
Summary: We propose an approach for multiple sequence alignment (MSA) derived from the dynamic time warping viewpoint and recent techniques of curve synchronization developed in the context of functional data analysis. Starting from pairwise alignments of all the sequences (viewed as paths in a certain space), we construct a median path that represents the MSA we are looking for. We establish a proof of concept that our method could be an interesting ingredient to include into refined MSA techniques. We present a simple synthetic experiment as well as the study of a benchmark dataset, together with comparisons with 2 widely used MSA softwares.
92C40 Biochemistry, molecular biology
62P10 Applications of statistics to biology and medical sciences; meta analysis
Full Text: DOI
[1] Arribas-Gil, A. (2010): “Parameter estimation in multiple hidden i.i.d. models from biological multiple alignment,” Stat. Appl. Genet. Mol. Biol., 9, 10. · Zbl 1304.92081
[2] Arribas-Gil, A. and H.-G. Müller (2014): “Pairwise dynamic time warping for event data,” Comput. Stat. Data Anal., 69, 255-268.
[3] Arribas-Gil, A., E. Gassiat, and C. Matias (2006): “Parameter estimation in pair-hidden Markov models,” Scand. J. Stat., 33, 651-671. · Zbl 1164.62370
[4] Arribas-Gil, A., D. Metzler, and J.-L. Plouhinec (2009): “Statistical alignment with a sequence evolution model allowing rate heterogeneity along the sequence,” IEEE/ACM Trans. Comput. Biol. Bioinf., 6, 281-295.
[5] Do, C., M. Mahabhashyam, M. Brudno, and S. Batzoglou (2005): “ProbCons: probabilistic consistency-based multiple sequence alignment,” Genome Res., 15, 330-340.
[6] Durbin, R., S. Eddy, A. Krogh, and G. Mitchison (1998): Biological sequence analysis: probabilistic models of proteins and nucleic acids, Cambridge University Press, Cambridge. · Zbl 0929.92010
[7] Edgar, R. C. (2004): “MUSCLE: multiple sequence alignment with high accuracy and high throughput,” Nucleic Acids Res., 32, 1792.
[8] Edgar, R. C. and S. Batzoglou (2006): “Multiple sequence alignment,” Curr. Opin. Struct. Biol., 16, 368-373.
[9] Floyd, R. W. and R. L. Rivest (1975): “Algorithm 489: the algorithm SELECT-for finding the ith smallest of n elements [M1],” Commun. ACM, 18, 173.
[10] Gotoh, O. (1996): “Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments,” J. Mol. Biol., 264, 823-838.
[11] Katoh, K. and D. M. Standley (2013): “MAFFT multiple sequence alignment software version 7: improvements in performance and usability,” Mol. Biol. Evol., 30, 772.
[12] Keogh, E. and C. A. Ratanamahatana (2005): “Exact indexing of dynamic time warping,” Knowl. Inf. Syst., 7, 358-386.
[13] Kruskal, J. B. (1983): “An overview of sequence comparison: time warps, string edits, and macromolecules,” SIAM Rev., 25, 201-237. · Zbl 0512.68048
[14] Kumar, S. and A. Filipski (2007): “Multiple sequence alignment: in pursuit of homologous DNA positions,” Genome Res., 17, 127-135.
[15] Lipman, D. J., S. F. Altschul, and J. D. Kececioglu (1989): “A tool for multiple sequence alignment,” Proc. Natl. Acad. Sci. USA PNAS, 86, 4412-4415.
[16] Liu, X. and H.-G. Müller (2004): “Functional convex averaging and synchronization for time-warped random curves,” J. Am. Stat. Assoc., 99, 687-699. · Zbl 1117.62392
[17] Needleman, S. and C. Wunsch (1970): “A general method applicable to the search for similarities in the amino acid sequence of two proteins,” J. Mol. Biol., 48, 443-453.
[18] Notredame, C. (2007): “Recent evolutions of multiple sequence alignment algorithms,” PLOS Comput. Biol., 3, 1-4.
[19] Notredame, C., D. G. Higgins, and J. Heringa (2000): “T-coffee: a novel method for fast and accurate multiple sequence alignment,” J. Mol. Biol., 302, 205-217.
[20] Pages, H., P. Aboyoun, R. Gentleman, and S. DebRoy (2016): “Biostrings: string objects representing biological sequences, and matching algorithms,” R package version 2.28.0.
[21] Pais, F. S.-M., P. d. C. Ruy, G. Oliveira, and R. S. Coimbra (2014): “Assessing the efficiency of multiple sequence alignment programs,” Algorithms Mol. Biol., 9, 4.
[22] Pei, J. and N. V. Grishin (2006): “MUMMALS: multiple sequence alignment improved by using hidden Markov models with local structural information,” Nucleic Acids Res., 34, 4364-4374.
[23] Smith, T. F., M. S. Waterman, and W. M. Fitch (1982): “Comparative biosequence metrics,” J. Mol. Evol., 18, 423-423.
[24] Tang, R. and H.-G. Müller (2008): “Pairwise curve synchronization for functional data,” Biometrika, 95, 875. · Zbl 1437.62625
[25] Thompson, J., D. G. Higgins, and T. J. Gibson (1994): “CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice,” Nucleic Acids Res., 22, 4673-4680.
[26] Thompson, J. D., F. Plewniak, and O. Poch (1999): “A comprehensive comparison of multiple sequence alignment programs,” Nucleic Acids Res., 27, 2682.
[27] Thompson, J. D., P. Koehl, R. Ripp, and O. Poch (2005): “Balibase 3.0: latest developments of the multiple sequence alignment benchmark,” Proteins Struct. Funct. Bioinf., 61, 127-136.
[28] Thorne, J., H. Kishino, and J. Felsenstein (1991): “An evolutionary model for maximum likelihood alignment of DNA sequences.” J. Mol. Evol., 33, 114-124.
[29] Wallace, I. M., G. Blackshields, and D. G. Higgins (2005): “Multiple sequence alignments,” Curr. Opin. Struct. Biol., 15, 261-266.
[30] Yang, Z. (2006): Computational molecular evolution, Oxford series in ecology and evolution, Oxford University Press, Oxford.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.