×

Fine-tuning the search for microsatellites. (English) Zbl 1334.68322

Summary: The design and implementation is discussed of \(\text{Fire}\mu\text{Sat}_2\), an algorithm to detect microsatellites (short approximate tandem repeats) in DNA. The algorithm relies on deterministic finite automata. The parameters are designed to support requirements expressed by molecular biologists in data exploration. By setting the parameters of \(\text{Fire}\mu\text{Sat}_2\) as liberally as possible, \(\text{Fire}\mu\text{Sat}_2\) is able to detect more microsatellites than all other software algorithms that we have encountered. Furthermore, \(\text{Fire}\mu\text{Sat}_2\) was found to be faster than all the other algorithms that were investigated for approximate tandem repeat detection. In addition to being fast and accurate, the \(\text{Fire}\mu\text{Sat}_2\) algorithm that is described is robust and easily useable.

MSC:

68W32 Algorithms on strings
68Q45 Formal languages and automata
92D20 Protein sequences, DNA sequences
PDF BibTeX XML Cite
Full Text: DOI

References:

[1] Abajian, C., (1994), Sputnik: Source code published online, Online
[2] Anwar, T.; Khan, A. U., Ssrscanner: A program for reporting distribution and exact location of simple sequence repeats, Bioinformation, 1, 89-91, (2006)
[3] A. Archambault, Mining genomic data for tandem repeats, Online: http://qcbs.ca/wiki/bioinformatic_tools_to_detect_microsatellites_loci_from_genomic_data, 2012.
[4] Benson, G., Tandem repeats finder, Nucleic Acids Research, 27, 573-580, (1999)
[5] I. Van den Bergh, Finding microsatellites in whole genomes, Masterʼs thesis, Technische Universiteit Eindhoven, 2006.
[6] Castelo, A. T.; Martins, W.; Gao, G. R., TROLL: tandem repeat occurrence locator, Bioinformatics Applications Note, 18, 634-636, (2002)
[7] Cohen, D. I.A., Introduction to computer theory, (1997), Wiley New York
[8] C. De Ridder, Flexible finite automata-based algorithms for detecting microsatellites in DNA, Masterʼs thesis, Department of Computer Science, University of Pretoria, 2010.
[9] C. De Ridder, D.G. Kourie, B.W. Watson, FireμSat: Meeting the challenge of detecting microsatellites in DNA, South African Institute for Computer Scientists and Information Technologists, in: Proceedings of the Annual Research Conference of SAICSIT, 2006, pp. 247-256.
[10] C. De Ridder, D.G. Kourie, B.W. Watson, FireμSat: An algorithm to detect microsatellites in DNA, in: Proceedings of the Prague Stringology Conference (PSC), 2006, pp. 137-150.
[11] C. De Ridder, P.V. Reyneke, B.W. Watson, O. Reva, D.G. Kourie, Cascading Finite Automata for minisatellite detection, in: The 22nd Annual Symposium of the Pattern Recognition Association of South Africa (PRASA), 2011, pp. 31-36.
[12] Delgrange, O.; Rivals, E., STAR: an algorithm to search for tandem approximate repeats, Bioinformatics, 20, 2812-2820, (2004)
[13] Hopcroft, J. E.; Ullman, J. D., Introduction to automata theory languages and computation, (1979), Addison-Wesley Publishing Company · Zbl 0426.68001
[14] Jorda, J.; Kajava, A., T-REKS: identification of tandem repeats in sequences with a K-means based algorithm, Bioinformatics, 25, 2632-2638, (2009)
[15] Kim, N. S.; Park, N. I.; Kim, S. H.; Kim, S. T.; Han, S. S.; Kang, K. Y., Isolation of TC/AG repeat microsatellite sequences for fingerprinting Rice blast fungus and their possible horizontal transfer to plant species, Molecules and Cells, 10, 127-134, (2000)
[16] Kolpakov, R.; Bana, G.; Kucherov, G., Mreps: efficient and flexible detection of tandem repeats in DNA, Nucleic Acid Research, 31, 13, 3672-3678, (2003)
[17] Krishan, A.; Tang, F., Exhaustive whole-genome tandem repeats search, Bioinformatics, 20, 2702-2710, (2004)
[18] Lim, K. G.; Kwoh, C. K.; Hsu, L. Y.; Wirawan, A., Review of tandem repeat search tools: A systematic approach to evaluating algorithmic performance, Briefings in Bioinformatics, (2012)
[19] Mudunuri, S. B.; Nagarajaram, H. A., Imex: imperfect microsatellite extractor, Bioinformatics, 23, 1181-1187, (2007)
[20] Myers, G.; Sagot, M. F., Identifying satellites and periodic repetitions in biological sequences, Journal of Computational Biology, 5, 539-554, (1998)
[21] E. Rivals, J.P. Delahaye, O. Delgrange, M. Dauchet, A first step toward chromosome analysis by compression algorithms, in: Proceedings of the First International IEEE Symposium on Intelligence in Neural and Biological Systems (INBS), 1995.
[22] Theologis, A., Goodbye to “one by one” genetics, Genome Biology, 2, 1-9, (2001)
[23] M.I. Thurston, D. Field, Msatfinder: Detection and characterisation of microsatellites, Online: http://www.genomics.ceh.ac.uk/msatfinder/, 2005.
[24] Tran, N.; Bharaj, B. S.; Diamandis, E. P.; Smith, M.; Li, B. D.L.; Yu, H., Short tandem repeat polymorphism and cancer risk: influence of laboratory analysis of epidemiologic findings, Cancer Epidemiology Biomarkers and Prevention, 13, 2133-2140, (2004)
[25] Wexler, Y.; Yakhini, Z.; Kashi, Y.; Geiger, D., Finding approximate tandem repeats in genomic sequences, Journal of Computational Biology, 12, 928-942, (2005)
[26] Wirawan, A.; Kwoh, C.; Hsu, L.; Koh, T., INVERTER: integrated variable number tandem repeat finder, Communications in Computer and Information Science, 115, 151-164, (2010)
[27] Zhou, H.; Du, L.; Yan, H., Detection of tandem repeats in DNA sequences based on parametric spectral estimation, IEEE Transactions on Information Technology in Biomedicine, 13, 747-755, (2009)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.