
Analysis of similarity/dissimilarity of DNA sequences based on chaos game representation. (English) Zbl 1272.92041

Summary: The chaos game is an algorithm that produces pictures of fractal structures. The four bases A, G, C, and T of DNA sequences can be split into two groups in three different ways according to their chemical structure. Building on this, we propose different kinds of CGR-walk sequences. Based on the CGR coordinates of random sequences, we introduce several invariants of DNA primary sequences. As an application, we examine the similarity/dissimilarity among the first exons of the \(\beta\)-globin gene of different species. The results indicate that our method is efficient and captures more biological information.
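To make the ingredients of the summary concrete, here is a minimal Python sketch. It is not the authors' implementation. The corner assignment (A bottom-left, C top-left, G top-right, T bottom-right) follows Jeffrey's original chaos game representation. The three binary classifications are the standard purine/pyrimidine, amino/keto and weak/strong hydrogen-bond groupings. The +1/-1 walk is the classical DNA walk, used here as a stand-in because the exact CGR-walk definition is not given in the summary. The functions `descriptor` and `distance` are hypothetical illustrations of an invariant-based similarity measure and are not the invariants introduced in the paper.

```python
import math

# Jeffrey's CGR corner assignment on the unit square.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

# Three binary classifications of the bases by chemical structure;
# each entry lists the bases that step +1 in the walk below.
CLASSES = {
    "purine/pyrimidine": {"A", "G"},   # R = {A, G}, Y = {C, T}
    "amino/keto": {"A", "C"},          # M = {A, C}, K = {G, T}
    "weak/strong H-bond": {"A", "T"},  # W = {A, T}, S = {C, G}
}


def cgr_points(seq):
    """Chaos game: start at the centre (0.5, 0.5) and, for each base,
    move halfway toward that base's corner. Returns one point per base."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def binary_walk(seq, plus_class):
    """Classical DNA walk (stand-in for a CGR-walk): step +1 for bases in
    plus_class, -1 otherwise. Returns the cumulative position after each base."""
    pos, walk = 0, []
    for base in seq.upper():
        pos += 1 if base in plus_class else -1
        walk.append(pos)
    return walk


def descriptor(seq):
    """Hypothetical invariant vector: mean CGR x and y, then for each of the
    three classifications the mean walk position divided by the length."""
    pts = cgr_points(seq)
    n = len(pts)
    feats = [sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n]
    for plus_class in CLASSES.values():
        walk = binary_walk(seq, plus_class)
        feats.append(sum(walk) / (n * n))
    return feats


def distance(seq_a, seq_b):
    """Euclidean distance between descriptors; smaller means more similar."""
    return math.dist(descriptor(seq_a), descriptor(seq_b))
```

For example, `cgr_points("AG")` returns `[(0.25, 0.25), (0.625, 0.625)]`, and `binary_walk("AGCT", CLASSES["purine/pyrimidine"])` returns `[1, 2, 1, 0]`. Comparing exon sequences would then amount to computing `distance` for every pair and reading small values as high similarity. Any real analysis should substitute the invariants actually defined in the paper.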

MSC:

92D25 Population dynamics (general)
37N25 Dynamical systems in biology
62P10 Applications of statistics to biology and medical sciences; meta analysis
