Fractals and hidden symmetries in DNA.

*(English)* Zbl 1189.92015

Summary: This paper deals with the digital complex representation of a DNA sequence and the analysis of existing correlations by wavelets. The symbolic DNA sequence is mapped into a nonlinear time series. By studying this time series, the existence of fractal shapes and symmetries is shown. As a first step, the indicator matrix makes it possible to recognize some typical patterns of nucleotide distributions. The DNA sequence of the influenza A (H1N1) virus is investigated using the complex representation, together with the corresponding walks on DNA; in particular, it is shown that DNA walks are fractals. Finally, wavelet analysis is used to prove the existence of symmetries.
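The constructions named in the summary (complex representation, DNA walk, indicator matrix) can be illustrated with a minimal sketch. The summary does not give the numerical values assigned to the nucleotides, so the mapping `COMPLEX_MAP` below is a hypothetical choice made only for illustration, and the function names are likewise assumptions. The indicator matrix is taken here in its usual sense: entry (h, k) equals 1 when positions h and k carry the same nucleotide and 0 otherwise.

```python
from itertools import accumulate

# Hypothetical nucleotide-to-complex mapping (an assumption for illustration;
# the values used in the paper are not stated in the summary).
COMPLEX_MAP = {"A": 1 + 0j, "C": -1 + 0j, "G": 0 + 1j, "T": 0 - 1j}


def complex_series(seq):
    """Map a symbolic DNA string to a list of complex numbers."""
    return [COMPLEX_MAP[base] for base in seq.upper()]


def dna_walk(seq):
    """Cumulative sum of the complex series: one point of the walk per base."""
    return list(accumulate(complex_series(seq)))


def indicator_matrix(seq):
    """Square 0/1 matrix with a 1 wherever two positions share a nucleotide."""
    s = seq.upper()
    return [[1 if a == b else 0 for b in s] for a in s]


if __name__ == "__main__":
    fragment = "ACGTAC"  # toy fragment, not real H1N1 data
    print(dna_walk(fragment))
    for row in indicator_matrix(fragment):
        print(row)
```

Plotting the real part of the walk against the imaginary part gives the planar DNA walk; plotting the indicator matrix as a dot plot shows the kind of nucleotide distribution patterns the summary refers to.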

##### MSC:

92C40 | Biochemistry, molecular biology |

28A80 | Fractals |

42C40 | Nontrigonometric harmonic analysis involving wavelets and other special systems |

##### References:

[1] | J. P. Fitch and B. Sokhansanj, “Genomic engineering: moving beyond DNA sequence to function,” Proceedings of the IEEE, vol. 88, no. 12, pp. 1949-1971, 2000. |

[2] | H. Gee, “A journey into the genome: what’s there,” Nature, 2001, http://www.nature.com/nsu/010215/010215-3.html. |

[3] | National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov/genbank/. |

[4] | Genome Browser, http://genome.ucsc.edu/. |

[5] | European Bioinformatics Institute, http://www.ebi.ac.uk/. |

[6] | Ensembl, http://www.ensembl.org. |

[7] | C. Cattani and J. J. Rushchitsky, Wavelet and Wave Analysis as Applied to Materials with Micro or Nanostructure, vol. 74 of Series on Advances in Mathematics for Applied Sciences, World Scientific, Singapore, 2007. · Zbl 1152.74001 |

[8] | P. D. Cristea, “Large scale features in DNA genomic signals,” Signal Processing, vol. 83, no. 4, pp. 871-888, 2003. · Zbl 1144.62353 · doi:10.1016/S0165-1684(02)00477-2 |

[9] | K. B. Murray, D. Gorse, and J. M. Thornton, “Wavelet transforms for the characterization and detection of repeating motifs,” Journal of Molecular Biology, vol. 316, no. 2, pp. 341-363, 2002. · doi:10.1006/jmbi.2001.5332 |

[10] | C. Cattani, “Complex representation of DNA sequences,” in Proceedings of the 2nd International Conference Bioinformatics Research and Development (BIRD ’08), M. Elloumi, et al., Eds., vol. 13 of Communications in Computer and Information Science, pp. 528-537, Springer, Vienna, Austria, July 2008. |

[11] | C. Cattani, “Wavelet algorithms for DNA analysis,” in Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications, M. Elloumi and A. Y. Zomaya, Eds., Wiley Series in Bioinformatics, chapter 35, Wiley-Blackwell, New York, NY, USA, 2010. · Zbl 1189.92015 |

[12] | R. F. Voss, “Evolution of long-range fractal correlations and 1/f noise in DNA base sequences,” Physical Review Letters, vol. 68, no. 25, pp. 3805-3808, 1992. · doi:10.1103/PhysRevLett.68.3805 |

[13] | F. Voss, “Long-range fractal correlations in DNA introns and exons,” Fractals, vol. 2, pp. 1-6, 1992. |

[14] | C. Cattani, “Haar wavelet-based technique for sharp jumps classification,” Mathematical and Computer Modelling, vol. 39, no. 2-3, pp. 255-278, 2004. · Zbl 1046.94504 · doi:10.1016/S0895-7177(04)90010-6 |

[15] | C. Cattani, “Haar wavelets based technique in evolution problems,” Proceedings of the Estonian Academy of Sciences, Physics and Mathematics, vol. 53, no. 1, pp. 45-63, 2004. · Zbl 1049.65103 |

[16] | A. A. Tsonis, P. Kumar, J. B. Elsner, and P. A. Tsonis, “Wavelet analysis of DNA sequences,” Physical Review E, vol. 53, no. 2, pp. 1828-1834, 1996. · Zbl 0900.86003 |

[17] | M. Altaiski, O. Mornev, and R. Polozov, “Wavelet analysis of DNA sequences,” Genetic Analysis, vol. 12, no. 5-6, pp. 165-168, 1996. · doi:10.1016/1050-3862(95)00129-8 |

[18] | A. Arneodo, Y. D’Aubenton-Carafa, E. Bacry, P. V. Graves, J. F. Muzy, and C. Thermes, “Wavelet based fractal analysis of DNA sequences,” Physica D, vol. 96, no. 1-4, pp. 291-320, 1996. |

[19] | M. Zhang, “Exploratory analysis of long genomic DNA sequences using the wavelet transform: examples using polyomavirus genomes,” in Proceedings of the 6th Genome Sequencing and Analysis Conference, pp. 72-85, 1995. |

[20] | C. Cattani, “Harmonic wavelet approximation of random, fractal and high frequency signals,” Telecommunication Systems, vol. 43, no. 3-4, pp. 207-217, 2010. |

[21] | M. Li, “Fractal time series-a tutorial review,” Mathematical Problems in Engineering, vol. 2010, Article ID 157264, 26 pages, 2010. · Zbl 1191.37002 · doi:10.1155/2010/157264 · eudml:224046 |

[22] | M. Li and J.-Y. Li, “On the predictability of long-range dependent series,” Mathematical Problems in Engineering, vol. 2010, Article ID 397454, 9 pages, 2010. · Zbl 1191.62160 · doi:10.1155/2010/397454 · eudml:229805 |

[23] | M. Li and S. C. Lim, “Power spectrum of generalized Cauchy process,” Telecommunication Systems, vol. 43, no. 3-4, pp. 219-222, 2010. · Zbl 05803253 · doi:10.1007/s11235-009-9209-2 |

[24] | A. Arneodo, E. Bacry, P. V. Graves, and J. F. Muzy, “Characterizing long-range correlations in DNA sequences from wavelet analysis,” Physical Review Letters, vol. 74, no. 16, pp. 3293-3296, 1995. · doi:10.1103/PhysRevLett.74.3293 |

[25] | B. Audit, C. Vaillant, A. Arneodo, Y. D’Aubenton-Carafa, and C. Thermes, “Long-range correlations between DNA bending sites: relation to the structure and dynamics of nucleosomes,” Journal of Molecular Biology, vol. 316, no. 4, pp. 903-918, 2002. · doi:10.1006/jmbi.2001.5363 |

[26] | B. Borstnik, D. Pumpernik, and D. Lukman, “Analysis of apparent $1/f^{\alpha}$ spectrum in DNA sequences,” Europhysics Letters, vol. 23, pp. 389-394, 1993. |

[27] | S. V. Buldyrev, A. L. Goldberger, S. Havlin, et al., “Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis,” Physical Review E, vol. 51, no. 5, pp. 5084-5091, 1995. · doi:10.1103/PhysRevE.51.5084 |

[28] | H. Herzel, E. N. Trifonov, O. Weiss, and I. Große, “Interpreting correlations in biosequences,” Physica A, vol. 249, no. 1-4, pp. 449-459, 1998. |

[29] | W. Li, “The study of correlation structures of DNA sequences: a critical review,” Computers and Chemistry, vol. 21, no. 4, pp. 257-271, 1997. |

[30] | W. Li and K. Kaneko, “Long-range correlations and partial $1/f^{\alpha}$ spectrum in a noncoding DNA sequence,” Europhysics Letters, vol. 17, pp. 655-660, 1992. |

[31] | C.-K. Peng, S. V. Buldyrev, A. L. Goldberger, et al., “Long-range correlations in nucleotide sequences,” Nature, vol. 356, no. 6365, pp. 168-170, 1992. · doi:10.1038/356168a0 |

[32] | C.-K. Peng, S. V. Buldyrev, S. Havlin, M. Simons, H. E. Stanley, and A. L. Goldberger, “Mosaic organization of DNA nucleotides,” Physical Review E, vol. 49, no. 2, pp. 1685-1689, 1994. · doi:10.1103/PhysRevE.49.1685 |

[33] | O. Weiss and H. Herzel, “Correlations in protein sequences and property codes,” Journal of Theoretical Biology, vol. 190, no. 4, pp. 341-353, 1998. · doi:10.1006/jtbi.1997.0560 |

[34] | Z.-G. Yu, V. V. Anh, and B. Wang, “Correlation property of length sequences based on global structure of the complete genome,” Physical Review E, vol. 63, no. 1, Article ID 011903, 8 pages, 2001. · doi:10.1103/PhysRevE.63.011903 |

[35] | P. P. Vaidyanathan and B.-J. Yoon, “The role of signal-processing concepts in genomics and proteomics,” Journal of the Franklin Institute, vol. 341, no. 1-2, pp. 111-135, 2004. · Zbl 1094.92044 · doi:10.1016/j.jfranklin.2003.12.001 |

[36] | P. Bernaola-Galván, R. Román-Roldán, and J. L. Oliver, “Compositional segmentation and long-range fractal correlations in DNA sequences,” Physical Review E, vol. 53, no. 5, pp. 5181-5189, 1996. |

[37] | W. Li, “The complexity of DNA: the measure of compositional heterogeneity in DNA sequence and measures of complexity,” Complexity, vol. 3, pp. 33-37, 1997. |

[38] | S. Karlin and V. Brendel, “Patchiness and correlations in DNA sequences,” Science, vol. 259, no. 5095, pp. 677-680, 1993. |

[39] | D. Anastassiou, “Frequency-domain analysis of biomolecular sequences,” Bioinformatics, vol. 16, no. 12, pp. 1073-1081, 2000. |

[40] | S. S.-T. Yau, J. Wang, A. Niknejad, C. Lu, N. Jin, and Y.-K. Ho, “DNA sequence representation without degeneracy,” Nucleic Acids Research, vol. 31, no. 12, pp. 3078-3080, 2003. · doi:10.1093/nar/gkg432 |

[41] | G. Dodin, P. Vandergheynst, P. Levoir, C. Cordier, and L. Marcourt, “Fourier and wavelet transform analysis, a tool for visualizing regular patterns in DNA sequences,” Journal of Theoretical Biology, vol. 206, no. 3, pp. 323-326, 2000. · doi:10.1006/jtbi.2000.2127 |

[42] | A. Arneodo, Y. D’Aubenton-Carafa, B. Audit, E. Bacry, J. F. Muzy, and C. Thermes, “What can we learn with wavelets about DNA sequences?” Physica A, vol. 249, no. 1-4, pp. 439-448, 1998. |

[43] | E. Coward, “Equivalence of two Fourier methods for biological sequences,” Journal of Mathematical Biology, vol. 36, no. 1, pp. 64-70, 1997. · Zbl 0887.92016 · doi:10.1007/s002850050090 |

[44] | J. A. Berger, S. K. Mitra, M. Carli, and A. Neri, “Visualization and analysis of DNA sequences using DNA walks,” Journal of the Franklin Institute, vol. 341, no. 1-2, pp. 37-53, 2004. · Zbl 1094.92025 · doi:10.1016/j.jfranklin.2003.12.002 |

[45] | I. Daubechies, Ten Lectures on Wavelets, vol. 61 of CBMS-NSF Regional Conference Series in Applied Mathematics, Society for Industrial and Applied Mathematics, Philadelphia, Pa, USA, 1992. · Zbl 0776.42018 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.