
Visualization and analysis of DNA sequences using DNA walks. (English) Zbl 1094.92025

Summary: Visual methods illustrate how DNA sequences are read along a single DNA strand from the \(5'\) end to the \(3'\) end, and they offer hope of gaining an understanding of the underlying genomic language. By treating genomic sequence residues as elements of a discrete-time signal, one can apply digital signal processing techniques to the analysis of genomic information. Using these representations and applying frequency-domain transformations, we show that structure, or seemingly nonrandom behavior, can be readily identified in nucleotide sequences.
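A common way to turn a nucleotide sequence into a discrete-time signal is the binary indicator (Voss) mapping: one 0/1 sequence per base. The sketch below assumes that mapping (the summary does not say which mapping the authors use) and computes a total power spectrum as the sum of the squared DFT magnitudes of the four indicator sequences. The helper names `indicator_sequences` and `total_power_spectrum` and the toy sequence are hypothetical. A period-3 pattern, of the kind associated with protein-coding regions, appears as a peak at frequency index \(N/3\).

```python
import numpy as np

def indicator_sequences(seq):
    """Map a DNA string to four 0/1 indicator sequences, one per base (Voss mapping)."""
    seq = seq.upper()
    return {b: np.array([1.0 if c == b else 0.0 for c in seq]) for b in "ACGT"}

def total_power_spectrum(seq):
    """Sum over bases of |DFT|^2 of each indicator sequence."""
    return sum(np.abs(np.fft.fft(u)) ** 2 for u in indicator_sequences(seq).values())

# Hypothetical toy sequence with an exact period-3 pattern.
seq = "ATG" * 30
N = len(seq)
S = total_power_spectrum(seq)

# Skip the DC term k = 0 (it only reflects base counts) and search up to N/2.
peak_k = int(np.argmax(S[1 : N // 2])) + 1
print(f"dominant frequency index: {peak_k} (N/3 = {N // 3})")
```

The DC term is excluded because it depends only on base composition and would otherwise dominate the search for periodic structure.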
We review the basic method of DNA walks and show how these representations can be used to extract useful knowledge from genomic data, namely long-range correlation information, sequence periodicities, and other sequence characteristics. Further information is obtained through wavelet transform analysis. Finally, this work relates a measure of sequence complexity to these visual findings and offers conclusions on quantifying DNA sequence behavior or structure.
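A DNA walk can be sketched with the purine-pyrimidine rule used by Peng et al. (1992): each pyrimidine (C, T) is a step up, each purine (A, G) a step down, and the walk \(y(l)\) is the running sum of steps. Long-range correlations can then be probed through the root-mean-square fluctuation \(F(l)\) of the displacement \(\Delta y(l) = y(l_0 + l) - y(l_0)\) over all start positions \(l_0\). For an uncorrelated sequence \(F(l) \sim l^{1/2}\), so a fitted exponent clearly different from \(1/2\) hints at correlations. The code below assumes this particular mapping and estimator, which may not match the authors' exact choices; the function names and the random test sequence are hypothetical.

```python
import numpy as np

def dna_walk(seq):
    """Purine-pyrimidine DNA walk: +1 for C/T, -1 for A/G; returns y(0..N) with y(0) = 0."""
    # Assumes the sequence contains only A, C, G, T; any other symbol would count as -1.
    steps = np.array([1 if c in "CT" else -1 for c in seq.upper()])
    return np.concatenate(([0], np.cumsum(steps)))

def fluctuation(walk, l):
    """RMS fluctuation F(l) of the displacement dy = y(l0 + l) - y(l0) over all l0."""
    dy = walk[l:] - walk[:-l]
    return np.sqrt(np.mean(dy ** 2) - np.mean(dy) ** 2)

# Hypothetical uncorrelated test sequence: i.i.d. bases, so F(l) should scale like l**0.5.
rng = np.random.default_rng(0)
seq = "".join(rng.choice(list("ACGT"), size=20000))
walk = dna_walk(seq)

ls = np.array([4, 8, 16, 32, 64, 128])
F = np.array([fluctuation(walk, l) for l in ls])

# Slope of log F(l) versus log l estimates the scaling exponent alpha.
alpha = np.polyfit(np.log(ls), np.log(F), 1)[0]
print(f"estimated alpha: {alpha:.3f}")
```

Because the test sequence is drawn independently base by base, the fitted slope should land near 0.5; on real genomic data, a clearly larger slope would suggest long-range correlation, although compositional heterogeneity can also inflate it.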

MSC:

92C40 Biochemistry, molecular biology
92C55 Biomedical imaging and signal processing
