zbMATH — the first resource for mathematics

Shannon information and power law analysis of the chromosome code. (English) Zbl 1253.94035
Summary: We study the information content of the chromosomes of twenty-three species. Several statistics are derived, considering different numbers of bases for the encoding of alphabet characters. Based on the resulting histograms, word delimiters and the relative frequencies of characters are identified. Knowing these data, one can move along each chromosome while evaluating the flow of characters and words. The resulting flux of information is captured by means of the Shannon entropy. The results are explored from the perspective of power-law relationships, allowing a quantitative evaluation of the DNA of the species.
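The summary describes the procedure only at a high level. The sketch below is a minimal illustration under stated assumptions, not the authors' code. It assumes that a "character" is a non-overlapping block of k bases, skips blocks containing ambiguous symbols such as N, measures entropy in bits (H = -Σ p_i log2 p_i), and evaluates entropy over consecutive fixed-size windows to mimic "moving along" a chromosome. Word delimiters are not modeled. The power-law step is a generic least-squares fit in log-log coordinates, applied here to a Zipf-style rank-frequency plot; the exact power-law relation used in the paper is not stated in the summary. The function names `kmer_counts`, `shannon_entropy`, `windowed_entropy` and `power_law_fit` are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count non-overlapping k-base 'characters' in a DNA string.

    Assumption: the sequence is read in consecutive blocks of k bases;
    a trailing partial block and any block with non-ACGT symbols are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(0, len(seq) - k + 1, k):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def shannon_entropy(counts: Counter) -> float:
    """Shannon entropy H = -sum p_i log2 p_i, in bits per character."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def windowed_entropy(seq: str, k: int, window: int) -> list:
    """Entropy of k-base characters in consecutive windows of `window` bases."""
    return [shannon_entropy(kmer_counts(seq[s:s + window], k))
            for s in range(0, len(seq) - window + 1, window)]


def power_law_fit(xs, ys):
    """Least-squares fit of log y = log a + b log x, i.e. y ~ a * x**b."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b


if __name__ == "__main__":
    seq = "ACGTACGTTTGACCAGT" * 50  # toy sequence, not real chromosome data
    for k in (1, 2, 3):
        print(f"k={k}: H = {shannon_entropy(kmer_counts(seq, k)):.3f} bits")
    # Zipf-style rank-frequency fit for k = 3 (illustrative only)
    freqs = sorted(kmer_counts(seq, 3).values(), reverse=True)
    a, b = power_law_fit(range(1, len(freqs) + 1), freqs)
    print(f"frequency ~ {a:.2f} * rank^{b:.2f}")
```

Non-overlapping blocks are chosen so that each base belongs to exactly one character, matching the idea of an alphabet encoded with a fixed number of bases; an overlapping (sliding) k-mer count would be an equally plausible alternative reading.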

MSC:
94A17 Measures of information, entropy
92D20 Protein sequences, DNA sequences
References:
[1] R. T. Schuh and A. V. Z. Brower, Biological Systematics: Principles and Applications, Cornell University Press, 2nd edition, 2009.
[2] H. Seitz, Analytics of Protein-DNA Interactions, Advances in Biochemical Engineering/Biotechnology, Springer, 2007.
[3] H. Pearson, “What is a gene?” Nature, vol. 441, no. 7092, pp. 398-401, 2006.
[4] UCSC Genome Bioinformatics, http://hgdownload.cse.ucsc.edu/downloads.html.
[5] G. E. Sims, S. R. Jun, G. A. Wu, and S. H. Kim, “Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions,” Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 8, pp. 2677-2682, 2009.
[6] W. J. Murphy, T. H. Pringle, T. A. Crider, M. S. Springer, and W. Miller, “Using genomic data to unravel the root of the placental mammal phylogeny,” Genome Research, vol. 17, no. 4, pp. 413-421, 2007.
[7] H. Zhao and G. Bourque, “Recovering genome rearrangements in the mammalian phylogeny,” Genome Research, vol. 19, no. 5, pp. 934-942, 2009.
[8] A. B. Prasad, M. W. Allard, and E. D. Green, “Confirming the phylogeny of mammals by use of large comparative sequence data sets,” Molecular Biology and Evolution, vol. 25, no. 9, pp. 1795-1808, 2008.
[9] I. Ebersberger, P. Galgoczy, S. Taudien, S. Taenzer, M. Platzer, and A. von Haeseler, “Mapping human genetic ancestry,” Molecular Biology and Evolution, vol. 24, no. 10, pp. 2266-2276, 2007.
[10] C. W. Dunn, A. Hejnol, D. Q. Matus et al., “Broad phylogenomic sampling improves resolution of the animal tree of life,” Nature, vol. 452, no. 7188, pp. 745-749, 2008.
[11] J. A. T. Machado, A. C. Costa, and M. D. Quelhas, “Fractional dynamics in DNA,” Communications in Nonlinear Science and Numerical Simulation, vol. 16, no. 8, pp. 2963-2969, 2011. · Zbl 1218.92038
[12] A. M. Costa, J. T. Machado, and M. D. Quelhas, “Histogram-based DNA analysis for the visualization of chromosome, genome and species information,” Bioinformatics, vol. 27, no. 9, pp. 1207-1214, 2011. · Zbl 05891277
[13] J. A. T. Machado, A. C. Costa, and M. D. Quelhas, “Entropy analysis of the DNA code dynamics in human chromosomes,” Computers & Mathematics with Applications, vol. 62, no. 3, pp. 1612-1617, 2011. · Zbl 1228.94032
[14] J. A. T. Machado, A. C. Costa, and M. D. Quelhas, “Analysis and visualization of chromosome information,” Gene, vol. 491, no. 1, pp. 81-87, 2012.
[15] M. Kimura, The Neutral Theory of Molecular Evolution, Cambridge University Press, Cambridge, UK, 1983.
[16] P. J. Deschavanne, A. Giron, J. Vilain, G. Fagot, and B. Fertil, “Genomic signature: characterization and classification of species assessed by chaos game representation of sequences,” Molecular Biology and Evolution, vol. 16, no. 10, pp. 1391-1399, 1999.
[17] M. Lynch, “The frailty of adaptive hypotheses for the origins of organismal complexity,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 1, pp. 8597-8604, 2007.
[18] G. Albrecht-Buehler, “Asymptotically increasing compliance of genomes with Chargaff’s second parity rules through inversions and inverted transpositions,” Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 47, pp. 17828-17833, 2006.
[19] D. Mitchell and R. Bridge, “A test of Chargaff’s second rule,” Biochemical and Biophysical Research Communications, vol. 340, no. 1, pp. 90-94, 2006.
[20] B. R. Powdel, S. S. Satapathy, A. Kumar et al., “A study in entire chromosomes of violations of the intra-strand parity of complementary nucleotides (Chargaff’s Second Parity Rule),” DNA Research, vol. 16, no. 6, pp. 325-343, 2009.
[21] C. T. Zhang, R. Zhang, and H. Y. Ou, “The Z curve database: a graphic representation of genome sequences,” Bioinformatics, vol. 19, no. 5, pp. 593-599, 2003.
[22] P. Bak, K. Chen, and C. Tang, “A forest-fire model and some thoughts on turbulence,” Physics Letters A, vol. 147, no. 5-6, pp. 297-300, 1990.
[23] N. E. Israeloff, M. Kagalenko, and K. Chan, “Can Zipf distinguish language from noise in noncoding DNA?” Physical Review Letters, vol. 76, pp. 1976-1979, 1996.
[24] R. N. Mantegna and H. E. Stanley, “Scaling behaviour in the dynamics of an economic index,” Nature, vol. 376, no. 6535, pp. 46-49, 1995.
[25] L. A. Adamic and B. A. Huberman, “Zipf’s law and the Internet,” Glottometrics, vol. 3, pp. 143-150, 2002.
[26] H. Aoyama, Y. Fujiwara, and W. Souma, “Kinematics and dynamics of Pareto-Zipf’s law and Gibrat’s law,” Physica A, vol. 344, no. 1-2, pp. 117-121, 2004.
[27] C. Andersson, A. Hellervik, and K. Lindgren, “A spatial network explanation for a hierarchy of urban power laws,” Physica A, vol. 345, no. 1-2, pp. 227-244, 2005.
[28] A. L. Barabási, “The origin of bursts and heavy tails in human dynamics,” Nature, vol. 435, no. 7039, pp. 207-211, 2005.
[29] D. Wang, M. Li, and Z. Di, “True reason for Zipf’s law in language,” Physica A, vol. 358, no. 2-4, pp. 545-550, 2005.
[30] J. M. Sarabia and F. Prieto, “The Pareto-positive stable distribution: a new descriptive model for city size data,” Physica A, vol. 388, no. 19, pp. 4179-4191, 2009.
[31] T. Fenner, M. Levene, and G. Loizou, “Predicting the long tail of book sales: unearthing the power-law exponent,” Physica A, vol. 389, no. 12, pp. 2416-2421, 2010.
[32] J. A. T. Machado, A. C. Costa, and M. D. Quelhas, “Shannon, Rényi and Tsallis entropy analysis of DNA using phase plane,” Nonlinear Analysis: Real World Applications, vol. 12, no. 6, pp. 3135-3144, 2011. · Zbl 1231.92034
[33] J. A. T. Machado, “Shannon entropy analysis of the genome code,” Mathematical Problems in Engineering, vol. 2012, Article ID 132625, 12 pages, 2012.
[34] J. T. Machado, “Accessing complexity from genome information,” Communications in Nonlinear Science and Numerical Simulation, vol. 17, no. 6, pp. 2237-2243, 2012.
[35] R. Hilfer, Applications of Fractional Calculus in Physics, World Scientific, Singapore, 2000. · Zbl 0998.26002
[36] D. Baleanu and S. I. Vacaru, “Fractional curve flows and solitonic hierarchies in gravity and geometric mechanics,” Journal of Mathematical Physics, vol. 52, no. 5, Article ID 053514, 15 pages, 2011. · Zbl 1317.70009
[37] D. Baleanu, K. Diethelm, E. Scalas, and J. J. Trujillo, Fractional Calculus Models and Numerical Methods, vol. 3 of Complexity, Nonlinearity and Chaos, World Scientific Publishing, 2012. · Zbl 1248.26011
[38] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, pp. 379-423, 1948. · Zbl 1154.94303
[39] E. T. Jaynes, “Information theory and statistical mechanics,” Physical Review, vol. 106, pp. 620-630, 1957. · Zbl 0084.43701
[40] A. I. Khinchin, Mathematical Foundations of Information Theory, Dover Publications, New York, NY, USA, 1957. · Zbl 0088.10404
[41] A. Plastino and A. R. Plastino, “Tsallis Entropy and Jaynes’ information theory formalism,” Brazilian Journal of Physics, vol. 29, no. 1, pp. 50-60, 1999.
[42] H. J. Haubold, A. M. Mathai, and R. K. Saxena, “Boltzmann-Gibbs entropy versus Tsallis entropy: recent contributions to resolving the argument of Einstein concerning ‘neither Herr Boltzmann nor Herr Planck has given a definition of W’? Essay review,” Astrophysics and Space Science, vol. 290, no. 3-4, pp. 241-245, 2004. · Zbl 1115.82300
[43] A. M. Mathai and H. J. Haubold, “Pathway model, superstatistics, Tsallis statistics, and a generalized measure of entropy,” Physica A, vol. 375, no. 1, pp. 110-122, 2007.
[44] T. Carter, An Introduction to Information Theory and Entropy, Complex Systems Summer School, Santa Fe, NM, USA, 2007.
[45] P. N. Rathie and S. Da Silva, “Shannon, Lévy, and Tsallis: a note,” Applied Mathematical Sciences, vol. 2, no. 25-28, pp. 1359-1363, 2008. · Zbl 1154.94351
[46] C. Beck, “Generalised information and entropy measures in physics,” Contemporary Physics, vol. 50, no. 4, pp. 495-510, 2009.
[47] I. J. Taneja, “On measures of information and inaccuracy,” Journal of Statistical Physics, vol. 14, no. 3, pp. 263-270, 1976.
[48] B. D. Sharma and I. J. Taneja, “Three generalized-additive measures of entropy,” Elektronische Informationsverarbeitung und Kybernetik, vol. 13, no. 7-8, pp. 419-433, 1977. · Zbl 0372.94021
[49] A. Wehrl, “General properties of entropy,” Reviews of Modern Physics, vol. 50, no. 2, pp. 221-260, 1978. · Zbl 0484.70014
[50] H. D. Chen, C. H. Chang, L. C. Hsieh, and H. C. Lee, “Divergence and Shannon information in genomes,” Physical Review Letters, vol. 94, no. 17, Article ID 178103, 2005.
[51] R. M. Gray, Entropy and Information Theory, Springer, New York, NY, USA, 1990. · Zbl 0722.94001
[52] M. R. Ubriaco, “Entropies based on fractional calculus,” Physics Letters A, vol. 373, no. 30, pp. 2516-2519, 2009. · Zbl 1231.82024