zbMATH — the first resource for mathematics

Codon information value and codon transition-probability distributions in short-term evolution. (English) Zbl 1400.92397
Summary: To understand how the genetic code and the physicochemical properties of the encoded amino acids affect accepted amino acid substitutions in short-term protein evolution, taking into account only overall amino acid conservation, we consider an underlying codon-level model. This model uses codon pair-substitution frequencies from an empirical matrix in the literature, modified to retain single-base mutations only. Ordering the degenerate codons by their codon information value [M. Volkenstein, “Mutations and the value of information”, J. Theoret. Biol. 80, No. 2, 155–169 (1979; doi:10.1016/0022-5193(79)90202-9)], we found that the three-fold and most of the four-fold degenerate codons, which have low codon values, were best fitted by rank-frequency distributions with a constant failure rate (exponentials). In contrast, almost all two-fold degenerate codons, which have high codon values, were best fitted by rank-frequency distributions with a variable failure rate (inverse power laws). Six-fold degenerate codons are considered to be doubly assigned. The exceptional behavior of some codons, including non-degenerate codons, is discussed.
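The comparison of exponential and inverse power-law rank-frequency fits described in the summary can be illustrated with a minimal sketch. The summary does not state the fitting procedure, so everything here is an assumption: the function names `linear_fit` and `compare_rank_frequency_models` are hypothetical, the fits are ordinary least squares on log-transformed frequencies, and the model with the smaller residual sum of squares is taken as the better fit. An exponential is linear in (rank, log frequency), while a power law is linear in (log rank, log frequency).

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def compare_rank_frequency_models(freqs):
    """Fit exponential and power-law models to a rank-frequency distribution.

    `freqs` holds strictly positive substitution frequencies, e.g. those of
    one codon towards its single-base neighbours (hypothetical input).
    """
    ranked = sorted(freqs, reverse=True)          # rank 1 = most frequent
    ranks = list(range(1, len(ranked) + 1))
    log_f = [math.log(f) for f in ranked]

    # Exponential: log f = a + b * rank (constant failure rate -b).
    _, b_exp, sse_exp = linear_fit(ranks, log_f)
    # Inverse power law: log f = a + b * log(rank) (exponent -b).
    _, b_pow, sse_pow = linear_fit([math.log(r) for r in ranks], log_f)

    best = "exponential" if sse_exp < sse_pow else "power-law"
    return {
        "best": best,
        "exp_rate": -b_exp,
        "power_exponent": -b_pow,
        "sse_exp": sse_exp,
        "sse_pow": sse_pow,
    }
```

Comparing residuals in log space is only one possible criterion; a likelihood-based comparison or a fit on untransformed frequencies could rank the two models differently for noisy data.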
Reviewer: Reviewer (Berlin)
92D20 Protein sequences, DNA sequences
[1] Jiménez-Montaño, M. A.; He, M., Irreplaceable amino acids and reduced alphabets in short-term and directed protein evolution, (Mandoiu, I.; Narasimhan, G.; Zhang, Y., Bioinformatics Research and Applications, Lecture Notes in Computer Science, vol. 5542 (2009), Springer: Berlin, Heidelberg), 297-309
[2] Shih, A. C.-C.; Hsiao, T.-C.; Ho, M.-S.; Li, W.-H., Simultaneous amino acid substitutions at antigenic sites drive influenza A hemagglutinin evolution, Proc. Natl. Acad. Sci., 104, 6283-6288 (2007)
[3] Clark, L. A.; Ganesan, S.; Papp, S.; van Vlijmen, H. W.T., Trends in antibody sequence changes during the somatic hypermutation process, J. Immunol., 177, 333-340 (2006)
[4] Keefe, A. D.; Szostak, J. W., Functional proteins from a random-sequence library, Nature, 410, 715-718 (2001)
[5] Arnold, F. H., Design by directed evolution, Acc. Chem. Res., 31, 125-131 (1998)
[6] Orencia, M. C.; Yoon, J. S.; Ness, J. E.; Stemmer, W. P.C.; Stevens, R. C., Predicting the emergence of antibiotic resistance by directed evolution and structural analysis, Nat. Struct. Mol. Biol., 8, 238-242 (2001)
[7] Vitkup, D.; Sander, C.; Church, G., The amino-acid mutational spectrum of human genetic disease, Genome Biol., 4, 1-10 (2003)
[8] Li, W. H., Molecular Evolution (1997), Sinauer Associates: Sunderland, MA
[9] Liò, P.; Goldman, N., Models of molecular evolution and phylogeny, Genome Res., 8, 1233-1244 (1998)
[10] Schneider, A.; Cannarozzi, G.; Gonnet, G., Empirical codon substitution matrix, BMC Bioinformatics, 6, 134 (2005)
[11] Martínez-Mekler, G., Universality of rank-ordering distributions in the arts and sciences, PLoS One, 4 (2009)
[12] Newman, M. E.J., Power laws, Pareto distributions and Zipf’s law, Contemp. Phys., 46, 323-351 (2005)
[13] Zipf, G. K., Human Behavior and the Principle of Least Effort (1949), Addison-Wesley Press: Cambridge, Massachusetts, USA
[14] Li, W. T., The study of correlation structures of DNA: a critical review, Comput. Chem., 21, 257-271 (1997)
[15] Schürmann, T.; Grassberger, P., The predictability of letters in written English, Fractals, 4, 1-5 (1996) · Zbl 0867.94011
[16] Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C.-K.; Simons, M.; Stanley, H. E., Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics, Phys. Rev. E, 52, 2939-2950 (1995)
[17] Som, A.; Chattopadhyay, S.; Chakrabarti, J.; Bandyopadhyay, D., Codon distributions in DNA, Phys. Rev. E, 63, Article 051908 pp. (2001)
[18] Kim, J.; Yang, S.; Kwon, Y.; Lee, E., Codon and amino-acid distribution in DNA, Chaos Solitons Fractals, 23, 1795-1807 (2005) · Zbl 1066.92023
[19] Shenkin, P. S.; Erman, B.; Mastrandrea, L. D., Information-theoretical entropy as a measure of sequence variability, Proteins: Struct. Funct. Bioinform., 11, 297-313 (1991)
[20] Volkenstein, M., Mutations and the value of information, J. Theoret. Biol., 80, 155-169 (1979)
[21] Valdar, W. S.J., Scoring residue conservation, Proteins, 48, 227-241 (2002)
[22] Wu, T. T.; Kabat, E. A., An analysis of the sequences of the variable regions of Bence Jones proteins and myeloma light chains and their implications for antibody complementarity, J. Exp. Med., 132, 211-250 (1970)
[23] Uzzell, T.; Corbin, K. W., Fitting discrete probability distributions to evolutionary events, Science, 172, 1089-1096 (1971)
[24] Holmquist, R.; Goodman, M.; Conroy, T.; Czelusniak, J., The spatial distribution of fixed mutations within genes coding for proteins, J. Mol. Evol., 19, 437-448 (1983)
[25] Vargas-Madrazo, E.; Lara-Ochoa, F.; Jiménez-Montaño, M., A skewed distribution of amino acids at recognition sites of the hypervariable region of immunoglobulins, J. Mol. Evol., 38, 100-104 (1994)
[26] Lara-Ochoa, F.; Vargas-Madrazo, E.; Jiménez-Montaño, M.; Almagro, J., Patterns in the complementary determining regions of immunoglobulins (CDRs), Biosystems, 32, 1-9 (1994)
[27] Bastien, O.; Maréchal, E., Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores, BMC Bioinformatics, 9, 332 (2008)
[28] Goldman, N.; Yang, Z., A codon-based model of nucleotide substitution for protein-coding DNA sequences, Mol. Biol. Evol., 11, 725-736 (1994)
[29] Halpern, A. L.; Bruno, W. J., Evolutionary distances for protein-coding sequences: modeling site-specific residue frequencies, Mol. Biol. Evol., 15, 910-917 (1998)
[30] Muse, S. V.; Gaut, B. S., A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome, Mol. Biol. Evol., 11, 715-724 (1994)
[31] Görnerup, O.; Jacobi, M., A model-independent approach to infer hierarchical codon substitution dynamics, BMC Bioinformatics, 11, 201 (2010)
[32] Kosiol, C.; Goldman, N., Markovian and non-Markovian protein sequence evolution: Aggregated Markov process models, J. Mol. Biol., 411, 910-923 (2011)
[33] Jiménez-Montaño, M. A., A Markov information source for the syntactic characterization of amino acid substitutions in protein evolution, Symmetry Cult. Sci., 23, 323-342 (2012) · Zbl 1324.92025
[34] Uhlenbeck, G. E., (Some Fundamental Problems in Statistical Physics, Lecture Notes (1968), Iowa State University) · Zbl 0127.21004
[35] Nielsen, R.; Yang, Z., Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene, Genetics, 148, 929-936 (1998)
[36] Yang, Z.; Nielsen, R., Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages, Mol. Biol. Evol., 19, 908-917 (2002)
[37] Tang, H.; Wyckoff, G. J.; Lu, J.; Wu, C.-I., A universal evolutionary index for amino acid changes, Mol. Biol. Evol., 21, 1548-1556 (2004)
[38] Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E., Linguistic features of noncoding DNA sequences, Phys. Rev. Lett., 73, 3169-3172 (1994)
[39] Chattopadhyay, S.; Kanner, W.; Chakrabarti, J., DNA nucleotides: A case study of evolution, Eur. Phys. J. B, 26, 393-398 (2002)
[40] Brun, R.; Rademakers, F., ROOT - an object oriented data analysis framework, (Proceedings AIHENP’96 Workshop, Lausanne, Sep. 1996, Nucl. Inst. & Meth. in Phys. Res. A, vol. 389 (1997)), 81-86
[42] Bachinsky, A.; Ratner, V., Biomed. Zs, 18, 53 (1976), (in Russian)
[43] Dayhoff, M. O.; Schwartz, R. M.; Orcutt, B. C., A model of evolutionary change in proteins, (In Atlas of Protein Sequences and Structure. Vol. 5 (1978)), 345-352
[44] Majewski, J.; Ott, J., Amino acid substitutions in the human genome: evolutionary implications of single nucleotide polymorphisms, Gene, 305, 167-173 (2003)
[45] Dufton, M. J., Genetic code synonym quotas and amino acid complexity: Cutting the cost of proteins?, J. Theoret. Biol., 187, 165-173 (1997)
[46] Taylor, F. J.; Coates, D., The code within the codons, Biosystems, 22, 177-187 (1989)
[47] Jiménez-Montaño, M. A., Protein evolution drives the evolution of the genetic code and vice versa, Biosystems, 54, 47-64 (1999)
[48] Marino, S. M.; Gladyshev, V. N., Analysis and functional prediction of reactive cysteine residues, J. Biol. Chem., 287, 4419-4425 (2012)
[49] Leis, J. P.; Keller, E. B., Protein chain-initiating methionine tRNAs in chloroplasts and cytoplasm of wheat leaves, Proc. Natl. Acad. Sci., 67, 1593-1599 (1970)
[50] Dillon, L. S., The Genetic Mechanism and the Origin of Life, 221 (1978), Plenum Press: New York
[51] Di Giulio, M., An extension of the coevolution theory of the origin of the genetic code, Biol. Direct, 3, 37 (2008)
[52] Piantadosi, S., Zipf’s word-frequency law in natural language: a critical review and future directions, Psychon. Bull. Rev., 21, 1112-1130 (2014)
[53] Dahui, W.; Menghui, L.; Zengru, D., True reason for Zipf’s law in language, Physica A, 358, 545-550 (2004)
[54] Miyata, T.; Miyazawa, S.; Yasunaga, T., Two types of amino acid substitutions in protein evolution, J. Mol. Evol., 12, 219-236 (1979)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.