×

Using ideas of Kolmogorov complexity for studying biological texts. (English) Zbl 1261.68078

Summary: Kolmogorov complexity furnishes many useful tools for studying different natural processes that can be expressed using sequences of symbols from a finite alphabet (texts), such as genetic texts, literary and music texts, animal communications, etc. Although Kolmogorov complexity is not algorithmically computable, in a certain sense it can be estimated by means of data compressors. Here we suggest a method of analysis of sequences based on ideas of Kolmogorov complexity and mathematical statistics, and apply this method to biological (ethological) “texts”. A distinction of the suggested method from other approaches to the analysis of sequential data by means of Kolmogorov complexity is that it belongs to the framework of mathematical statistics, more specifically, that of hypothesis testing. This makes it a promising candidate for being included in the toolbox of standard biological methods of analysis of different natural texts, from DNA sequences to animal behavioural patterns (ethological “texts”). Two examples of analysis of ethological texts are considered in this paper. Theses examples show that the proposed method is a useful tool for distinguishing between stereotyped and flexible behaviours, which is important for behavioural and evolutionary studies.

MSC:

68Q30 Algorithmic information theory (Kolmogorov complexity, etc.)
68W20 Randomized algorithms
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Anel, C., Sanderson, M.J.: Missing the forest for the trees: phylogenetic compression and its implications for inferring complex evolutionary histories. Syst. Biol. 54(1), 146–157 (2005) · doi:10.1080/10635150590905984
[2] Billingsley, P.: Ergodic Theory and Information. Wiley, New York (1965) · Zbl 0141.16702
[3] Cilibrasi, R., Vitanyi, P.: Clustering by compression. IEEE Trans. Inf. Theory 51(4), 1523–1545 (2005) · Zbl 1297.68097 · doi:10.1109/TIT.2005.844059
[4] Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley-Interscience, New York (2006) · Zbl 1140.94001
[5] Ferragina, P., Giancarlo, R., Greco, V., Manzini, G., Valiente, G.: Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment. BMC Bioinf. 8 (2007)
[6] Fisher, R.A.: Statistical Methods, Experimental Design, and Scientific Inference. Oliver & Boyd, Edinburgh (1956) · Zbl 0070.36903
[7] Gallager, R.G.: Information Theory and Reliable Communication. Wiley, New York (1968) · Zbl 0198.52201
[8] Groothuis, T.: The influence of social experience on the development and fixation of the form of displays in the black-headed gull. Anim. Behav. 43(1), 1–14 (1992) · doi:10.1016/S0003-3472(05)80067-3
[9] Hutter, M.: Universal Artificial Intelligence. Sequential Decisions Based on Algorithmic Probability. Springer, Berlin (2005) · Zbl 1099.68082
[10] Kendall, M.G., Stuart, A.: The Advanced Theory of Statistics, Inference and Relationship, vol. 2. Griffin, London (1961) · Zbl 0416.62001
[11] KGB archiver (v. 1.2). http://www.softpedia.com/get/Compression-tools/KGB-Archiver.shtml
[12] Kieffer, J., Yang, E.: Grammar-based codes: a new class of universal lossless source codes. IEEE Trans. Inf. Theory 46, 737–754 (2000) · Zbl 1001.94019 · doi:10.1109/18.841160
[13] Li, M., Vitanyi, P.: An Introduction to Kolmogorov Complexity and Its Applications, 2nd edn. Springer, New York (1997)
[14] Li, M., Badger, J., Chen, X., Kwong, S., Kearney, P., Zhang, H.Y.: An information based distance and its application to whole mitochondrial genome phylogeny. Bioinformatics (Oxford) 17, 149–154 (2001) · doi:10.1093/bioinformatics/17.2.149
[15] Li, M., Chen, X., Li, X., Ma, B., Vitanyi, P.: The similarity metric. IEEE Trans. Inf. Theory 50(12), 3250–3264 (2004) · Zbl 1316.68052 · doi:10.1109/TIT.2004.838101
[16] Lloyd, E. (ed.): Handbook of Applied Statistics, vol. 2. Wiley-Interscience, New York (1984)
[17] McCowan, B., Doyle, L.R., Hanser, S.F.: Using information theory to assess the diversity, complexity, and development of communicative repertoires. J. Comp. Psychol. 116(2), 166–172 (2002) · doi:10.1037/0735-7036.116.2.166
[18] Oller, D.K., Griebel, U. (eds.): Evolution of Communicative Flexibility: Complexity, Creativity, and Adaptability in Human and Animal Communication. MIT Press, Cambridge (2008)
[19] Panteleeva, S., Danzanov, Zh., Reznikova, Zh.: Estimate of complexity of behavioral patterns in ants: analysis of hunting behavior in myrmica rubra (hymenoptera, formicidae) as an example. Entomol. Rev. 91(2), 221–230 (2011) · doi:10.1134/S0013873811020102
[20] Reznikova, Z.: Animal Intelligence: From Individual to Social Cognition. Cambridge University Press, Cambridge (2007)
[21] Reznikova, Zh., Panteleeva, S.: An ant’s eye view of culture: propagation of new traditions through triggering dormant behavioural patterns. Acta Ethol. 11, 73–80 (2008) · doi:10.1007/s10211-008-0044-3
[22] Rissanen, J.: Universal coding, information, prediction, and estimation. IEEE Trans. Inf. Theory 30(4), 629–636 (1984) · Zbl 0574.62003 · doi:10.1109/TIT.1984.1056936
[23] Ryabko, B.: Prediction of random sequences and universal coding. Probl. Inf. Transm. 24(2), 87–96 (1988) · Zbl 0666.94009
[24] Ryabko, B., Reznikova, Zh.: Using Shannon entropy and Kolmogorov complexity to study the communicative system and cognitive capacities in ants. Complexity 2, 37–42 (1996) · Zbl 05473651 · doi:10.1002/(SICI)1099-0526(199611/12)2:2<37::AID-CPLX8>3.0.CO;2-K
[25] Ryabko, B., Reznikova, Z.: The use of ideas of information theory for studying ”language” and intelligence in ants. Entropy 11, 836–853 (2009) · doi:10.3390/e11040836
[26] Ryabko, D., Schmidhuber, J.: Using data compressors to construct order tests for homogeneity and component independence. Appl. Math. Lett. 22(7), 1029–1032 (2009) · Zbl 1179.68063 · doi:10.1016/j.aml.2008.01.008
[27] Ryabko, B., Astola, J., Gammerman, A.: Application of Kolmogorov complexity and universal codes to identity testing and nonparametric testing of serial independence for time series. Theor. Comput. Sci. 359, 440–448 (2006) · Zbl 1097.68048 · doi:10.1016/j.tcs.2006.06.004
[28] Tinbergen, N.: An objective study of the innate behaviour of animals. Bibl. Biotheor. 1, 39–98 (1942)
[29] Tinbergen, N.: The Study of Instinct. Oxford University Press, London (1951) · Zbl 0045.23203
[30] Vitanyi, P.M.B.: Information distance in multiples. IEEE Trans. Inf. Theory 57(4), 2451–2456 (2011) · Zbl 1366.94209 · doi:10.1109/TIT.2011.2110130
[31] Yaglom, A.M., Yaglom, I.M.: Probability and Information. Theory and Decision Library. Springer, Berlin (1983) · Zbl 0544.94001
[32] Zvonkin, A.K., Levin, L.A.: The complexity of finite objects and concepts of information and randomness through the algorithm theory. Russ. Math. Surv. 25(6), 83–124 (1970) · Zbl 0222.02027 · doi:10.1070/RM1970v025n06ABEH001269
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.