zbMATH — the first resource for mathematics

An extension of fuzzy topological approach for comparison of genetic sequences. (English) Zbl 1361.92055
Summary: Bioinformatics is a relatively new discipline in which mathematics is applied to the analysis of genetic sequences. The analysis of the genetic material of living organisms, which consists of the nucleic acids DNA and RNA, is of great importance for diagnostic and taxonomic purposes. In the present paper we propose a new methodology for the representation of genetic sequences as fuzzy sets in the \(I^{12}\) space which can significantly improve the results of Sadegh-Zadeh and Torres & Nieto. An important characteristic of our proposed methodology is that the location of amino acids along the genetic sequences plays an important role, thus extending in a significant way the computational efficiency advantage of genetic sequence representation. We present some characteristic examples using the newly proposed methodology, in which we calculate the distance and similarity degree of given polynucleotides.
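For orientation, the kind of computation the summary refers to can be sketched as follows. This is only an illustration of the earlier \(I^{12}\) codon encoding and normalized distance usually associated with Sadegh-Zadeh and Torres & Nieto, not the extended methodology of the paper under review. The coordinate order U, C, A, G, the function names, and the choice of similarity as one minus distance are all assumptions made for this example.

```python
# Sketch under stated assumptions: crisp codon encoding in [0, 1]^12 and a
# normalized distance d(p, q) = sum|p_i - q_i| / sum max(p_i, q_i).
# Names and the base order are hypothetical, not taken from the paper.

BASES = "UCAG"  # assumed coordinate order within each codon position


def codon_to_point(codon):
    """Map a 3-letter codon to a point in [0, 1]^12 (4 coordinates per position)."""
    codon = codon.upper().replace("T", "U")  # treat DNA thymine as RNA uracil
    if len(codon) != 3 or any(base not in BASES for base in codon):
        raise ValueError("expected a codon of three letters from U/T, C, A, G")
    point = []
    for base in codon:
        # One-hot block: 1.0 at the coordinate of the base actually present.
        point.extend(1.0 if base == b else 0.0 for b in BASES)
    return tuple(point)


def fuzzy_distance(p, q):
    """Normalized distance between two points of the unit hypercube."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    denominator = sum(max(a, b) for a, b in zip(p, q))
    if denominator == 0:
        return 0.0  # both points are the zero vector
    return sum(abs(a - b) for a, b in zip(p, q)) / denominator


def similarity(p, q):
    """Assumed similarity degree: one minus the normalized distance."""
    return 1.0 - fuzzy_distance(p, q)
```

Under these assumptions, the codons UAC and UGC differ in one position only, so `fuzzy_distance(codon_to_point("UAC"), codon_to_point("UGC"))` evaluates to 0.5. Fuzzy (non-crisp) membership values in \([0, 1]\) can be passed to `fuzzy_distance` directly as 12-tuples.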
MSC:
92D20 Protein sequences, DNA sequences
92C40 Biochemistry, molecular biology
References:
[1] Ajay, Accurate and comprehensive sequencing of personal genomes, Genome Research 21 pp 1498– (2011)
[2] Chen, Predicting protein structural class with pseudo amino acid composition and support vector machine fusion network, Analytical Biochemistry 357 pp 116– (2006)
[3] Chen, Using pseudo amino acid composition and support vector machine to predict protein structural class, Journal of Theoretical Biology 243 pp 444– (2006)
[4] Chen, Prediction of apoptosis proteins subcellular location using improved hybrid approach and pseudo amino acid composition, Journal of Theoretical Biology 248 pp 377– (2007)
[5] Chen, Prediction of the subcellular location of apoptosis proteins, Journal of Theoretical Biology 245 pp 775– (2007)
[6] Chen, iNuc-PhysChem: A sequence-based predictor for identifying nucleosomes via physicochemical properties, PLoS One 7 pp e47843– (2012)
[7] Chou, Prediction of protein cellular attributes using pseudo amino acid composition, Proteins: Structure, Function, and Genetics 43 pp 246–255 (2001). (Erratum: Prediction of protein cellular attributes using pseudo amino acid composition, Proteins: Structure, Function, and Genetics 44 p 60)
[8] Chou, Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes, Bioinformatics 21 pp 10– (2005)
[9] Chou, Predicting protein quaternary structure by pseudo amino acid composition, Proteins: Structure, Function, and Genetics 53 pp 282– (2003)
[10] Chou, Predicting enzyme family class in a hybridization space, Protein Science 13 pp 2857– (2004)
[11] Chou, Prediction of protease types in a hybridization space, Biochem Biophys Res Commun 339 pp 1015– (2006)
[12] Chou, Predicting eukaryotic protein subcellular location by fusing optimized evidence-theoretic K-nearest neighbor classifiers, J Proteome Res 5 pp 1888– (2006)
[13] Chou, Euk-mPLoc: A fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites, Journal of Proteome Research 6 pp 1728– (2007)
[14] Chou, Review: Recent progresses in protein subcellular location prediction, Analytical Biochemistry 370 pp 1– (2007)
[15] Chou, Large-scale plant protein subcellular location prediction, Journal of Cellular Biochemistry 100 pp 665– (2007)
[16] Chou, MemType-2L: A webserver for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM, Biochemical and Biophysical Research Communications 360 pp 339– (2007)
[17] Chou, Signal-CF: A subsite-coupled and window-fusing approach for predicting signal peptides, Biochemical and Biophysical Research Communications 357 pp 633– (2007)
[18] Chou, Cell-PLoc: A package of webservers for predicting subcellular localization of proteins in various organisms, Nature Protocols 3 pp 153– (2008)
[19] DasGupta, On the complexity and approximation of syntenic distance, Discrete Applied Mathematics 88 pp 59– (1998) · Zbl 0928.68057
[20] De Luca, A definition of a nonprobabilistic entropy in the setting of fuzzy sets theory, Inform and Control 20 pp 301– (1972) · Zbl 0239.94028
[21] Ding, Prediction of protein structure classes with pseudo amino acid composition and fuzzy support vector machine network, Protein and Peptide Letters 14 pp 811– (2007)
[22] Dress, A simple proof of the triangle inequality for the NTV metric, Applied Mathematics Letters 16 pp 809– (2003) · Zbl 1045.51006
[23] Dress, A new scale-invariant geometry of L1 space, Applied Mathematics Letters 17 pp 815– (2004) · Zbl 1079.92052
[24] Du, Prediction of protein submitochondria locations by hybridizing pseudo amino acid composition with various physicochemical features of segmented sequence, BMC Bioinformatics 7 pp 5– (2006)
[25] Du, PseAAC-General: Fast building various modes of general form of Chou’s pseudo-amino acid composition for large-scale protein datasets, International Journal of Molecular Sciences 15 pp 3495– (2014)
[26] Fan, Some new fuzzy entropy formulas, Fuzzy Sets and Systems 128 pp 277– (2002) · Zbl 1018.94003
[27] Fan, Predict mycobacterial proteins subcellular locations by incorporating pseudo-average chemical shift into the general form of Chou’s pseudo amino acid composition, Journal of Theoretical Biology 304 pp 88– (2012) · Zbl 1397.92186
[28] Feng, Boosting classifier for predicting protein domain structural class, Biochem Biophys Res Commun 334 pp 213– (2005)
[29] Feng, iHSPPseRAAAC: Identifying the heat shock protein families using pseudo reduced amino acid alphabet composition, Anal Biochem 442 pp 118– (2013)
[30] Foster, Application of distance geometry to 3D visualization of sequence relationships, Bioinformatics 15 pp 89– (1999)
[31] Gonzaga-Jauregui, Human genome sequencing in health and disease, Annu Rev Med 63 pp 35– (2012)
[32] Gusev, On the complexity measures of genetic sequences, Bioinformatics 15 pp 994– (1999)
[33] Helgason, The fuzzy cube and causal efficacy: Representation of concomitant mechanisms in stroke, Neural Networks 11 pp 549– (1998)
[34] Jamshidi, Dynamic simulation of the human red blood cell metabolic network, Bioinformatics 17 pp 286– (2001)
[35] Jiang, A general edit distance between RNA structures, Journal of Computational Biology 9 pp 371– (2002)
[36] Kandaswamy, AFP-Pred: A random forest approach for predicting antifreeze proteins from sequence-derived properties, J Theor Biol 270 pp 56– (2011) · Zbl 06543170
[37] Land, Insights from 20 years of bacterial genome sequencing, Functional and Integrative Genomics 15 pp 141– (2015)
[38] Liben-Nowell, On the structure of syntenic distance, Journal of Computational Biology 8 pp 53– (2001) · Zbl 1006.92023
[39] Li, An information-based sequence distance and its application to whole mitochondrial genome phylogeny, Bioinformatics 17 pp 149– (2001)
[40] Llabrés, A new family of metrics for biopolymer contact structures, Computational Biology and Chemistry 28 pp 21– (2004) · Zbl 1088.92022
[41] Lin, Using pseudo amino acid composition to predict protein structural class: Approached by incorporating 400 dipeptide components, Journal of Computational Chemistry 28 pp 1463– (2007) · Zbl 05429995
[42] Lin, Predicting conotoxin superfamily and family by using pseudo amino acid composition and modified Mahalanobis discriminant, Biochemical and Biophysical Research Communications 354 pp 548– (2007)
[43] Liu, Low-frequency Fourier spectrum for predicting membrane protein types, Biochemical and Biophysical Research Communications 336 pp 737– (2005)
[44] Menconi, Sublinear growth of information in DNA sequences, Bulletin of Mathematical Biology 67 pp 737– (2005) · Zbl 1334.92311
[45] Mondal, Pseudo amino acid composition and multi-class support vector machines approach for conotoxin superfamily classification, Journal of Theoretical Biology 243 pp 252– (2006)
[46] Morgenstern, A simple and space-efficient fragment-chaining algorithm for alignment of DNA and protein sequences, Appl Math Lett 15 pp 11– (2002) · Zbl 1015.92013
[47] Moulton, Metrics on RNA secondary structures, Journal of Computational Biology 7 pp 277– (2000)
[48] Mundra, Using pseudo amino acid composition to predict protein subnuclear localization: Approached with PSSM, Pattern Recognition Letters 28 pp 1610– (2007)
[49] Nieto, A metric space to study differences between polynucleotides, Appl Math Lett 16 pp 1289– (2003) · Zbl 1106.92307
[50] Nieto, Midpoints for fuzzy sets and their application in medicine, Artificial Intelligence in Medicine 17 pp 81– (2003) · Zbl 05390983
[51] Nieto, Fuzzy polynucleotide spaces and metrics, Bull Math Biology 68 pp 703– (2006) · Zbl 1334.92275
[52] Qiu, iRSpot-TNCPseAAC: Identify recombination spots with trinucleotide composition and pseudo amino acid components, International Journal of Molecular Sciences 15 pp 1746– (2014)
[53] Sebastian, Multi-fuzzy sets, Int Math Forum 50 pp 2471– (2010) · Zbl 1219.03058
[54] Sebastian, Multi-fuzzy sets: An extension of fuzzy sets, Fuzzy Inf Eng 1 pp 35– (2011) · Zbl 1255.03048
[55] Sebastian, Multi-fuzzy topology, Int J Appl Math 24 pp 117– (2011) · Zbl 1230.54012
[56] Sebastian, Multi-fuzzy subgroups, Int J Contemp Math Sci 6 pp 365– (2011) · Zbl 1235.20063
[57] Sebastian, Multi-fuzzy extensions of functions, Advance in Adaptive Data Analysis 3 pp 339– (2011) · Zbl 1246.03070
[58] Sadegh-Zadeh, Fundamentals of clinical methodology: 3. Nosology, Artificial Intelligence in Medicine 17 pp 87– (1999) · Zbl 05391076
[59] Sadegh-Zadeh, Fuzzy genomes, Artificial Intelligence in Medicine 18 pp 1– (2000) · Zbl 05391071
[60] Sadovsky, The method to compare nucleotide sequences based on the minimum entropy principle, Bulletin of Mathematical Biology 65 pp 309– (2003) · Zbl 1334.92149
[61] Saha, Fuzzy clustering of physicochemical and biochemical properties of amino acids, Amino Acids 43 pp 583– (2012)
[62] Shannon, A mathematical theory of communication, The Bell System Technical Journal 27 pp 379– (1948) · Zbl 1154.94303
[63] Shen, Using optimized evidence-theoretic K-nearest neighbor classifier and pseudo amino acid composition to predict membrane protein types, Biochemical and Biophysical Research Communications 334 pp 288– (2005)
[64] Shen, Predicting protein subnuclear location with optimized evidence-theoretic K-nearest classifier and pseudo amino acid composition, Biochemical and Biophysical Research Communications 337 pp 752– (2005)
[65] Shen, Ensemble classifier for protein fold pattern recognition, Bioinformatics 22 pp 1717– (2006)
[66] Shen, Fuzzy KNN for predicting membrane protein types from pseudo amino acid composition, Journal of Theoretical Biology 240 pp 9– (2006)
[67] Shen, Hum-mPLoc: An ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites, Biochemical and Biophysical Research Communications 355 pp 1006– (2007)
[68] Shen, EzyPred: A top-down approach for predicting enzyme functional classes and subclasses, Biochemical and Biophysical Research Communications 364 pp 53– (2007)
[69] Shen, Signal-3L: A 3-layer approach for predicting signal peptide, Biochemical and Biophysical Research Communications 363 pp 297– (2007)
[70] Tang, Evaluation of some DNA cloning strategies, Computers Math Applic 39 pp 43– (2000) · Zbl 0947.92011
[71] Torres, The fuzzy polynucleotide space: Basic properties, Bioinformatics 19 pp 587– (2003)
[72] Wang, Using stacked generalization to predict membrane protein types based on pseudo amino acid composition, Journal of Theoretical Biology 242 pp 941– (2006)
[73] Xiao, Using pseudo amino acid composition to predict protein structural classes: Approached with complexity measure factor, Journal of Computational Chemistry 27 pp 478– (2006) · Zbl 05429849
[74] Xiao, GPCR-2L: Predicting G protein-coupled receptors and their types by hybridizing two different modes of pseudo amino acid compositions, Mol Biosyst 7 pp 911– (2011)
[75] Xu, iNitro-Tyr: Prediction of nitrotyrosine sites in proteins with general pseudo amino acid composition, PLoS One 9 pp e105018– (2014)
[76] Zhao, An overview of the prediction of protein DNA-binding sites, International Journal of Molecular Sciences 16 pp 5194– (2015)
[77] Zheng, Advances in the techniques for the prediction of microRNA targets, International Journal of Molecular Sciences 14 pp 8179– (2013)
[78] Zhou, Using Chou’s amphiphilic pseudo amino acid composition and support vector machine for prediction of enzyme subfamily classes, Journal of Theoretical Biology 248 pp 546– (2007)
[79] Urban, Whole-genome sequencing in pharmacogenetics, Pharmacogenomics 14 pp 345– (2013)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.