zbMATH — the first resource for mathematics

Use of fuzzy clustering technique and matrices to classify amino acids and its impact to Chou’s pseudo amino acid composition. (English) Zbl 1400.92393
Summary: In this paper we present a study of the classification of the 20 amino acids by means of a fuzzy clustering technique. To calculate distances between the amino acids we employ two different distance functions: the Minkowski distance and the NTV metric. The clustering procedure takes several physical properties of the amino acids into account. We examine how the number and nature of the properties considered affect the clustering, as a function of the degree of similarity and of the distance function used. It turns out that one should use the properties that most strongly determine the behavior of the amino acids, and that an appropriate choice of metric can help define the separation into groups.
Reviewer: Reviewer (Berlin)
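
The summary above does not spell out the clustering procedure, so the following is only a minimal sketch of one common way to combine a distance function with a degree of similarity. It assumes that distances are rescaled into a fuzzy similarity relation, that the max-min transitive closure of this relation is taken, and that clusters are read off as alpha-cuts. The NTV metric is written in the form d(p, q) = sum |p_i - q_i| / sum max(p_i, q_i) for points with coordinates in [0, 1]. All function names are hypothetical and not taken from the paper.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]


def minkowski(p: Point, q: Point, w: float = 2.0) -> float:
    """Minkowski distance of order w (w=1 Manhattan, w=2 Euclidean)."""
    return sum(abs(a - b) ** w for a, b in zip(p, q)) ** (1.0 / w)


def ntv(p: Point, q: Point) -> float:
    """NTV metric for points in [0, 1]^n; defined as 0 when both are zero."""
    denom = sum(max(a, b) for a, b in zip(p, q))
    if denom == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(p, q)) / denom


def similarity_matrix(points: List[Point],
                      dist: Callable[[Point, Point], float]) -> List[List[float]]:
    """Assumed rescaling: similarity = 1 - distance / largest distance."""
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    dmax = max(max(row) for row in d) or 1.0
    return [[1.0 - d[i][j] / dmax for j in range(n)] for i in range(n)]


def transitive_closure(r: List[List[float]]) -> List[List[float]]:
    """Max-min transitive closure: repeat r <- max(r, r o r) until stable."""
    n = len(r)
    while True:
        new = [[max(r[i][j],
                    max(min(r[i][k], r[k][j]) for k in range(n)))
                for j in range(n)] for i in range(n)]
        if new == r:
            return r
        r = new


def alpha_cut_clusters(r: List[List[float]], alpha: float) -> List[List[int]]:
    """Equivalence classes of the alpha-cut of a transitively closed relation."""
    clusters: List[List[int]] = []
    assigned = set()
    for i in range(len(r)):
        if i in assigned:
            continue
        group = [j for j in range(len(r)) if r[i][j] >= alpha]
        assigned.update(group)
        clusters.append(group)
    return clusters


def fuzzy_clusters(points: List[Point],
                   dist: Callable[[Point, Point], float],
                   alpha: float) -> List[List[int]]:
    """Full pipeline: distances -> similarity -> closure -> alpha-cut."""
    return alpha_cut_clusters(transitive_closure(similarity_matrix(points, dist)), alpha)
```

Calling `fuzzy_clusters(points, minkowski, alpha)` and `fuzzy_clusters(points, ntv, alpha)` on the same property vectors (each property scaled to [0, 1]) shows how the partition changes with the degree of similarity `alpha` and with the metric. Any input vectors used this way are the caller's own data; none are supplied here.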

MSC:
92D20 Protein sequences, DNA sequences
62A86 Fuzzy analysis in statistics
62P10 Applications of statistics to biology and medical sciences; meta analysis