zbMATH — the first resource for mathematics

Protein fold recognition by alignment of amino acid residues using kernelized dynamic time warping. (English) Zbl 1412.92248
Summary: In protein fold recognition, a protein is classified into one of a set of known folds. A fold can be recognized by first applying a feature extraction method to obtain relevant information from the protein sequence and then using a classifier to assign novel protein sequences to a fold. Several feature extraction methods have been developed in the past, but their recognition accuracy has been limited.
Protein sequences of varying lengths can share the same fold, and sequences within a fold appear very similar once they are aligned properly. To this end, we develop an amino acid alignment method that extracts important features from protein sequences by computing dissimilarity distances between proteins. The dissimilarity is measured as the distance between the position-specific scoring matrices of two protein sequences, and these distances are used in a support vector machine framework. We demonstrate the effectiveness of the proposed method on several benchmark datasets. The method shows a significant improvement in fold recognition performance, in the range of 4.3-7.6%, compared with several existing feature extraction methods.
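The alignment step described in the summary can be illustrated with a minimal sketch. It assumes plain dynamic time warping with a Euclidean cost between rows of two position-specific scoring matrices; the summary does not specify the kernelized variant, the local cost, or how the distances are passed to the support vector machine, so the function names `row_distance`, `dtw_distance` and `dissimilarity_features` are hypothetical and the code is not the authors' implementation.

```python
import math


def row_distance(u, v):
    """Euclidean distance between two PSSM rows (assumed local cost)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(pssm_a, pssm_b):
    """Plain dynamic time warping cost between two sequences of rows.

    Each PSSM is a list of rows, one row per residue, so the two
    inputs may have different lengths.
    """
    n, m = len(pssm_a), len(pssm_b)
    inf = float("inf")
    # cost[i][j]: cheapest alignment of the first i rows of A
    # with the first j rows of B.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = row_distance(pssm_a[i - 1], pssm_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # row of A aligned to a repeated row of B
                cost[i][j - 1],      # row of B aligned to a repeated row of A
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def dissimilarity_features(query, references):
    """Represent a protein by its DTW distances to reference proteins
    (an assumed way of turning distances into a feature vector)."""
    return [dtw_distance(query, ref) for ref in references]


# Toy matrices with 2 columns instead of the usual 20, for illustration only.
a = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # same pattern, one row repeated
c = [[0.0, 1.0], [1.0, 0.0]]              # reversed pattern

print(dtw_distance(a, b))  # 0.0: warping absorbs the repeated row
print(dissimilarity_features(a, [b, c]))
```

A feature vector of this kind could then be given to a support vector machine, for example through a precomputed kernel; how the paper does this is not stated in the summary.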
MSC:
92D20 Protein sequences, DNA sequences
References:
[1] Altschul, S. F.; Madden, T. L.; Schaffer, A. A.; Zhang, J. H.; Zhang, Z.; Miller, W.; Lipman, D. J., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res., 25, 17, 3389-3402, (1997)
[2] Bishop, C. M., Pattern recognition and machine learning, (2006), Springer Science NY. · Zbl 1107.68072
[3] Bouchaffra, D., Tan, J., 2006. Protein fold recognition using a structural Hidden Markov Model. In: Proceedings of the 18th International Conference on Pattern Recognition, pp. 186-189.
[4] Cao, D. S.; Xu, Q. S.; Liang, Y. Z., propy: a tool to generate various modes of Chou's PseAAC, Bioinformatics, 29, 960-962, (2013)
[5] Chang, C.-C.; Lin, C.-J., LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol., 2, 3, 27:1-27:27, (2011)
[6] Chen, K., Zhang, X., Yang, M.Q., Yang, J.Y., 2007. Ensemble of probabilistic neural networks for protein fold recognition. In: Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering (BIBE), pp. 66-70.
[7] Chen, W.; Lin, H.; Feng, P. M.; Ding, C.; Zuo, Y. C., iNuc-PhysChem: a sequence-based predictor for identifying nucleosomes via physicochemical properties, PLoS One, 7, e47843, (2012)
[8] Chen, W.; Feng, P. M.; Lin, H., iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition, Nucleic Acids Res., 41, e69, (2013)
[9] Chinnasamy, A.; Sung, W. K.; Mittal, A., Protein structure and fold prediction using tree-augmented naive Bayesian classifier, J. Bioinf. Comput. Biol., 3, 4, 803-819, (2005)
[10] Chmielnicki, W.; Stapor, K., A hybrid discriminative-generative approach to protein fold recognition, Neurocomputing, 75, 194-198, (2012)
[11] Chou, K. C., Prediction of protein cellular attributes using pseudo amino acid composition, Proteins, 43, 246-255, (2001), (erratum: 2001, vol. 44, 60)
[12] Chou, K. C.; Cai, Y. D., Predicting protein structural class by functional domain composition, Biochem. Biophys. Res. Commun., 321, 1007-1009, (2004), (Corrigendum: ibid., 2005, vol. 329, 1362)
[13] Chou, K. C.; Shen, H. B., MemType-2L: a web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM, Biochem. Biophys. Res. Commun., 360, 339-345, (2007)
[14] Chou, K. C.; Shen, H. B., ProtIdent: a web server for identifying proteases and their types by fusing functional domain and sequential evolution information, Biochem. Biophys. Res. Commun., 376, 321-325, (2008)
[15] Chou, K. C.; Shen, H. B., Cell-PLoc: a package of web servers for predicting subcellular localization of proteins in various organisms (updated version: Cell-PLoc 2.0: an improved package of web-servers for predicting subcellular localization of proteins in various organisms), Nat. Sci., 2, 1090-1103, (2010), (Nature Protocols, 2008, 3, 153-162)
[16] Chou, K. C.; Shen, H. B., Review: recent advances in developing web-servers for predicting protein attributes, Nat. Sci., 2, 63-92, (2009)
[17] Chou, K. C., Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes, Bioinformatics, 21, 10-19, (2005)
[18] Chou, K. C., Some remarks on protein attribute prediction and pseudo amino acid composition (50th anniversary year review), J. Theor. Biol., 273, 236-247, (2011) · Zbl 1405.92212
[19] Chou, K. C.; Wu, Z. C.; Xiao, X., iLoc-Euk: a multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins, PLoS One, 6, e18258, (2011)
[20] Chou, K. C.; Wu, Z. C.; Xiao, X., iLoc-Hum: using accumulation-label scale to predict subcellular locations of human proteins with both single and multiple sites, Mol. Biosyst., 8, 629-641, (2012)
[21] Dehzangi, A.; Amnuaisuk, S. P., Fold prediction problem: the application of new physical and physicochemical-based features, Protein Pept. Lett., 18, 174-185, (2011)
[22] Dehzangi, A.; Amnuaisuk, S. P.; Dehzangi, O., Enhancing protein fold prediction accuracy by using ensemble of different classifiers, Aust. J. Intell. Inf. Process. Syst., 26, 4, 32-40, (2010)
[23] Dehzangi, A., Amnuaisuk, S.P., Ng, K.H., Mohandesi, E., 2009. Protein fold prediction problem using ensemble of classifiers. In: Proceedings of the 16th International Conference on Neural Information Processing, Part II, pp. 503-511.
[24] Dehzangi, A.; Karamizadeh, Solving protein fold prediction problem using fusion of heterogeneous classifiers, Inf. Int. Interdiscip. J., 14, 11, 3611-3622, (2011)
[25] Dehzangi, A.; Paliwal, K. K.; Sharma, A.; Dehzangi, O.; Sattar, A., A combination of feature extraction methods with an ensemble of different classifiers for protein structural class prediction problem, IEEE/ACM Trans. Comput. Biol. Bioinf., 10, 3, 564-575, (2013)
[26] Dehzangi, A., Paliwal, K.K., Lyons, J., Sharma, A., Sattar, A., 2013b. Exploring potential discriminatory information embedded in PSSM to enhance protein structural class prediction accuracy. In: Proceeding of the Pattern Recognition in Bioinformatics. PRIB 2013, LNBI 7986, pp. 208-219.
[27] Dehzangi, A., Paliwal, K.K., Lyons, J., Sharma, A., Sattar, A., 2013c. Enhancing protein fold prediction accuracy using evolutionary and structural features. In: Proceeding of the Pattern Recognition in Bioinformatics. PRIB 2013, LNBI 7986, pp. 196-207.
[28] Dehzangi, A.; Paliwal, K. K.; Lyons, J.; Sharma, A.; Sattar, A., Proposing a highly accurate protein structural class predictor using segmentation-based features, BMC Genomics, (2014), 15 (Suppl 1), S2.
[29] Deschavanne, P.; Tuffery, P., Enhanced protein fold recognition using a structural alphabet, Proteins: Struct. Funct. Bioinf., 76, 129-137, (2009)
[30] Ding, C.; Dubchak, I., Multi-class protein fold recognition using support vector machines and neural networks, Bioinformatics, 17, 4, 349-358, (2001)
[31] Ding, Y. S.; Zhang, T. L., Using Chou's pseudo amino acid composition to predict subcellular localization of apoptosis proteins: an approach with immune genetic algorithm-based ensemble classifier, Patt. Recog. Lett., 29, 1887-1892, (2008)
[32] Dong, Q.; Zhou, S.; Guan, J., A new taxonomy-based protein fold recognition approach based on autocross-covariance transformation, Bioinformatics, 25, 20, 2655-2662, (2009)
[33] Du, P.; Wang, X.; Xu, C.; Gao, Y., PseAAC-Builder: a cross-platform stand-alone program for generating various special Chou's pseudo-amino acid compositions, Anal. Biochem., 425, 117-119, (2012)
[34] Dubchak, I., Muchnik, I., Kim, S.K., 1997. Protein folding class predictor for SCOP: approach based on global descriptors. In: Proceedings of 5th International Conference on Intelligent Systems for Molecular Biology, pp. 104-107.
[35] Esmaeili, M.; Mohabatkar, H.; Mohsenzadeh, S., Using the concept of Chou's pseudo amino acid composition for risk type prediction of human papillomaviruses, J. Theor. Biol., 263, 203-209, (2010) · Zbl 1406.92455
[36] Feng, P. M.; Chen, W.; Lin, H., iHSP-PseRAAAC: identifying the heat shock protein families using pseudo reduced amino acid alphabet composition, Anal. Biochem., 442, 118-125, (2013)
[37] Ghanty, P.; Pal, N. R., Prediction of protein folds: extraction of new features, dimensionality reduction, and fusion of heterogeneous classifiers, IEEE Trans. Nano Biosci., 8, 100-110, (2009)
[38] Hajisharifi, Z.; Piryaiee, M.; Mohammad Beigi, M.; Behbahani, M.; Mohabatkar, H., Predicting anticancer peptides with Chou's pseudo amino acid composition and investigating their mutagenicity via Ames test, J. Theor. Biol., 341, 34-40, (2014)
[39] Huang, J. T.; Tian, J., Amino acid sequence predicts folding rate for middle-size two-state proteins, Proteins: Struct. Funct. Bioinf., 63, 3, 551-554, (2006)
[40] Kavousi, K.; Moshiri, B.; Sadeghi, M.; Araabi, B. N.; Moosavi-Movahedi, A. A., A protein fold classifier formed by fusing different modes of pseudo amino acid composition via PSSM, Comput. Biol. Chem., 35, 1, 1-9, (2011) · Zbl 1403.92209
[41] Kecman, V., Yang, T., 2009. Protein fold recognition with adaptive local hyperplane algorithm. In: IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB '09, pp. 75-78.
[42] Klein, P., Prediction of protein structural class by discriminant analysis, Biochim. Biophys. Acta, 874, 205-215, (1986)
[43] Krishnaraj, Y.; Reddy, C. K., Boosting methods for protein fold recognition: an empirical comparison, IEEE Int. Conf. Bioinf. Biomed., 393-396, (2008)
[44] Kurgan, L. A.; Homaeian, L., Prediction of structural classes for protein sequences and domains: impact of prediction algorithms, sequence representation and homology, and test procedures on accuracy, Pattern Recognit., 39, 2323-2343, (2006) · Zbl 1103.68767
[45] Kurgan, L. A.; Zhang, T.; Zhang, H.; Shen, S.; Ruan, J., Secondary structure-based assignment of the protein structural classes, Amino Acids, 35, 551-564, (2008)
[46] Lin, S. X.; Lapointe, J., Theoretical and experimental biology in one, J. Biomed. Sci. Eng. (JBiSE), 6, 435-442, (2013)
[47] Lin, W. Z.; Fang, J. A.; Xiao, X., Predicting secretory proteins of malaria parasite by incorporating sequence evolution information into pseudo amino acid composition via grey system model, PLoS One, 7, e49040, (2012)
[48] Lin, W. Z.; Fang, J. A.; Xiao, X., iLoc-Animal: a multi-label learning classifier for predicting subcellular localization of animal proteins, Mol. BioSyst., 9, 634-644, (2013)
[49] Liu, T.; Geng, X.; Zheng, X.; Li, R.; Wang, J., Accurate prediction of protein structural class using autocovariance transformation of PSI-BLAST profiles, Amino Acids, 42, 2243-2249, (2012)
[50] Min, J. L.; Xiao, X.; Chou, K. C., iEzy-Drug: a web server for identifying the interaction between enzymes and drugs in cellular networking, BioMed. Res. Int., 2013, 701317, (2013)
[51] Mohabatkar, H.; Beigi, M. M.; Abdolahi, K.; Mohsenzadeh, S., Prediction of allergenic proteins by means of the concept of Chou's pseudo amino acid composition and a machine learning approach, Med. Chem., 9, 133-137, (2013)
[52] Mohammad Beigi, M.; Behjati, M.; Mohabatkar, H., Prediction of metalloproteinase family based on the concept of Chou's pseudo amino acid composition using a machine learning approach, J. Struct. Funct. Genomics, 12, 191-197, (2011)
[53] Mohabatkar, H.; Mohammad Beigi, M.; Esmaeili, A., Prediction of GABA(A) receptor proteins using the concept of Chou's pseudo-amino acid composition and support vector machine, J. Theor. Biol., 281, 18-23, (2011) · Zbl 1397.92215
[54] Najmanovich, R.; Kuttner, J.; Sobolev, V.; Edelman, M., Side-chain flexibility in proteins upon ligand binding, Proteins: Struct. Funct. Bioinf., 39, 3, 261-268, (2000)
[55] Nanni, L.; Lumini, A., Genetic programming for creating Chou's pseudo amino acid based features for submitochondria localization, Amino Acids, 34, 653-660, (2008)
[56] Nanni, L.; Lumini, A.; Gupta, D.; Garg, A., Identifying bacterial virulent proteins by fusing a set of classifiers based on variants of Chou's pseudo amino acid composition and on evolutionary information, IEEE/ACM Trans. Comput. Biol. Bioinf., 9, 467-475, (2012)
[57] Ohlson, T.; Wallner, B.; Elofsson, A., Profile-profile methods provide improved fold-recognition: a study of different profile-profile alignment methods, Proteins: Struct. Funct. Bioinf., 57, 188-197, (2004)
[58] Paliwal, K. K.; Sharma, A.; Lyons, J.; Dehzangi, A., A tri-gram based feature extraction technique using linear probabilities of position specific scoring matrix for protein fold recognition, IEEE Trans. Nanobiosci., 13, 1, (2014)
[59] Paliwal, K. K.; Sharma, A., Approximate LDA technique for dimensionality reduction in the small sample size case, J. Pattern Recognit. Res., 6, 2, 298-306, (2011)
[60] Paliwal, K. K.; Sharma, A., Improved pseudoinverse linear discriminant analysis method for dimensionality reduction, Int. J. Pattern Recognit. Artif. Intell., 26, 1, 1250002-1-1250002-9, (2012)
[61] Qiu, W. R.; Xiao, X., iRSpot-TNCPseAAC: identify recombination spots with trinucleotide composition and pseudo amino acid components, Int. J. Mol. Sci., 15, 1746-1766, (2014)
[62] Sahu, S. S.; Panda, G., A novel feature representation method based on Chou's pseudo amino acid composition for protein structural class prediction, Comput. Biol. Chem., 34, 320-327, (2010) · Zbl 1403.92221
[63] Sharma, A.; Paliwal, K. K.; Dehzangi, A.; Lyons, J.; Imoto, S.; Miyano, S., A strategy to select suitable physicochemical attributes of amino acids for protein fold recognition, BMC Bioinf., 14, 233, (2013)
[64] Sharma, A.; Lyons, J.; Dehzangi, A.; Paliwal, K. K., A feature extraction technique using bi-gram probabilities of position specific scoring matrix for protein fold recognition, J. Theor. Biol., 320, 7, 41-46, (2013) · Zbl 1406.92471
[65] Sharma, A.; Imoto, S.; Miyano, S.; Sharma, V., Null space based feature selection method for gene expression data, Int. J. Mach. Learn. Cybern., 3, 4, 269-276, (2012)
[66] Sharma, A.; Imoto, S.; Miyano, S., A top-r feature selection algorithm for microarray gene expression data, IEEE/ACM Trans. Comput. Biol. Bioinf., 9, 3, 754-764, (2012)
[67] Sharma, A.; Paliwal, K. K., A two-stage linear discriminant analysis for face-recognition, Pattern Recognit. Lett., 33, 9, 1157-1162, (2012)
[68] Sharma, A.; Imoto, S.; Miyano, S., A filter based feature selection algorithm using null space of covariance matrix for DNA microarray gene expression data, Curr. Bioinf., 7, 3, 289-294, (2012)
[69] Sharma, A.; Imoto, S.; Miyano, S., A between-class overlapping filter-based method for transcriptome data analysis, J. Bioinf. Comput. Biol., 10, 5, 1250010-1-1250010-20, (2012)
[70] Sharma, A.; Paliwal, K. K., A gene selection algorithm using Bayesian classification approach, Am. J. Appl. Sci., 9, 1, 127-131, (2012)
[71] Sharma, A.; Paliwal, K. K., A new perspective to null linear discriminant analysis method and its fast implementation using random matrix multiplication with scatter matrices, Pattern Recognit., 45, 6, 2205-2213, (2012) · Zbl 1234.62100
[72] Sharma, A.; Paliwal, K. K.; Imoto, S.; Miyano, S., Principal component analysis using QR decomposition, Int. J. Mach. Learn. Cybern., 4, 6, 679-683, (2013)
[73] Sharma, A.; Paliwal, K. K., A gradient linear discriminant analysis for small sample sized problem, Neural Process. Lett., 27, 1, 17-24, (2008)
[74] Sharma, A.; Koh, C. H.; Imoto, S.; Miyano, S., Strategy of finding optimal number of features on gene expression data, Electron. Lett., 47, 8, 480-482, (2011)
[75] Sharma, A.; Paliwal, K. K., Regularisation of eigenfeatures by extrapolation of scatter-matrix in face-recognition problem, IEE Electron. Lett., 46, 10, 450-475, (2010)
[76] Sharma, A.; Paliwal, K. K., Fast principal component analysis using fixed-point algorithm, Pattern Recognit. Lett., 28, 10, 1151-1155, (2007)
[77] Sharma, A.; Paliwal, K. K.; Onwubolu, G. C., Class-dependent PCA, LDA and MDC: a combined classifier for pattern classification, Pattern Recognit., 39, 7, 1215-1229, (2006) · Zbl 1095.68675
[78] Shamim, M. T.A.; Anwaruddin, M.; Nagarajaram, H. A., Support vector machine-based classification of protein folds using the structural properties of amino acid residues and amino acid residue pairs, Bioinformatics, 23, 24, 3320-3327, (2007)
[79] Shen, H. B.; Chou, K. C., Ensemble classifier for protein fold pattern recognition, Bioinformatics, 22, 1717-1722, (2006)
[80] Shen, H. B.; Chou, K. C., EzyPred: a top-down approach for predicting enzyme functional classes and subclasses, Biochem. Biophys. Res. Commun., 364, 53-59, (2007)
[81] Shen, H. B.; Chou, K. C., PseAAC: a flexible web-server for generating various kinds of protein pseudo amino acid composition, Anal. Biochem., 373, 386-388, (2008)
[82] Taguchi, Y.-H.; Gromiha, M. M., Application of amino acid occurrence for discriminating different folding types of globular proteins, BMC Bioinf., 8, 404, (2007)
[83] Vapnik, V. N., The nature of statistical learning theory, (1995), Springer-Verlag New York. · Zbl 0833.62008
[84] Wang, Z. Z.; Yuan, Z., How good is prediction of protein-structural class by the component-coupled method?, Proteins, 38, 165-175, (2000)
[85] Yang, T.; Kecman, V.; Cao, L.; Zhang, C.; Huang, J. Z., Margin-based ensemble classifier for protein fold recognition, Expert Syst. Appl., 38, 12348-12355, (2011)
[86] Ying, Y.; Huang, K.; Campbell, C., Enhanced protein fold recognition through a novel data integration approach, BMC Bioinf., 10, 1, 267, (2009)
[87] Valavanis, I. K.; Spyrou, G. M.; Nikita, K. S., A comparative study of multi-classification methods for protein fold recognition, Int. J. Comput. Intell. Bioinf. Syst. Biol., 1, 3, 332-346, (2010)
[88] Wu, Z. C.; Xiao, X., iLoc-Plant: a multi-label classifier for predicting the subcellular localization of plant proteins with both single and multiple sites, Mol. BioSyst., 7, 3287-3297, (2011)
[89] Wu, Z. C.; Xiao, X., iLoc-Gpos: a multi-layer classifier for predicting the subcellular localization of singleplex and multiplex Gram-positive bacterial proteins, Protein Pept. Lett., 19, 4-14, (2012)
[90] Xiao, X.; Min, J. L.; Wang, P., iCDI-PseFpt: identify the channel-drug interaction in cellular networking with PseAAC and molecular fingerprints, J. Theor. Biol., 337C, 71-79, (2013)
[91] Xu, Y.; Shao, X. J.; Wu, L. Y.; Deng, N. Y., iSNO-AAPair: incorporating amino acid pairwise coupling into PseAAC for predicting cysteine S-nitrosylation sites in proteins, PeerJ, 1, e171, (2013)
[92] Xiao, X.; Wu, Z. C., A multi-label classifier for predicting the subcellular localization of Gram-negative bacterial proteins with both single and multiple sites, PLoS One, 6, e20592, (2011)
[93] Xiao, X.; Wu, Z. C., iLoc-Virus: a multi-label learning classifier for identifying the subcellular localization of virus proteins with both single and multiple sites, J. Theor. Biol., 284, 42-51, (2011) · Zbl 1397.92238
[94] Zhang, S. W.; Zhang, Y. L.; Yang, H. F.; Zhao, C. H.; Pan, Q., Using the concept of Chou's pseudo amino acid composition to predict protein subcellular localization: an approach by incorporating evolutionary information and von Neumann entropies, Amino Acids, 34, 565-572, (2008)
[95] Zhang, H.; Zhang, T.; Gao, J.; Ruan, J.; Shen, S.; Kurgan, L. A., Determination of protein folding kinetic types using sequence and predicted secondary structure and solvent accessibility, Amino Acids, 42, 1, 271-283, (2012), Epub 2010 Nov 17
[96] Zhang, T. L.; Ding, Y. S.; Chou, K. C., Prediction protein structural classes with pseudo amino acid composition: approximate entropy and hydrophobicity pattern, J. Theor. Biol., 250, 186-193, (2008) · Zbl 1397.92551
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.