Highly accurate prediction of protein self-interactions by incorporating the average block and PSSM information into the general PseAAC. (English) Zbl 1393.92016

Summary: Determining whether proteins interact with their partners is a challenging task in fundamental research. Protein self-interaction is a special case of protein-protein interactions (PPIs) and plays a key role in the regulation of cellular functions. Because experimental identification of self-interacting proteins (SIPs) has limitations, it is important to develop an effective tool for predicting SIPs from protein sequences. In this study, we developed a novel computational method called RVM-AB that combines the relevance vector machine (RVM) model with average blocks (AB) for detecting SIPs from protein sequences. First, the AB feature extraction method is applied to the position-specific scoring matrix (PSSM) of each protein sequence. Second, principal component analysis (PCA) is used to reduce the dimension of the AB vector and thereby the influence of noise. Then, using the RVM algorithm, the performance of RVM-AB is assessed and compared with the state-of-the-art support vector machine (SVM) classifier and other existing methods on yeast and human datasets. In a fivefold cross-validation test, the RVM-AB model achieved accuracies of 93.01% and 97.72% on the yeast and human datasets, respectively, which are significantly better than those of the SVM-based method and other previous methods. The experimental results show that the RVM-AB prediction model is efficient and robust, and that it can serve as an automatic decision support tool for detecting SIPs. To facilitate further proteomics research, the RVMAB server is freely available for academic use at
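
The average-blocks step described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the common formulation in which the rows of an L x 20 PSSM are split into 20 contiguous blocks and each block is averaged column-wise, giving a 400-dimensional vector. The function name `average_blocks`, the default of 20 blocks, and the toy matrix are assumptions made for illustration; the summary does not state these details.

```python
from typing import List


def average_blocks(pssm: List[List[float]], n_blocks: int = 20) -> List[float]:
    """Average the rows of a PSSM inside n_blocks contiguous blocks.

    pssm holds one row per residue; every row has the same number of
    columns (20 for the standard amino acids). The result has
    n_blocks * n_cols entries. n_blocks=20 is an assumed default.
    """
    length = len(pssm)
    if length < n_blocks:
        raise ValueError("sequence is shorter than the number of blocks")
    n_cols = len(pssm[0])
    features: List[float] = []
    for b in range(n_blocks):
        # Integer boundaries tile the sequence without gaps or overlap.
        start = b * length // n_blocks
        end = (b + 1) * length // n_blocks
        block = pssm[start:end]
        for c in range(n_cols):
            features.append(sum(row[c] for row in block) / len(block))
    return features


# Toy PSSM (hypothetical data): 40 residues, every entry of row i equals i.
toy_pssm = [[float(i)] * 20 for i in range(40)]
toy_features = average_blocks(toy_pssm)
print(len(toy_features))  # 400
```

Because every protein yields a vector of the same length regardless of sequence length, such vectors can be passed to a dimension-reduction step such as PCA and then to a classifier.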


92C40 Biochemistry, molecular biology

