
A set of descriptors for identifying the protein-drug interaction in cellular networking. (English) Zbl 1412.92099

Summary: The study of protein-drug interactions is a significant issue for drug development. Unfortunately, performing physical experiments to determine whether a drug and a protein interact is both expensive and time-consuming. Some previous attempts to design an automated system for this task relied on knowledge of the 3D structure of the protein, which is not always available in practice. With the wealth of protein sequences generated in the post-genomic age, however, a sequence-based solution to this problem is necessary. Following other works in this area, we propose a new machine learning system based on multiple protein descriptors extracted from several protein representations, such as variants of the position specific scoring matrix (PSSM) of proteins, the amino-acid sequence, and a matrix representation of a protein. The prediction engine is an ensemble of support vector machines (SVMs): each SVM is trained on a specific descriptor, and the outputs of the SVMs are combined by the sum rule. The overall success rate achieved by our final ensemble is notably higher than previous results obtained on the same datasets using the same testing protocols reported in the literature.
MATLAB code and the datasets used in our experiments are freely available for future comparison at http://www.dei.unipd.it/node/2357.
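The summary states that the SVM outputs are fused by the sum rule. The authors' implementation is the MATLAB code linked above; the Python snippet below is only a minimal sketch of sum-rule fusion under stated assumptions. It assumes each descriptor-specific SVM has already produced one real-valued score per protein-drug pair (larger meaning "interacting"), and it z-score normalizes each classifier's scores before summing so that no single SVM dominates because of its output scale. That normalization step, the threshold of 0, and the names `zscore`, `sum_rule` and `predict` are illustrative choices, not details taken from the paper.

```python
"""Minimal sketch of sum-rule fusion of per-descriptor classifier scores.

Assumption: each classifier (e.g. one SVM per protein descriptor) has already
produced a real-valued score per sample, where larger means "interacting".
The z-score normalization is an illustrative choice, not taken from the paper.
"""
from statistics import mean, pstdev
from typing import List, Sequence


def zscore(scores: Sequence[float]) -> List[float]:
    """Normalize one classifier's scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # A constant classifier carries no ranking information; map it to zeros.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def sum_rule(score_matrix: Sequence[Sequence[float]]) -> List[float]:
    """Fuse scores by summing the normalized scores of all classifiers.

    score_matrix[k][i] is the score of classifier k on sample i.
    """
    normalized = [zscore(row) for row in score_matrix]
    n_samples = len(score_matrix[0])
    return [sum(row[i] for row in normalized) for i in range(n_samples)]


def predict(score_matrix: Sequence[Sequence[float]],
            threshold: float = 0.0) -> List[int]:
    """Label a sample as interacting (1) when its fused score exceeds threshold."""
    return [1 if s > threshold else 0 for s in sum_rule(score_matrix)]


if __name__ == "__main__":
    # Hypothetical decision values from three descriptor-specific SVMs on four
    # protein-drug pairs (made-up numbers for illustration only).
    svm_scores = [
        [1.2, -0.4, 0.3, -1.1],   # e.g. SVM on a PSSM-based descriptor
        [0.8, -0.2, -0.1, -0.9],  # e.g. SVM on an amino-acid-sequence descriptor
        [2.0, 0.5, 0.9, -1.5],    # e.g. SVM on a matrix-representation descriptor
    ]
    print(predict(svm_scores))  # prints [1, 0, 1, 0]
```

Summing (rather than, say, majority voting) lets a confident classifier outweigh two hesitant ones, which is one reason the sum rule is a common default for fusing SVM ensembles.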

MSC:

92C40 Biochemistry, molecular biology
92D20 Protein sequences, DNA sequences
