Alpha influenza virus infiltration prediction using virus-human protein-protein interaction network. (English) Zbl 1467.92060

Summary: With more than ten million deaths, the influenza virus is one of the deadliest in history. About half a million severe illnesses caused by influenza are reported annually. Influenza is a parasite that needs the host cellular machinery to replicate its genome. To reach the host, viral proteins must interact with host proteins. Identifying the host-virus protein interaction network (HVIN) is therefore one of the crucial steps in treating viral diseases. Because experimental identification of the HVIN is expensive, time-consuming and laborious, researchers are forced to use computational methods instead of experimental ones to gain a better understanding of the HVIN. In this study, several features are extracted from physicochemical properties of amino acids and combined with different centralities of the human protein-protein interaction network (HPPIN) to predict protein-protein interactions between human proteins and alpha-influenza-virus proteins (HI-PPIs). Ensemble learning methods were used to predict such PPIs. The model reached 0.93 accuracy, 0.91 sensitivity and 0.95 specificity. Moreover, a database of 694,522 new PPIs was constructed from the model's predictions. Further analysis showed that HPPIN centralities, gene ontology semantic similarity and the conjoint triad of virus proteins are the most important features for predicting HI-PPIs.
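
The three scores quoted in the summary are standard binary-classification measures. As a minimal sketch, assuming the usual confusion-matrix definitions and a label convention of 1 for an interacting pair and 0 for a non-interacting pair (the function names and the toy labels are illustrative and not taken from the paper), they could be computed as follows:

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (assumed: 1 = interacting, 0 = not)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def evaluate(y_true, y_pred):
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": safe_ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": safe_ratio(tp, tp + fn),
        "specificity": safe_ratio(tn, tn + fp),
    }


# Toy labels for illustration only; these are not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = evaluate(y_true, y_pred)
```

Under these definitions, sensitivity is the fraction of true interactions that are recovered and specificity is the fraction of non-interactions that are correctly rejected, so the reported 0.91 and 0.95 describe the two kinds of error separately while accuracy pools them.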


92C32 Pathology, pathophysiology
92C40 Biochemistry, molecular biology


DAVID; AAindex; IntAct

