zbMATH — the first resource for mathematics

Bayesian methods for predicting interacting protein pairs using domain information. (English) Zbl 1147.62094
Summary: Protein-protein interactions (PPIs) play important roles in most fundamental cellular processes including cell cycle, metabolism, and cell proliferation. Therefore, the development of effective statistical approaches to predicting protein interactions based on recently available large-scale experimental data is very important. Because protein domains are the functional units of proteins and PPIs are mostly achieved through domain-domain interactions (DDIs), the modeling and analysis of protein interactions at the domain level may be more informative and insightful. However, due to the large number of domains, the number of parameters to be estimated is very large, yet the amount of information for statistical inference is quite limited.
We propose a full Bayesian method and a semi-Bayesian method for simultaneously estimating DDI probabilities, the false positive rate, and the false negative rate of high-throughput data through integrating data from several organisms. We also propose a model to associate protein interaction probabilities with domain interaction probabilities that reflects the number of domains in each protein. Our Bayesian methods are compared with the likelihood-based approach developed using the expectation maximization algorithm. We show that the full Bayesian method has the smallest mean square error through both simulations and theoretical justification under a special scenario. The large-scale PPI data obtained from high-throughput yeast two-hybrid experiments are used to demonstrate the advantages of the Bayesian approaches.
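
The summary does not state the formula linking protein and domain interaction probabilities, so the following is only a minimal sketch under stated assumptions. It uses the widely cited independent-domain-pair formulation of Deng et al. [5], in which two proteins interact if at least one of their domain pairs interacts, and a high-throughput observation is then corrupted by a false positive rate and a false negative rate. It does not include the paper's adjustment for the number of domains in each protein, and the function names, domain labels, and numeric values are hypothetical.

```python
from itertools import product


def protein_interaction_prob(domains_i, domains_j, ddi_prob):
    """Probability that two proteins interact, assuming domain pairs act
    independently (an assumption, following Deng et al. [5]):
    P = 1 - prod over domain pairs (m, n) of (1 - lambda_mn)."""
    no_interaction = 1.0
    for m, n in product(domains_i, domains_j):
        # Domain pairs are unordered; pairs missing from the table get probability 0.
        lam = ddi_prob.get(frozenset((m, n)), 0.0)
        no_interaction *= 1.0 - lam
    return 1.0 - no_interaction


def observed_interaction_prob(p_true, fp, fn):
    """Probability that an experiment reports an interaction, given the true
    interaction probability, false positive rate fp, and false negative rate fn."""
    return p_true * (1.0 - fn) + (1.0 - p_true) * fp


# Hypothetical domain-domain interaction probabilities.
ddi_prob = {
    frozenset(("A", "B")): 0.5,
    frozenset(("A", "C")): 0.2,
}

# Protein 1 has domain A; protein 2 has domains B and C.
p = protein_interaction_prob(["A"], ["B", "C"], ddi_prob)  # 1 - 0.5 * 0.8
o = observed_interaction_prob(p, fp=0.1, fn=0.5)           # 0.6 * 0.5 + 0.4 * 0.1
print(round(p, 4), round(o, 4))
```

In a sketch like this, the domain interaction probabilities, `fp`, and `fn` are the unknowns that an EM or Bayesian procedure would estimate from the observed interaction data; here they are fixed by hand purely for illustration.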

62P10 Applications of statistics to biology and medical sciences; meta analysis
62F15 Bayesian inference
92C40 Biochemistry, molecular biology
65C60 Computational problems in statistics (MSC2010)
[1] Aloy, Structure-based assembly of protein complexes in yeast, Science 303 pp 2026– (2004)
[2] Bader, Gaining confidence in high-throughput protein interaction networks, Nature Biotechnology 22 pp 78– (2004)
[3] Bateman, The Pfam protein families database, Nucleic Acids Research 32 pp D138– (2004) · Zbl 05434967
[4] Dempster, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B 39 pp 1– (1977) · Zbl 0364.62022
[5] Deng, Inferring domain-domain interactions from protein-protein interactions, Genome Research 12 pp 1540– (2002)
[6] Enright, Protein interaction maps for complete genomes based on gene fusion events, Nature 402 pp 86– (1999)
[7] Gilks, Adaptive rejection sampling for Gibbs sampling, Applied Statistics 41 pp 337– (1992) · Zbl 0825.62407
[8] Gilks, Adaptive rejection Metropolis sampling, Applied Statistics 44 pp 455– (1995) · Zbl 0893.62110
[9] Giot, A protein interaction map of Drosophila melanogaster, Science 302 pp 1727– (2003)
[10] Goh, Co-evolutionary analysis reveals insights into protein-protein interactions, Journal of Molecular Biology 324 pp 177– (2002)
[11] Gomez, Probabilistic prediction of unknown metabolic and signal-transduction networks, Genetics 159 pp 1291– (2001)
[12] Gomez, Learning to predict protein-protein interactions from protein sequences, Bioinformatics 19 pp 1875– (2003)
[13] Gough, SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments, Nucleic Acids Research 30 pp 268– (2002) · Zbl 05435612
[14] Ito, A comprehensive two-hybrid analysis to explore the yeast protein interactome, Proceedings of the National Academy of Sciences of the United States of America 98 pp 4569– (2001)
[15] Jansen, A Bayesian networks approach for predicting protein-protein interactions from genomic data, Science 302 pp 449– (2003)
[16] Lee, An integrated approach to the prediction of domain-domain interactions, BMC Bioinformatics 7 (269) (2006)
[17] Lee, A probabilistic functional network of yeast genes, Science 306 pp 1555– (2004)
[18] Letunic, SMART 4.0: Towards genomic data integration, Nucleic Acids Research 32 pp D142– (2004) · Zbl 05436160
[19] Li, A map of the interactome network of the metazoan C. elegans, Science 303 pp 540– (2004)
[20] Liu, Inferring protein-protein interactions through high-throughput interaction data from diverse organisms, Bioinformatics 21 pp 3279– (2005)
[21] Lu, Multimeric threading-based prediction of protein-protein interactions on a genomic scale: Application to the Saccharomyces cerevisiae proteome, Genome Research 13 pp 1146– (2003)
[22] Marcotte, Mining literature for protein-protein interactions, Bioinformatics 17 pp 357– (2001)
[23] Mrowka, Is there a bias in proteome research?, Genome Research 11 pp 1971– (2001)
[24] Pawson, Assembly of cell regulatory systems through protein interaction domains, Science 300 pp 445– (2003)
[25] Pazos, Similarity of phylogenetic trees as indicator of protein-protein interaction, Protein Engineering 14 pp 609– (2001)
[26] Ramani, Exploiting the co-evolution of interacting proteins to discover interaction specificity, Journal of Molecular Biology 327 pp 273– (2003)
[27] Sprinzak, Correlated sequence-signatures as markers of protein-protein interaction, Journal of Molecular Biology 311 pp 681– (2001)
[28] Tsoka, Prediction of protein interactions: Metabolic enzymes are frequently involved in gene fusion, Nature Genetics 26 pp 141– (2000)
[29] Uetz, A comprehensive analysis of protein-protein interaction in Saccharomyces cerevisiae, Nature 403 pp 623– (2000)
[30] von Mering, Comparative assessment of large-scale data sets of protein-protein interactions, Nature 417 pp 399– (2002)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.