Node sampling for protein complex estimation in bait-prey graphs.

*(English)* Zbl 1329.92047

Summary: In cellular biology, node-and-edge graph or “network” data collection often uses bait-prey technologies such as co-immunoprecipitation (CoIP). Bait-prey technologies assay relationships or “interactions” between protein pairs, with CoIP specifically measuring protein complex co-membership. Analyses of CoIP data frequently focus on estimating protein complex membership. Due to budgetary and other constraints, exhaustive assay of the entire network using CoIP is not always possible. We describe a stratified sampling scheme to select baits for CoIP experiments when protein complex estimation is the main goal. Expanding upon the classic framework in which nodes represent proteins and edges represent pairwise interactions, we define generalized nodes as sets of adjacent nodes with identical adjacency outside the set and use these as strata from which to select the next set of baits. Strata are redefined at each round of sampling to incorporate accumulating data. This scheme maintains user-specified quality thresholds for protein complex estimates and, relative to simple random sampling, leads to a marked increase in the number of correctly estimated complexes at each round of sampling. The R package seqSample contains all source code and is available at http://vault.northwestern.edu/~dms877/Rpacks/.
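The generalized-node definition in the summary can be illustrated with a small sketch. This is not the seqSample code (which is written in R) and makes several assumptions: the graph is treated as undirected, a generalized node is taken to be a maximal set of mutually adjacent nodes that share the same neighbours outside the set (equivalently, nodes with identical closed neighbourhoods), and one untested bait is drawn per stratum. The function names `generalized_nodes` and `sample_baits`, the `per_stratum` parameter, and the toy graph are hypothetical.

```python
import random
from collections import defaultdict


def generalized_nodes(adj):
    """Group nodes whose closed neighbourhoods (node plus neighbours) coincide.

    Assumption: this is one reading of "sets of adjacent nodes with identical
    adjacency outside the set"; the paper's exact construction may differ.
    """
    groups = defaultdict(list)
    for node, nbrs in adj.items():
        # Adjacent nodes with the same outside neighbours share this key.
        groups[frozenset(nbrs) | {node}].append(node)
    return [sorted(g) for g in groups.values()]


def sample_baits(adj, tested, per_stratum=1, seed=0):
    """Draw up to `per_stratum` untested nodes from each stratum (hypothetical rule)."""
    rng = random.Random(seed)
    baits = []
    for stratum in generalized_nodes(adj):
        candidates = [n for n in stratum if n not in tested]
        baits.extend(rng.sample(candidates, min(per_stratum, len(candidates))))
    return baits


# Toy undirected graph: A, B, C form a triangle; C-D and D-E are single edges.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

strata = generalized_nodes(adj)
print(strata)  # [['A', 'B'], ['C'], ['D'], ['E']]
print(sample_baits(adj, tested={"C"}))
```

In this toy graph, A and B are adjacent and have the same neighbour (C) outside the pair, so they collapse into one stratum; testing either as a bait would be expected to yield the same information. In a sequential design, the strata would be recomputed from the updated graph after each round, as the summary describes.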


*D. M. Scholtens* and *B. D. Spencer*, Stat. Appl. Genet. Mol. Biol. 14, No. 4, 391–411 (2015; Zbl 1329.92047)

