
RNA interference data: from a statistical analysis to network inference. (English) Zbl 1270.62144

Heidelberg: Univ. Heidelberg, Naturwissenschaftlich-Mathematische Gesamtfakultät (Diss.). xviii, 170 p. (2012).
Summary: Viruses are the cause of many severe human diseases such as hepatitis C, dengue fever, AIDS, influenza and even cancer. Viral diseases kill several million people worldwide every year. Because viruses evolve rapidly, developing drugs against them and treating the diseases they cause are especially difficult. The present work aims at a better understanding of the signaling processes underlying certain diseases. To this end, methods for the analysis of RNA interference (RNAi) data and for network inference from such data are presented.
Recent biological and technological advances in the field of RNAi enable the knockdown of individual genes in a high-content, high-throughput manner. The effects of these perturbations on specific phenotypes can then be quantified in detail using multiparametric imaging. This in turn allows the identification of genes involved in particular biological processes, such as virus-host factors used in viral life cycles. However, the hit lists of published RNAi screens show only a small overlap, even for studies of the same virus. This may be due to insufficient data analysis: the potential of microscopic screening data is not fully exploited, because individual cell measurements are not taken into account for data normalization and hit scoring.
This thesis shows that, in RNAi data from studies of hepatitis C and dengue virus, the phenotypic effect of a perturbation is strongly influenced by each cell's population context. Therefore, novel methodologies are proposed which use the individual cell measurements for data analysis and statistical scoring. This results in increased sensitivity and specificity compared with existing methods that disregard these factors. The proposed method allows the identification of known as well as new hit genes which are significantly involved in the respective viral life cycles.
The spatial and temporal placement of these hits, however, remains unknown, and the underlying signaling processes are only poorly understood. To understand the biology from a system-wide view, it is necessary to infer the signaling cascade of the involved factors in detail. One of the challenges of network inference is that the dimensionality grows exponentially with the number of nodes. The method proposed in this thesis is formulated as a linear optimization problem which can be solved efficiently even for large data sets. The model can incorporate data from single or multiple perturbations at the same time. The aim is to find the network topology which best represents the given data. Based on simulated data for a small artificial five-node example, the robustness of the model against noisy or incomplete data is demonstrated.
Furthermore, for this small network as well as for larger networks with 10 to 52 nodes, it is shown that the model performs better than random guessing. In addition, for large networks both the performance and the computation time are better than those of another recently published approach. Moreover, the network inference method presented here has been applied to data measuring the signaling of ErbB proteins. These proteins are associated with the development of many human cancers. The results of the network inference show that known signaling cascades can be successfully reconstructed from the data. Additionally, newly learned protein-protein interactions indicate that there are several still unknown feedback and feedforward loops. The proteins in these loops may serve as potential targets to control ErbB signaling. Knowledge of these factors is an important step towards the development of new drugs and thus helps to fight ErbB-related diseases.

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62-02 Research exposition (monographs, survey articles) pertaining to statistics
62M45 Neural nets and related approaches to inference from stochastic processes
62-07 Data analysis (statistics) (MSC2010)
92C50 Medical applications (general)

Software:

LOWESS