Analysis of expression profile and gene variation via development of methods for next generation sequencing data. (English) Zbl 1431.92006

Göttingen: Univ. Göttingen (Diss.). x, 96 p. (2018).
Summary: Over the last ten to twenty years, the cost of sequencing the human genome has decreased continuously. Consequently, interest in RNA sequencing (RNA-Seq) has risen, as it can be used to discover the molecular mechanisms behind the gene expression profiles of cells in different healthy or diseased states. The aim of this dissertation is twofold: first, to identify the best-performing bioinformatics methods available for RNA-Seq analysis and, based on this knowledge, to build a standardised workflow that could then be used within the MetastaSys consortium; second, to answer the question of whether somatic mutations in cancer can be detected reliably from RNA-Seq data. This was of particular interest because the RNA-Seq data had already been generated for differential gene expression analysis. Obtaining further information on mutation status without having to generate additional Exome-Seq data would, on the one hand, save the high cost of Exome-Seq and, on the other hand, conserve biological material from patients with cancer metastases, which is precious to physicians.
For the identification of the RNA-Seq workflow, data were generated on microarray and Illumina RNA-Seq platforms. Two data sets were created: human patient data from rectal cancer metastases in the liver, and human Burkitt’s lymphoma cell lines stimulated with the B-cell activating factor BAFF. The advantages of RNA-Seq over microarrays became clear during the comparative analysis in the first publication (see 3.1). The primary focus was the performance evaluation of bioinformatics methods on the given data sets. Workflow performance was evaluated at the quantification, differential gene expression analysis, and functional profiling steps. Results showed that, with the exception of the workflow using TopHat2 and Cufflinks, all workflows achieved nearly equally good results, with a slight preference for STAR and RSEM: STAR achieved the highest overall mapping rate, while RSEM incorporated multi-mapped reads into quantification and was also able to quantify transcript isoforms in addition to genes. Afterwards, the best-performing workflow was applied to mouse data in another study (see 3.3). The mice developed liver metastases from colorectal cancer. The bioinformatics approach streamlined by the workflow helped in interpreting the biology behind the expression of metastasis-enhancing genes. It was possible to show links between metastasis-related genes and their stimulation by the liver environment. These genes were associated with tissue remodelling, cell proliferation, adhesion, Wnt activity, transcription/regulation, and inhibition of apoptosis.
The question of whether somatic mutations can be identified reliably in RNA-Seq data is tackled by implementing Wileup, a program written in Perl. Wileup’s performance was evaluated against the state-of-the-art somatic variant caller Mutect2 from the GATK tool suite on matched RNA-Seq and Exome-Seq samples of 14 patients with either brain (seven patients) or liver (seven patients) metastases (see 3.2). Results showed that Wileup was capable of finding in RNA-Seq all somatic mutations identified by Mutect2 in Exome-Seq. In contrast, Mutect2 and Wileup each identified germline mutations that the other method did not find. These discrepancies could be explained by a lack of expression in the RNA-Seq data or by an excessively high duplication level in the Exome-Seq data. Furthermore, the somatic mutations could be independently validated by pathological annotation data. All germline mutations found by only one of the methods could be verified, as they were re-identified in the exome-sequenced blood samples of the corresponding patients.
In conclusion, the studies presented in this thesis contribute towards establishing pipeline standards in transcriptomics, with a focus on differential expression analysis (DEA), and towards exploring the capabilities of mutation calling in RNA-Seq.


92-02 Research exposition (monographs, survey articles) pertaining to biology
92D10 Genetics and epigenetics
92C40 Biochemistry, molecular biology
92C50 Medical applications (general)