Extended sunflower hidden Markov models for the recognition of homotypic cis-regulatory modules. (English) Zbl 1281.92048
Beißbarth, Tim (ed.) et al., German conference on bioinformatics 2013, GCB’13, Göttingen, Germany, September 10–13, 2013. Selected papers based on the presentations at the conference. Wadern: Schloss Dagstuhl – Leibniz Zentrum für Informatik (ISBN 978-3-939897-59-0). OASIcs – OpenAccess Series in Informatics 34, 101–109, electronic only (2013).
Summary: The transcription of genes is often regulated not only by transcription factors binding at single sites per promoter, but also by the interplay of multiple copies of one or more transcription factors binding at multiple sites that together form a cis-regulatory module. The computational recognition of cis-regulatory modules from ChIP-seq or other high-throughput data is crucial in the modern life and medical sciences. A common type of cis-regulatory module is the homotypic cluster of binding sites, i.e., a cluster of binding sites of a single transcription factor. For the recognition of such clusters, the homotypic sunflower hidden Markov model is a promising statistical model. However, this model neglects statistical dependences among nucleotides within binding sites and flanking regions, which makes it poorly suited for de novo motif discovery. Here, we propose an extension of this model that allows statistical dependences within binding sites, their reverse complements, and flanking regions. We study the efficacy of this extended homotypic sunflower hidden Markov model based on ChIP-seq data from the human ENCODE project and find that it often outperforms the traditional homotypic sunflower hidden Markov model.
For the entire collection see [Zbl 1279.92004].
92D10 Genetics and epigenetics
62P10 Applications of statistics to biology and medical sciences; meta analysis