
Transcription factor binding sites detection by using alignment-based approach. (English) Zbl 1397.92211
Summary: Gene expression is the main cause of the variety of phenotypes: through this process, the information stored in DNA gives rise to the phenotype. Gene expression depends on the successful binding of transcription factors (TFs), a specific type of protein, to specific positions upstream of the gene, the TF binding sites (TFBSs). Finding these TFBSs experimentally is costly and laborious; therefore, discovering TFBSs computationally is a significant problem that many researchers endeavor to solve. In this paper, a new TFBS discovery method is presented that takes known biological facts about TFBSs into account. The input to this method consists of sequences of arbitrary length, and the output comprises the positions that are likely to be TFBSs. Comparison with previous methods on biological and simulated datasets shows that this method achieves higher accuracy in discovering TFBSs.
MSC:
92C40 Biochemistry, molecular biology
92-04 Software, source code, etc. for problems pertaining to biology