BacPP: bacterial promoter prediction – a tool for accurate sigma-factor specific assignment in enterobacteria. (English) Zbl 1397.92246

Summary: Promoter sequences are well known to play a central role in gene expression. Their recognition and assignment in silico have not yet been consolidated into a general bioinformatics method. Most previously available algorithms are limited to \(\sigma 70\)-dependent promoter sequences. This paper presents a new tool named BacPP, designed to recognize and predict Escherichia coli promoter sequences from background with a specific accuracy for each \(\sigma\) factor (respectively, \(\sigma 24\), 86.9%; \(\sigma 28\), 92.8%; \(\sigma 32\), 91.5%; \(\sigma 38\), 89.3%; \(\sigma 54\), 97.0%; and \(\sigma 70\), 83.6%). BacPP thus stands out in the recognition and assignment of sequences according to \(\sigma\) factor and provides circumstantial information about upstream gene sequences. This bioinformatic tool was developed by weighting rules extracted from neural networks trained with promoter sequences known to respond to a specific \(\sigma\) factor. Furthermore, when challenged with promoter sequences belonging to other enterobacteria, BacPP maintained 76% accuracy overall.
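The summary does not spell out how the extracted rules are weighted or combined, so the following is only a minimal sketch of the general idea of sigma-factor-specific assignment by weighted rules. The rule format, the rule sets, the weights, the threshold, and the names `RULES`, `score`, and `assign` are all hypothetical and are not taken from BacPP.

```python
from typing import Dict, List, Optional, Tuple

# A rule is (position, nucleotide, weight): it fires when the sequence has
# that nucleotide at that 0-based position. This format is an assumption.
Rule = Tuple[int, str, float]

# Hypothetical toy rule sets, one per sigma factor (NOT BacPP's actual rules).
RULES: Dict[str, List[Rule]] = {
    "sigma70": [(0, "T", 0.5), (1, "A", 0.3), (2, "T", 0.2)],
    "sigma54": [(0, "G", 0.6), (1, "G", 0.4)],
}


def score(seq: str, rules: List[Rule]) -> float:
    """Sum the weights of all rules that fire on the sequence."""
    return sum(w for pos, nt, w in rules if pos < len(seq) and seq[pos] == nt)


def assign(seq: str, threshold: float = 0.5) -> Optional[str]:
    """Return the best-scoring sigma factor, or None if no rule set
    reaches the (hypothetical) threshold, i.e. the sequence is background."""
    seq = seq.upper()
    scores = {name: score(seq, rules) for name, rules in RULES.items()}
    best = max(scores, key=lambda name: scores[name])
    return best if scores[best] >= threshold else None
```

With these toy rules, calling `assign` on a candidate sequence returns the name of one rule set or `None`. A real predictor would use rules learned from curated promoter data rather than hand-written positions.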


92C40 Biochemistry, molecular biology
90C29 Multi-objective and goal programming
92D10 Genetics and epigenetics
68T05 Learning and adaptive systems in artificial intelligence
92-04 Software, source code, etc. for problems pertaining to biology

