On application of directons to functional classification of genes in prokaryotes. (English) Zbl 1403.92159

Summary: Functional classification of genes is one of the most basic problems in genome analysis and annotation. Our analysis of several popular methods for functional classification of genes shows that these methods are not always consistent with each other and may not be specific enough for high-resolution functional annotation of genes. We have developed a method that integrates the genomic neighborhood information of genes with their sequence similarity information for the functional classification of prokaryotic genes. Applying our method to 93 proteobacterial genomes has shown that (i) genomic neighborhoods are much more conserved across prokaryotic genomes than expected by chance, and this conservation can be used to improve the functional classification of genes; (ii) our method agrees with the existing popular schemes about as well as they agree among themselves, yet it provides functional classification at a higher resolution and hence allows functional assignment of (new) genes at a more specific level; and (iii) our method is fairly stable when applied to different genomes.
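The summary does not define the "directons" named in the title. In the usual sense of the term, a directon is a maximal run of consecutive genes on the same strand, with no gene on the opposite strand between them. A minimal sketch of that partition follows, assuming a linear gene order and a hypothetical `(gene_id, strand)` input format. It illustrates only the neighborhood unit, not the authors' classification method, and it ignores the wrap-around of circular chromosomes.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical input format: (gene_id, strand), listed in chromosomal order.
Gene = Tuple[str, str]


def directons(genes: List[Gene]) -> List[List[str]]:
    """Split an ordered gene list into maximal runs of same-strand genes."""
    # groupby starts a new group whenever the strand changes between neighbors.
    return [
        [gene_id for gene_id, _ in run]
        for _, run in groupby(genes, key=lambda gene: gene[1])
    ]


# Toy genome with invented gene names, for illustration only.
genome = [
    ("g1", "+"), ("g2", "+"),
    ("g3", "-"), ("g4", "-"), ("g5", "-"),
    ("g6", "+"),
]
runs = directons(genome)
print(runs)
```

Genes that fall in the same run are candidates for sharing a genomic neighborhood. How such neighborhoods are scored and combined with sequence similarity is specific to the paper and is not reproduced here.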


92D10 Genetics and epigenetics
62P10 Applications of statistics to biology and medical sciences; meta analysis


BranchClust; Pfam; KEGG

