zbMATH — the first resource for mathematics

Sequence-dependent prediction of recombination hotspots in Saccharomyces cerevisiae. (English) Zbl 1307.92082
Summary: Meiotic recombination does not occur randomly across the genome; it occurs at relatively high frequencies in some genomic regions (hotspots) and at relatively low frequencies in others (coldspots). Identifying hotspots and coldspots would shed light on the mechanism of recombination, but their accurate prediction is still an open question. In this study, we present a model that predicts hot/cold spots in yeast using increment of diversity combined with quadratic discriminant analysis (IDQD), based on sequence \(k\)-mer frequencies. Five-fold cross-validation showed a total prediction accuracy of 80.3%. Compared with other machine-learning algorithms, the IDQD approach is as powerful as random forest (RF) and outperforms the support vector machine (SVM) in identifying hotspots and coldspots. We also predicted increased recombination rates in the regions upstream of transcription start sites and downstream of transcription termination sites. Additionally, the genome-wide recombination map of yeast obtained with the IDQD model is in close agreement with the experimentally generated map, especially for the peak locations, although some fine-scale differences exist. Our results highlight the sequence dependency of recombination.
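The summary names the ingredients of IDQD but gives no formulas. As a rough illustration only, the following Python sketch computes \(k\)-mer counts and an increment of diversity between a query sequence and a class profile. It assumes the commonly used definitions \(D(X) = N \log N - \sum_i n_i \log n_i\) and \(\mathrm{ID}(X, S) = D(X + S) - D(X) - D(S)\), which may differ in detail from those used in the paper. The function names and the toy sequences are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count overlapping k-mers of seq, returned in a fixed ACGT-lexicographic order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts.get("".join(p), 0) for p in product("ACGT", repeat=k)]


def diversity(counts):
    """Assumed measure of diversity: D = N log N - sum_i n_i log n_i, with 0 log 0 = 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, s):
    """Assumed increment of diversity: ID(X, S) = D(X + S) - D(X) - D(S).

    The value is zero when the two count vectors are identical and grows as
    their k-mer compositions diverge.
    """
    merged = [a + b for a, b in zip(x, s)]
    return diversity(merged) - diversity(x) - diversity(s)


# Toy data (hypothetical): one "hot" profile, one "cold" profile, one query.
hot_profile = kmer_counts("GCGCGGCCGCGCGGCGCC", 2)
cold_profile = kmer_counts("ATATTAATATTTAAATAT", 2)
query = kmer_counts("GCGGCCGCGC", 2)

id_hot = increment_of_diversity(query, hot_profile)
id_cold = increment_of_diversity(query, cold_profile)
print(id_hot, id_cold)
```

In a pipeline of the kind the summary describes, increment-of-diversity values like `id_hot` and `id_cold` would presumably serve as input features to a quadratic discriminant classifier, evaluated by five-fold cross-validation. That step is omitted here because the summary does not specify how the features are assembled.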

MSC:
92C40 Biochemistry, molecular biology