Error correction in methylation profiling from NGS bisulfite protocols. (English) Zbl 1460.92154

Elloumi, Mourad (ed.), Algorithms for next-generation sequencing data. Techniques, approaches, and applications. Cham: Springer. 167-183 (2017).
Summary: Whole genome bisulfite sequencing (WGBS) has emerged as the primary technique for DNA methylation studies because of its great potential in terms of speed, specificity, and the capability of addressing new biological implications, such as non-CpG context methylation or hemimethylation. However, despite the improvement that WGBS represents, processing and analyzing the resulting datasets is not as straightforward as in other methylation assays, and special care should be taken to obtain reliable results. As far as we know, no extensive review exists of the error sources that can bias methylation level measurement or of the different algorithms that have been proposed to deal with them. Therefore, in this chapter all known WGBS error sources are extensively reviewed and critically evaluated in order to suggest a couple of best practices for dealing with all sources of bias in WGBS assays.
For the entire collection see [Zbl 1383.68005].

MSC:

92D20 Protein sequences, DNA sequences
