Wang, Y. X. Rachel; Sarkar, Purnamrita; Ursu, Oana; Kundaje, Anshul; Bickel, Peter J.
Network modelling of topological domains using Hi-C data. (English) Zbl 1433.62318
Ann. Appl. Stat. 13, No. 3, 1511-1536 (2019).
Summary: Chromosome conformation capture experiments such as Hi-C are used to map the three-dimensional spatial organization of genomes. One specific feature of this 3D organization is the topologically associating domain (TAD): a densely interacting, contiguous chromatin region that plays an important role in regulating gene expression. A few algorithms have been proposed to detect TADs. In particular, the structure of Hi-C data naturally inspires the application of community detection methods. However, a drawback of community detection is that most methods take exchangeability of the nodes in the network for granted, whereas the nodes in this case, that is, the positions on the chromosomes, are not exchangeable. We propose a network model for detecting TADs using Hi-C data that takes this nonexchangeability into account. In addition, our model explicitly makes use of cell-type-specific CTCF binding sites as biological covariates and can be used to identify conserved TADs across multiple cell types. The model leads to a likelihood objective that can be efficiently optimized via relaxation. We also prove that, when suitably initialized, this model finds the underlying TAD structure with high probability. Using simulated data, we show the advantages of our method and the caveats of popular community detection methods, such as spectral clustering, in this application. Applying our method to real Hi-C data, we demonstrate that the identified domains have desirable epigenetic features, and we compare them across different cell types.
Cited in 2 Documents
MSC:
62P10 Applications of statistics to biology and medical sciences; meta analysis
62M45 Neural nets and related approaches to inference from stochastic processes
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62R40 Topological data analysis
92D10 Genetics and epigenetics
Keywords: Hi-C data; topologically associating domains; network models; community detection
Software: HiFive; MrTADFinder; TADtree
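The summary's point about nonexchangeability (TADs must be contiguous runs of genomic positions, unlike generic network communities) can be illustrated with a small sketch. This is not the paper's likelihood model or its relaxation: it swaps in a simple modularity-like score and an exact dynamic program over contiguous segments. The names `contiguous_domains`, `block_matrix`, and the resolution parameter `gamma` are hypothetical and chosen only for illustration.

```python
import numpy as np

def contiguous_domains(A, gamma=0.5):
    """Split bins 0..n-1 into contiguous segments that maximize the sum,
    over within-segment pairs a < b, of (A[a, b] - gamma).
    Illustrative score only; not the likelihood from the paper."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # 2D prefix sums: P[i, j] = A[:i, :j].sum()
    P = np.zeros((n + 1, n + 1))
    P[1:, 1:] = A.cumsum(axis=0).cumsum(axis=1)
    # Prefix sums of the diagonal, so self-contacts can be excluded.
    diag = np.concatenate(([0.0], np.cumsum(np.diag(A))))

    def score(i, j):
        # Sum of A over the square block [i, j) x [i, j).
        block = P[j, j] - P[i, j] - P[j, i] + P[i, i]
        # Off-diagonal entries are counted twice in a symmetric matrix.
        pair_sum = (block - (diag[j] - diag[i])) / 2.0
        m = j - i
        return pair_sum - gamma * m * (m - 1) / 2.0

    # best[j]: best total score for segmenting bins 0..j-1.
    best = np.full(n + 1, -np.inf)
    best[0] = 0.0
    prev = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):
            cand = best[i] + score(i, j)
            if cand > best[j]:
                best[j], prev[j] = cand, i

    # Backtrack from n to recover the segment boundaries.
    bounds = [n]
    while bounds[-1] > 0:
        bounds.append(int(prev[bounds[-1]]))
    return bounds[::-1]

def block_matrix(sizes):
    """Noise-free toy contact matrix: 1 inside each contiguous block, 0 elsewhere."""
    n = sum(sizes)
    A = np.zeros((n, n))
    start = 0
    for m in sizes:
        A[start:start + m, start:start + m] = 1.0
        start += m
    np.fill_diagonal(A, 0.0)
    return A

toy = block_matrix([3, 4, 3])
boundaries = contiguous_domains(toy, gamma=0.5)
print(boundaries)  # [0, 3, 7, 10]
```

On this noise-free toy matrix the recovered boundaries coincide with the planted blocks. Because the search space contains only contiguous segmentations, the result depends on the ordering of the bins, which is exactly the property that exchangeable community detection methods ignore.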