Graphical models for zero-inflated single cell gene expression. (English) Zbl 1423.62148

Summary: Bulk gene expression experiments relied on aggregations of thousands of cells to measure the average expression in an organism. Advances in microfluidic and droplet sequencing now permit expression profiling in single cells. This study of cell-to-cell variation reveals that individual cells lack detectable expression of transcripts that appear abundant on a population level, giving rise to zero-inflated expression patterns. To infer gene coregulatory networks from such data, we propose a multivariate Hurdle model. It comprises a mixture of singular Gaussian distributions. We employ neighborhood selection with the pseudo-likelihood and a group lasso penalty to select and fit undirected graphical models that capture conditional independences between genes. The proposed method is more sensitive than existing approaches in simulations, even under departures from our Hurdle model. The method is applied to data for T follicular helper cells, and a high-dimensional profile of mouse dendritic cells. It infers network structure not revealed by other methods, or in bulk data sets. An R implementation is available at https://github.com/amcdavid/HurdleNormal.
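
As a rough illustration of the group lasso penalty mentioned in the summary, the sketch below shows block soft-thresholding, the proximal operator of the group lasso norm. It shrinks a whole group of coefficients together and sets the group exactly to zero when its Euclidean norm falls below the threshold, which is how a penalized neighborhood regression can drop a candidate edge entirely. The function name `group_soft_threshold`, the example vectors, and the threshold value are illustrative assumptions; none of them is taken from the paper or from the HurdleNormal package.

```python
import math


def group_soft_threshold(group, lam):
    """Proximal operator of lam * ||group||_2 (block soft-thresholding).

    Hypothetical helper for illustration only; not part of HurdleNormal.
    """
    norm = math.sqrt(sum(x * x for x in group))
    if norm <= lam:
        # The whole group is removed, i.e. the candidate edge is dropped.
        return [0.0 for _ in group]
    # Otherwise every coefficient in the group is shrunk by the same factor.
    scale = 1.0 - lam / norm
    return [scale * x for x in group]


# A weak group (norm 0.5) is zeroed out; a strong group (norm 5) is shrunk.
weak = group_soft_threshold([0.3, -0.4], lam=1.0)
strong = group_soft_threshold([3.0, -4.0], lam=1.0)
print(weak, strong)
```

In this toy setting each group stands in for the set of parameters linking one pair of genes, so a zeroed group corresponds to an absent edge in the estimated graph.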


62P10 Applications of statistics to biology and medical sciences; meta analysis
92D20 Protein sequences, DNA sequences
62J07 Ridge regression; shrinkage estimators (Lasso)


[1] Adachi, Y., Hiramatsu, S., Tokuda, N., Sharifi, K., Ebrahimi, M., Islam, A., Kagawa, Y., Koshy Vaidyan, L., Sawada, T., Hamano, K. and Owada, Y. (2012). Fatty acid-binding protein 4 (FABP4) and FABP5 modulate cytokine production in the mouse thymic epithelial cells. Histochem. Cell Biol. 138 397-406.
[2] Chen, S., Witten, D. M. and Shojaie, A. (2015). Selection and estimation for mixed graphical models. Biometrika 102 47-64. · Zbl 1345.62081
[3] Cheng, J., Li, T., Levina, E. and Zhu, J. (2017). High-dimensional mixed graphical models. J. Comput. Graph. Statist. 26 367-378.
[4] The Gene Ontology Consortium (2015). Gene ontology consortium: Going forward. Nucleic Acids Res. 43 (D1) D1049-D1056.
[5] de Jong, E. C., Vieira, P. L., Kalinski, P., Schuitemaker, J. H. N., Tanaka, Y., Wierenga, E. A., Yazdanbakhsh, M. and Kapsenberg, M. L. (2002). Microbial compounds selectively induce Th1 cell-promoting or Th2 cell-promoting dendritic cells in vitro with diverse th cell-polarizing signals. J. Immunol. 168 1704-1709.
[6] Denda-Nagai, K., Aida, S., Saba, K., Suzuki, K., Moriyama, S., Oo-puthinan, S., Tsuiji, M., Morikawa, A., Kumamoto, Y., Sugiura, D., Kudo, A., Akimoto, Y., Kawakami, H., Bovin, N. V. and Irimura, T. (2010). Distribution and function of macrophage galactose-type C-type lectin 2 (MGL2/CD301b): Efficient uptake and presentation of glycosylated antigens by dendritic cells. J. Biol. Chem. 285 19193-19204.
[7] Dobra, A., Hans, C., Jones, B., Nevins, J. R., Yao, G. and West, M. (2004). Sparse graphical models for exploring gene expression data. J. Multivariate Anal. 90 196-212. · Zbl 1047.62104
[8] Drton, M. and Maathuis, M. (2017). Structure learning in graphical modeling. Annu. Rev. Stat. Appl. 4 365-393.
[9] Drton, M., Sturmfels, B. and Sullivant, S. (2009). Lectures on Algebraic Statistics. Oberwolfach Seminars 39. Birkhäuser, Basel. · Zbl 1166.13001
[10] Eltoft, T., Kim, T. and Lee, T. W. (2006). On the multivariate Laplace distribution. IEEE Signal Process. Lett. 13 300-303.
[11] Finak, G., McDavid, A., Yajima, M., Deng, J., Gersuk, V., Shalek, A. K., Slichter, C. K., Miller, H. W., Juliana McElrath, M., Prlic, M., Linsley, P. S. and Gottardo, R. (2015). MAST: A flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 16 278.
[12] Foygel, R. and Drton, M. (2010). Exact block-wise optimization in group lasso and sparse group lasso for linear regression. 1-19. Arxiv preprint. Available at arXiv:1010.3320.
[13] Marinov, G. K., Williams, B. A., McCue, K., Schroth, G. P., Gertz, J., Myers, R. M. and Wold, B. J. (2014). From single-cell to cell-pool transcriptomes: Stochasticity in gene expression and RNA splicing. Genome Res. 24 496-510.
[14] Janes, K. A., Wang, C.-C., Holmberg, K. J., Cabral, K. and Brugge, J. S. (2010). Identifying single-cell molecular programs by stochastic profiling. Nat. Methods 7 311-317.
[15] Johnston, R. J., Poholek, A. C., DiToro, D., Yusuf, I., Eto, D., Barnett, B., Dent, A. L., Craft, J. and Crotty, S. (2009). Bcl6 and Blimp-1 are reciprocal and antagonistic regulators of T follicular helper cell differentiation. Science 325.
[16] Kim, J. K. and Marioni, J. C. (2013). Inferring the kinetics of stochastic gene expression from single-cell RNA-sequencing data. Genome Biol. 14.
[17] Pham, L. V., Tamayo, A. T., Yoshimura, L. C., Lin-Lee, Y. C. and Ford, R. J. (2005). Constitutive NF-kappaB and NFAT activation in aggressive B-cell lymphomas synergistically activates the CD154 gene and maintains lymphoma cell survival. Blood 106 3940-3947.
[18] Lauritzen, S. L. (1996). Graphical Models. Oxford Statistical Science Series 17. Oxford University Press, New York.
[19] Lee, J. D. and Hastie, T. J. (2013). Structure learning of mixed graphical models. In AISTATS 16 31 388-396, Scottsdale, AZ. Available at http://jmlr.org/proceedings/papers/v31/lee13a.html.
[20] Li, Y., Pearl, S. A. and Jackson, S. A. (2015). Gene networks in plant biology: Approaches in reconstruction and analysis. Trends Plant Sci. 20 664-675.
[21] Lin, L., Finak, G., Ushey, K., Seshadri, C., Hawn, T. R., Frahm, N., Scriba, T. J., Mahomed, H., Hanekom, W. et al. (2015). COMPASS identifies T-cell subsets correlated with clinical outcomes. Nat. Biotechnol. 33 610-616.
[22] Ma, C. S., Deenick, E. K., Batten, M. and Tangye, S. G. (2012). The origins, function, and regulation of T follicular helper cells. J. Exp. Med. 209 1241-1253.
[23] Markowetz, F. and Spang, R. (2007). Inferring cellular networks: A review. BMC Bioinform. 8.
[24] McDavid, A., Finak, G., Chattopadyay, P. K., Dominguez, M., Lamoreaux, L., Ma, S. S., Roederer, M. and Gottardo, R. (2013). Data exploration, quality control and testing in single-cell qPCR-based gene expression experiments. Bioinformatics 29 461-467.
[25] McDavid, A., Gottardo, R., Simon, N. and Drton, M. (2019). Supplement to “Graphical models for zero-inflated single cell gene expression.” DOI:10.1214/18-AOAS1213SUPP.
[26] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436-1462. · Zbl 1113.62082
[27] Parikh, N. and Boyd, S. (2014). Proximal algorithms. Found. Trends Optim. 1 123-231.
[28] Precopio, M. L., Betts, M. R., Parrino, J., Price, D. A., Gostick, E., Ambrozak, D. R., Asher, T. E., Douek, D. C., Harari, A. et al. (2007). Immunization with vaccinia virus induces polyfunctional and phenotypically distinctive CD8(\(+\)) T cell responses. J. Exp. Med. 204 1405-1416.
[29] Ravikumar, P., Wainwright, M. J. and Lafferty, J. D. (2010). High-dimensional Ising model selection using \(\ell_{1}\)-regularized logistic regression. Ann. Statist. 38 1287-1319. · Zbl 1189.62115
[30] Shah, R. D. and Samworth, R. J. (2013). Variable selection with error control: Another look at stability selection. J. R. Stat. Soc. Ser. B. Stat. Methodol. 75 55-80.
[31] Shalek, A. K., Satija, R., Shuga, J., Trombetta, J. J., Gennert, D., Lu, D., Chen, P., Gertner, R. S., Gaublomme, J. T. et al. (2014). Single-cell RNA-seq reveals dynamic paracrine control of cellular variation. Nature 510 263-269.
[32] Simon, N. and Tibshirani, R. (2012). Standardization and the group Lasso penalty. Statist. Sinica 22 983-1001. · Zbl 1257.62080
[33] Tansey, W., Padilla, O. H. M., Suggala, A. S. and Ravikumar, P. (2015). Vector-space Markov random fields via exponential families. In Proceedings of the 32nd International Conference on Machine Learning 37 684-692. Available at http://jmlr.org/proceedings/papers/v37/tansey15.html.
[34] Tibshirani, R., Bien, J., Friedman, J., Hastie, T., Simon, N., Taylor, J. and Tibshirani, R. J. (2012). Strong rules for discarding predictors in lasso-type problems. J. R. Stat. Soc. Ser. B. Stat. Methodol. 74 245-266. · Zbl 1411.62213
[35] Yang, E., Baker, Y., Ravikumar, P., Allen, G. and Liu, Z. (2014). Mixed graphical models via exponential families. In AISTATS 17 33. Reykjavik, Iceland.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.