Zero-inflated quantile rank-score based test (ZIQRank) with application to scRNA-seq differential gene expression analysis. (English) Zbl 1498.62233

Summary: Differential gene expression analysis based on scRNA-seq data is challenging due to two unique characteristics of scRNA-seq data. First, multimodality and other heterogeneity of the gene expression among different cell conditions lead to divergences in the tail events or crossings of the expression distributions. Second, scRNA-seq data generally have a considerable fraction of dropout events, causing zero inflation in the expression. To account for the first characteristic, existing parametric approaches targeting the mean difference in gene expression are limited, while quantile regression, which examines various locations in the distribution, will improve the power. However, the second characteristic, zero inflation, makes traditional quantile regression invalid and underpowered. We propose a quantile-based test that handles the two characteristics, multimodality and zero inflation, simultaneously. The proposed quantile rank-score based test for differential distribution detection (ZIQRank) is derived under a two-part quantile regression model for zero-inflated outcomes. It comprises a test in logistic modeling for the zero counts and a collection of rank-score tests adjusting for zero inflation at multiple prespecified quantiles of the positive part. The testing decision is based on an aggregate result obtained by combining the marginal \(p\)-values with the MinP or Cauchy procedure. The proposed test is asymptotically justified and evaluated with simulation studies. It shows a higher precision-recall AUC in detecting true differentially expressed genes (DEGs) than the existing methods. We apply the ZIQRank test to a TPM scRNA-seq dataset on human glioblastoma tumors and exclusively identify a group of DEGs between neoplastic and nonneoplastic cells, which are heterogeneous and have been shown to be associated with glioma. Application to a UMI count scRNA-seq dataset on cells from mouse intestinal organoids further demonstrates the capability of ZIQRank to improve and complement the existing approaches.
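
The aggregation step named in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cauchy_combine` and `minp_bonferroni` and the example p-values are hypothetical, equal weights are assumed, and the Bonferroni-adjusted minimum is only a conservative stand-in for a MinP procedure, which would normally account for the dependence among the marginal tests. The Cauchy combination follows the standard form \(T=\sum_i w_i \tan\{(0.5-p_i)\pi\}\) with combined \(p\)-value \(0.5-\arctan(T/\sum_i w_i)/\pi\).

```python
import math


def cauchy_combine(pvalues, weights=None):
    """Combine marginal p-values with the Cauchy combination rule.

    Assumes every p-value lies strictly between 0 and 1.
    Equal weights are used when none are supplied (an assumption of this sketch).
    """
    if weights is None:
        weights = [1.0 / len(pvalues)] * len(pvalues)
    # Each p-value is mapped to a standard Cauchy variate and averaged.
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    # The tail probability of the standard Cauchy gives the combined p-value.
    return 0.5 - math.atan(t / sum(weights)) / math.pi


def minp_bonferroni(pvalues):
    """Conservative stand-in for MinP: Bonferroni-adjusted minimum p-value."""
    return min(1.0, len(pvalues) * min(pvalues))


# Hypothetical marginal p-values: one from the logistic (zero) part,
# the rest from rank-score tests at several quantile levels.
marginal = [0.20, 0.03, 0.01, 0.04]
combined_cauchy = cauchy_combine(marginal)
combined_minp = minp_bonferroni(marginal)
```

In this sketch a single small marginal \(p\)-value dominates the Cauchy statistic, which is why the rule stays sensitive when only a few quantile levels differ between conditions.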


62P10 Applications of statistics to biology and medical sciences; meta analysis
62G10 Nonparametric hypothesis testing
62J05 Linear regression; mixed models
62G08 Nonparametric regression and quantile regression



