Poisson-Tweedie mixed-effects model: a flexible approach for the analysis of longitudinal RNA-seq data. (English) Zbl 07506792

Stat. Model. 21, No. 6, 520-545 (2021); correction ibid. 21, No. 5, 471 (2021).
Summary: We present a new modelling approach for longitudinal overdispersed counts that is motivated by the increasing availability of longitudinal RNA-sequencing experiments. The distribution of RNA-seq counts typically exhibits overdispersion, zero-inflation and heavy tails; moreover, in longitudinal designs repeated measurements from the same subject are typically (positively) correlated. We propose a generalized linear mixed model based on the Poisson-Tweedie distribution that can flexibly handle each of the aforementioned features of longitudinal overdispersed counts. We develop a computational approach to accurately evaluate the likelihood of the proposed model and to perform maximum likelihood estimation. Our approach is implemented in the R package ptmixed, which can be freely downloaded from CRAN. We assess the performance of ptmixed on simulated data, and we present an application to a dataset with longitudinal RNA-sequencing measurements from healthy and dystrophic mice. The applicability of the Poisson-Tweedie mixed-effects model is not restricted to longitudinal RNA-seq data, but it extends to any scenario where non-independent measurements of a discrete overdispersed response variable are available.


62-XX Statistics
Full Text: DOI


[1] Äijö, T, Butty, V, Chen, Z, Salo, V, Tripathi, S, Burge, CB, Lahesmaa, R, Lähdesmäki, H. (2014) Methods for time series analysis of RNA-seq data with application to human Th17 cell differentiation. Bioinformatics, 30, i113—i120.
[2] Albert, PS (1999) Longitudinal data analysis (repeated measures) in clinical trials. Statistics in Medicine, 18, 1707-1732.
[3] Benjamini, Y, Hochberg, Y (1995) Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 57, 289-300. · Zbl 0809.62014
[4] Bolker, BM, Brooks, ME, Clark, CJ, Geange, SW, Poulsen, JR, Stevens, MHH, White, J-SS (2009) Generalized linear mixed models: A practical guide for ecology and evolution. Trends in Ecology & Evolution, 24, 127-135.
[5] Bonat, WH, Jørgensen, B, Kokonendji, CC, Hinde, J, Demetrio, CG (2018) Extended Poisson-Tweedie: Properties and regression models for count data. Statistical Modelling, 18, 24-49. · Zbl 07289497
[6] Ceco, E, McNally, EM (2013) Modifying muscular dystrophy through transforming growth factor-. The FEBS Journal, 280, 4198-4209.
[7] Cui, S, Ji, T, Li, J, Cheng, J, Qiu, J (2016) What if we ignore the random effects when analyzing RNA-seq data in a multifactor experiment. Statistical Applications in Genetics and Molecular Biology, 15, 87-105. · Zbl 1343.92009
[8] Dogra, C, Srivastava, DS, Kumar, A (2008) Protein-DNA array-based identification of transcription factor activities differentially regulated in skeletal muscle of normal and dystrophin-deficient mdx mice. Molecular and Cellular Biochemistry, 312, 17-24.
[9] El-Shaarawi, AH, Zhu, R, Joe, H (2011) Modelling species abundance using the Poisson-Tweedie family. Environmetrics, 22, 152-164.
[10] Esnaola, M, Puig, P, Gonzalez, D, Castelo, R, Gonzalez, JR (2013) A flexible count data model to fit the wide diversity of expression proles arising from extensively replicated RNA-seq experiments. BMC Bioinformatics, 14, 254.
[11] Flanigan, KM, Ceco, E, Lamar, K-M, Kaminoh, Y, Dunn, DM, Mendell, JR, King, WM, Pestronk, A, Florence, JM, Mathews, KD, Finkel, RS, Swoboda, KJ, Gappmaier, E, Howard, MT, Day, DW, McDonald, C, McNally, EM, Weiss, RB for the United Dystrophinopathy Project (2013) LTBP4 genotype predicts age of ambulatory loss in Duchenne muscular dystrophy. Annals of Neurology, 73, 481-488.
[12] Forcina, L, Miano, C, Scicchitano, BM, Musaro, A (2019) Signals from the niche: Insights into the role of IGF-1 and IL-6 in modulating skeletal muscle fibrosis. Cells, 8, 232.
[13] Fournier, DA, Skaug, HJ, Ancheta, J, Ianelli, J, Magnusson, A, Maunder, MN, Nielsen, A, Sibert, J (2012) AD model builder: Using automatic differentiation for statistical inference of highly parameterized complex nonlinear models. Optimization Methods and Software, 27, 233-249. · Zbl 1242.90230
[14] Gerber, HU (1991) From the generalized gamma to the generalized negative binomial distribution. Insurance: Mathematics and Economics, 10, 303-309. · Zbl 0743.62014
[15] Hathout, Y, Liang, C, Ogundele, M, Xu, G, Tawalbeh, SM, Dang, UJ, Hoffman, EP, Gordish-Dressman, H, Conklin, LS, van den Anker, JN, Clemens, PR, Mah, JK, Henricson, E, McDonald, C (2019) Disease-specific and glucocorticoid-responsive serum biomarkers for Duchenne muscular dystrophy. Scientific Reports, 9, 1-13.
[16] Law, CW, Chen, Y, Shi, W, Smyth, GK (2014) voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, 15, R29.
[17] Love, MI, Huber, W, Anders, S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15, 550.
[18] Marioni, JC, Mason, CE, Mane, SM, Stephens, M, Gilad, Y (2008) RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays. Genome Research, 18, 1509-1517.
[19] McCulloch, CE, Searle, SR, Neuhaus, JM (2008) Generalized, Linear, and Mixed Models. John Wiley & Sons. · Zbl 1165.62050
[20] Molenberghs, G, Verbeke, G (2007) Likelihood ratio, score, and Wald tests in a constrained parameter space. The American Statistician, 61 22-27.
[21] Moresi, V, Adamo, S, Berghella, L (2019) The JAK/STAT pathway in skeletal muscle pathophysiology. Frontiers in Physiology, 10. doi: 10.3389/fphys.2019.00500.
[22] Mortazavi, A, Williams, BA, McCue, K, Schaeffer, L, Wold, B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods, 5, 621-628.
[23] Nelder, JA, Mead, R (1965) A simplex method for function minimization. The Computer Journal, 7, 308-313. · Zbl 0229.65053
[24] Nueda, MJ, Tarazona, S, Conesa, A (2014) Next maSigPro: Updating maSigPro bioconductor package for RNA-seq time series. Bioinformatics, 30, 2598-2602.
[25] Pinheiro, JC, Chao, EC (2006) Efficient Lap- lacian and adaptive Gaussian quadrature algorithms for multilevel generalized linear mixed models. Journal of Computational and Graphical Statistics, 15, 58-81.
[26] Robinson, MD, McCarthy, DJ, Smyth, GK (2010) edgeR: A bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26, 139-140.
[27] Robinson, MD, Oshlack, A (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology, 11, R25.
[28] Sha, Y, Phan, JH, Wang, MD (2015) Effect of low-expression gene filtering on detection of differentially expressed genes in RNA-seq data. In Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE, pages 6461-6464. Piscataway, NJ: IEEE.
[29] Signorelli, M, Spitali, P, Tsonaka, R (2020) ptmixed: Poisson-Tweedie generalized linear mixed model. R package version 0.5.2. URL: https://CRAN.R-project.org/package=ptmixed (last accessed on 22 June 2020).
[30] Slenter, DN, Kutmon, M, Hanspers, K, Riutta, A, Windsor, J, Nunes, N, Melius, J, Cirillo, E, Coort, SL, Digles, D, Ehrhart, F, Giesbertz, P, Kalafati, M, Martens, M, Miller, R, Nishida, K, Rieswijk, L, Waagmeester, A, Eijssen, LMT, Evelo, CT, Pico, AR, Willighagen, EL (2017) Wikipathways: A multifaceted pathway database bridging metabolomics to other omics research. Nucleic Acids Research, 46, D661-D667.
[31] Smyth, GK (2004) Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3, 1-25. · Zbl 1038.62110
[32] Steen, N, Byrne, G, Gelbard, E (1969) Gaussian quadratures for the integrals ∫∞0e−x2f(x)dx<math id=“math287-1471082X20936017” overflow=“scroll” altimg=“eq-00280.gif”><msubsup><mo>∫</mo><mn>0</mn><mo>∞</mo></msubsup><msup><mi>e</mi><mrow><mo>−</mo><msup><mi>x</mi><mn>2</mn></msup></mrow></msup><mi>f</mi><mo stretchy=“false”>(</mo><mi>x</mi><mo stretchy=“false”>)</mo><mi mathvariant=“italic”>dx</mi></math> and ∫b0e−x2f(x)dx<math id=“math288-1471082X20936017” overflow=“scroll” altimg=“eq-00281.gif”><msubsup><mo>∫</mo><mn>0</mn><mi>b</mi></msubsup><msup><mi>e</mi><mrow><mo>−</mo><msup><mi>x</mi><mn>2</mn></msup></mrow></msup><mi>f</mi><mo stretchy=“false”>(</mo><mi>x</mi><mo stretchy=“false”>)</mo><mi mathvariant=“italic”>dx</mi></math>. Mathematics of Computation, 23, 661-671. · Zbl 0187.12902
[33] Sun, X, Dalpiaz, D, Wu, D, Liu, JS, Zhong, W, Ma, P (2016) Statistical inference for time course RNA-Seq data using a negative binomial mixed-effect model. BMC Bioinformatics, 17, 324.
[34] van de Wiel, MA, Neerincx, M, Buffart, TE, Sie, D, Verheul, HM (2014) ShrinkBayes: A versatile R-package for analysis of count- based sequencing data in complex study designs. BMC Bioinformatics, 15, 116.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.