
Ridge estimation of the VAR(1) model and its time series chain graph from multivariate time-course omics data. (English) Zbl 1357.62319

Summary: Omics experiments with a time-course design may enable us to uncover the dynamic interplay among genes in cellular processes. Multivariate techniques that may facilitate this goal (such as VAR(1) models, which describe the temporal and contemporaneous relations among variates) are hampered by the high dimensionality of the resulting data. The presented ridge-regularized maximum likelihood estimation procedure for the VAR(1) model resolves this. Information on the absence of temporal and contemporaneous relations may be incorporated into this procedure. Its computationally efficient implementation is discussed. The estimation procedure is accompanied by a leave-one-out cross-validation (LOOCV) scheme to determine the associated penalty parameters. Downstream exploitation of the estimated VAR(1) model is outlined: an empirical Bayes procedure to identify the interesting temporal and contemporaneous relationships, impulse response analysis, mutual information analysis, and covariance decomposition into the (graphical) relations among variates. In a simulation study, the presented ridge estimation procedure outperformed a sparse competitor in terms of the Frobenius loss of the estimates, while their selection properties were on par. The proposed machinery is illustrated by the reconstruction of the p53 signaling pathway during HPV-induced cellular transformation. The methodology is implemented in the ragt2ridges R package, available from CRAN.
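The summary describes the estimator only in words. Below is a minimal sketch in Python with NumPy of the simplest special case, under stated assumptions.

- **Identity error precision.** The error precision matrix is taken as the identity. The ridge-penalized likelihood for the transition matrix A then decouples into separate ridge regressions of each variate at time t on all variates at time t-1.
- **Swapped-in LOOCV criterion.** The LOOCV criterion used here is squared one-step-ahead prediction error. This is an assumption and may differ from the criterion in the paper.
- **Omitted features.** Joint estimation of the error precision matrix and the incorporation of known absent relations are left out.
- **Hypothetical names.** All function names are illustrations only and are not the ragt2ridges API.

```python
import numpy as np


def simulate_var1(A, n_time, noise_sd=1.0, seed=0):
    """Simulate Y_t = A @ Y_{t-1} + eps_t with eps_t ~ N(0, noise_sd^2 * I)."""
    rng = np.random.default_rng(seed)
    p = A.shape[0]
    Y = np.zeros((n_time, p))
    Y[0] = rng.normal(size=p)
    for t in range(1, n_time):
        Y[t] = A @ Y[t - 1] + noise_sd * rng.normal(size=p)
    return Y


def ridge_var1(Y, lam):
    """Ridge estimate of A, assuming an identity error precision matrix.

    Each row of A is a ridge regression of one variate at time t on all
    variates at time t-1; lam > 0 keeps the system solvable even when
    p exceeds the number of transitions.
    """
    X, Z = Y[:-1], Y[1:]  # lagged values Y_{t-1} and responses Y_t
    p = Y.shape[1]
    # Z ~ X @ A.T, so the ridge coefficient matrix B equals A.T
    B = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Z)
    return B.T


def loocv_error(Y, lam):
    """Mean squared one-step prediction error, leaving out one transition at a time."""
    X, Z = Y[:-1], Y[1:]
    p = Y.shape[1]
    err = 0.0
    for t in range(X.shape[0]):
        keep = np.arange(X.shape[0]) != t
        Xk, Zk = X[keep], Z[keep]
        B = np.linalg.solve(Xk.T @ Xk + lam * np.eye(p), Xk.T @ Zk)
        err += np.sum((Z[t] - X[t] @ B) ** 2)
    return err / X.shape[0]


def impulse_response(A, k):
    """Response at lag k to unit shocks at time 0: column j is A^k e_j."""
    return np.linalg.matrix_power(A, k)


def frobenius_loss(A_hat, A):
    """Squared Frobenius norm of the estimation error."""
    return float(np.linalg.norm(A_hat - A, "fro") ** 2)


if __name__ == "__main__":
    p = 10
    A_true = 0.5 * np.eye(p)  # hypothetical "true" transition matrix
    Y = simulate_var1(A_true, n_time=8)  # 7 transitions, fewer than p = 10 variates
    grid = [0.01, 0.1, 1.0, 10.0, 100.0]
    lam = min(grid, key=lambda l: loocv_error(Y, l))
    A_hat = ridge_var1(Y, lam)
    print("selected lambda:", lam)
    print("Frobenius loss:", frobenius_loss(A_hat, A_true))
    print("response of all variates to a unit shock in variate 0 after 2 steps:")
    print(impulse_response(A_hat, 2)[:, 0])
```

In the usage example, there are fewer transitions than variates, so X^T X is singular and the unpenalized least-squares estimate is not unique. The ridge term is what makes the estimate well defined. The penalty is then picked from a grid by the LOOCV criterion.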

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62F10 Point estimation
62-07 Data analysis (statistics) (MSC2010)
