zbMATH — the first resource for mathematics

Estimating a common covariance matrix for network meta-analysis of gene expression datasets in diffuse large B-cell lymphoma. (English) Zbl 1405.62159
Summary: The estimation of covariance matrices of gene expressions has many applications in cancer systems biology. Many gene expression studies, however, are hampered by low sample size, and it has therefore become popular to increase sample size by collecting gene expression data across studies. Motivated by traditional meta-analysis using random effects models, we present a hierarchical random covariance model and use it for the meta-analysis of gene correlation networks across 11 large-scale gene expression studies of diffuse large B-cell lymphoma (DLBCL). We suggest using a maximum likelihood estimator for the underlying common covariance matrix and introduce an EM algorithm for estimation. In simulation experiments comparing the estimated covariance matrices by cophenetic correlation and Kullback-Leibler divergence, the suggested estimator performed better than, or no worse than, a simple pooled estimator. In a post hoc analysis of the estimated common covariance matrix for the DLBCL data, we were able to identify novel, biologically meaningful gene correlation networks with eigengenes of prognostic value. In conclusion, the method seems to provide a generally applicable framework for meta-analysis when multiple features are measured and believed to share a common covariance matrix obscured by study-dependent noise.
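The two baseline ingredients named in the summary, a simple pooled estimator and the Kullback-Leibler divergence between covariance matrices, can be illustrated with a minimal sketch. This is not the authors' hierarchical model or EM algorithm: the function names `pooled_covariance` and `gaussian_kl`, the zero-mean Gaussian assumption, and the simulated studies (all drawn from one fixed covariance, with no study-dependent noise) are assumptions made for illustration only.

```python
import numpy as np


def pooled_covariance(datasets):
    """Pooled covariance: summed per-study scatter matrices over total degrees of freedom."""
    scatter = None
    dof = 0
    for x in datasets:
        x = np.asarray(x, dtype=float)
        centered = x - x.mean(axis=0)      # center each study separately
        s = centered.T @ centered          # study scatter matrix
        scatter = s if scatter is None else scatter + s
        dof += x.shape[0] - 1
    return scatter / dof


def gaussian_kl(sigma_p, sigma_q):
    """KL divergence KL(N(0, sigma_p) || N(0, sigma_q)) for positive definite matrices."""
    p = sigma_p.shape[0]
    trace_term = np.trace(np.linalg.solve(sigma_q, sigma_p))
    _, logdet_p = np.linalg.slogdet(sigma_p)
    _, logdet_q = np.linalg.slogdet(sigma_q)
    return 0.5 * (trace_term - p + logdet_q - logdet_p)


# Hypothetical simulation: three "studies" sharing one true covariance (assumption).
rng = np.random.default_rng(0)
true_sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
studies = [rng.multivariate_normal(np.zeros(2), true_sigma, size=n) for n in (30, 50, 80)]

pooled = pooled_covariance(studies)
kl_self = gaussian_kl(true_sigma, true_sigma)   # divergence of a matrix from itself
kl_est = gaussian_kl(true_sigma, pooled)        # divergence of truth from the pooled estimate
```

A smaller divergence from the true covariance indicates a better estimate, which is the kind of comparison the summary describes between the proposed estimator and the pooled one.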
62P10 Applications of statistics to biology and medical sciences; meta analysis
62H12 Estimation in multivariate analysis
92C42 Systems biology, networks