Graphical modeling for gene set analysis: a critical appraisal. (English) Zbl 1334.62010

Summary: Growing demand for understanding the behavior of groups of related genes, combined with the greater availability of data, has led to an increased focus on statistical methods for gene set analysis. In this paper, we critically appraise the methodology based on graphical models developed in [M. S. Massa, M. Chiogna and C. Romualdi, “Gene set analysis exploiting the topology of a pathway”, BMC Syst. Biol. 4, Paper No. 121 (2010; doi:10.1186/1752-0509-4-121)], which uses pathway signaling networks as a starting point for developing statistically sound procedures for gene set analysis. We examine the potential of the methodology with respect to the organizational aspects of dealing with pathways, which are complex but highly informative starting structures. We focus on three themes: the translation of a biological pathway into a graph suitable for modeling, the role of shrinkage when there are more genes than samples, and the evaluation of how well the statistical models correspond to biological expectations. To study the impact of shrinkage, two simulation studies are run. To evaluate the biological expectations, we use data from a network with known behavior, which offers the possibility of a realistic check of how the model responds to changes in the experimental conditions.
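The role of shrinkage mentioned in the summary can be illustrated with a minimal sketch. It is not the procedure used in the paper: the function name `shrink_covariance`, the fixed shrinkage intensity `lam`, the diagonal target, and the simulated data are all assumptions made for illustration. The sketch only shows why some form of regularization is needed when there are more genes than samples: the sample covariance matrix is then singular, while a convex combination with its own diagonal is positive definite and can be inverted.

```python
import numpy as np


def shrink_covariance(X, lam):
    """Hypothetical helper: blend the sample covariance with its diagonal.

    X   : (n_samples, n_genes) expression matrix
    lam : shrinkage intensity in [0, 1], fixed by hand here (an assumption;
          data-driven choices of the intensity exist but are not shown)
    """
    S = np.cov(X, rowvar=False)        # sample covariance, genes as columns
    target = np.diag(np.diag(S))       # diagonal target: variances only
    return (1.0 - lam) * S + lam * target


# Simulated data with more genes than samples (illustrative sizes only).
rng = np.random.default_rng(0)
n_samples, n_genes = 10, 30
X = rng.normal(size=(n_samples, n_genes))

S = np.cov(X, rowvar=False)
S_shrunk = shrink_covariance(X, lam=0.2)

# The sample covariance has rank at most n_samples - 1, so it is singular.
rank_S = np.linalg.matrix_rank(S)

# The shrunken estimate has strictly positive eigenvalues, so it is invertible.
min_eig_shrunk = np.linalg.eigvalsh(S_shrunk).min()

print("rank of sample covariance:", rank_S, "of", n_genes)
print("shrunken estimate is positive definite:", bool(min_eig_shrunk > 0))
```

A diagonal target leaves the gene variances untouched and only damps the covariances, which is one common design choice; other targets and intensities would behave differently.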


62A09 Graphical methods in statistics
62-07 Data analysis (statistics) (MSC2010)
62P10 Applications of statistics to biology and medical sciences; meta analysis
92D10 Genetics and epigenetics


[1] Bader, Pathguide: a pathway resource list, Nucleic Acids Research 34 pp D504– (2006) · Zbl 05437860
[2] Chiaretti, Gene expression profiles of B-lineage adult acute lymphocytic leukemia reveal genetic patterns that identify lineage derivation and distinct mechanisms of transformation, Clinical Cancer Research 11 pp 7209– (2005)
[3] Dempster, Covariance selection, Biometrics 28 pp 157– (1972)
[4] Dudoit, Multiple hypothesis testing in microarray experiments, Statistical Science pp 71– (2003) · Zbl 1048.62099
[5] Edwards, Network-enabled gene expression analysis, BMC Bioinformatics 13 pp 167– (2012)
[6] Friedman, Sparse inverse covariance estimation with the graphical lasso, Biostatistics 9 pp 432– (2007) · Zbl 1143.62076
[7] Goeman, A global test for groups of genes: testing association with a clinical outcome, Bioinformatics 20 pp 93– (2004)
[8] Jacob, L., NCIgraph: Pathways from the NCI Pathways Database (2012)
[9] Janssen, Studentized permutation tests for non-i.i.d. hypotheses and the generalized Behrens-Fisher problem, Statistics and Probability Letters 36 pp 9– (1997) · Zbl 1064.62526
[10] Joshi-Tope, Reactome: a knowledgebase of biological pathways, Nucleic Acids Research 33 pp D428– (2005) · Zbl 05437316
[11] Kanehisa, KEGG: Kyoto Encyclopedia of Genes and Genomes, Nucleic Acids Research 28 pp 27– (2000) · Zbl 05435931
[12] Lauritzen, Graphical Models (1996) · Zbl 0907.62001
[13] Mansmann, Testing differential gene expression in functional groups. Goeman’s global test versus an ANCOVA approach, Methods of Information in Medicine 44 pp 449– (2005)
[14] Martini, P., Sales, G., Romualdi, C., Clipper: Gene Set Analysis Exploiting Pathway Topology (2013)
[15] Massa, Gene set analysis exploiting the topology of a pathway, BMC Systems Biology 4 pp 121– (2010)
[16] Massa, M. S., Sales, G., TopologyGSA: Gene Set Analysis Exploiting Pathway Topology (2013)
[17] Mitrea, Methods and approaches in the topology-based analysis of biological pathways, Frontiers in Physiology 4 pp 278– (2013)
[18] Nishimura, Biocarta, Biotech Software & Internet Report: The Computer Software Journal for Scient 2 pp 117– (2001)
[19] Pellegrini, Expression profile of CREB knockdown in myeloid leukemia cells, BMC Cancer 8 pp 264– (2008)
[20] Sadeghi, Markov properties for mixed graphs, Bernoulli 20 pp 676– (2014) · Zbl 1303.60064
[21] Sales, Graphite - a Bioconductor package to convert pathway topology to gene network, BMC Bioinformatics 13 pp 20– (2012)
[22] Schaefer, PID: the Pathway Interaction Database, Nucleic Acids Research 37 pp D674– (2009) · Zbl 05746610
[23] Schäfer, A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics, Statistical Applications in Genetics and Molecular Biology 4 pp 32– (2005)
[24] Speed, Gaussian Markov distributions over finite graphs, The Annals of Statistics 14 pp 138– (1986) · Zbl 0589.62033
[25] Tsai, Multivariate analysis of variance test for gene set analysis, Bioinformatics 25 pp 897– (2009) · Zbl 05743845
[26] Vastrik, Reactome: a knowledge base of biologic pathways and processes, Genome Biology 8 pp R39– (2007)
[27] Wang, Exact distribution of the MLE of concentration matrices in decomposable covariance selection models, Statistica Sinica 11 pp 855– (2001) · Zbl 1013.62011
[28] Zhang, KEGGgraph: a graph approach to KEGG PATHWAY in R and Bioconductor, Bioinformatics 25 pp 1470– (2009) · Zbl 05743982
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.