Robust Gaussian graphical modeling via \(l_{1}\) penalization.

*(English)* Zbl 1259.62102

Summary: Gaussian graphical models have been widely used as an effective method for studying the conditional independence structure among genes and for constructing genetic networks. However, gene expression data typically have heavier tails or more outlying observations than the standard Gaussian distribution. Such outliers can lead to incorrect inference on the dependency structure among the genes. We propose an \(l_{1}\) penalized estimation procedure for sparse Gaussian graphical models that is robustified against possible outliers. The likelihood function is weighted according to how far each observation deviates, where the deviation of an observation is measured by its own likelihood. An efficient computational algorithm based on the coordinate gradient descent method is developed to obtain the minimizer of the negative penalized robustified likelihood, where the nonzero elements of the concentration matrix represent the graphical links among the genes. After the graphical structure is obtained, we re-estimate the positive definite concentration matrix using an iterative proportional fitting algorithm. Through simulations, we demonstrate that, when outliers are present, the proposed robust method performs much better than the graphical Lasso for Gaussian graphical models in terms of both graph structure selection and estimation. We apply the robust estimation procedure to an analysis of yeast gene expression data and show that the resulting graph has a better biological interpretation than that obtained from the graphical Lasso.
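The weighting idea in the summary, where an observation counts for less when its own likelihood is low, can be illustrated with a minimal sketch. This is not the authors' estimator: the exponential weight form, the tuning parameter `beta`, and the function names `gaussian_loglik` and `likelihood_weights` are assumptions chosen for illustration, and the penalization, coordinate gradient descent, and iterative proportional fitting steps are not shown.

```python
import numpy as np


def gaussian_loglik(X, mu, theta):
    """Per-observation Gaussian log-density for mean mu and concentration matrix theta."""
    p = X.shape[1]
    centered = X - mu
    _, logdet = np.linalg.slogdet(theta)
    # Quadratic form (x_i - mu)^T theta (x_i - mu) for every row i.
    quad = np.einsum("ij,jk,ik->i", centered, theta, centered)
    return 0.5 * (logdet - p * np.log(2.0 * np.pi) - quad)


def likelihood_weights(X, mu, theta, beta=0.1):
    """Weights proportional to exp(beta * loglik_i), normalized to sum to 1.

    beta is a hypothetical tuning parameter; beta = 0 gives equal weights.
    """
    ll = gaussian_loglik(X, mu, theta)
    w = np.exp(beta * (ll - ll.max()))  # subtract the max for numerical stability
    return w / w.sum()


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
X[0] = [10.0, -10.0, 10.0]  # planted outlier
w = likelihood_weights(X, np.zeros(3), np.eye(3), beta=0.5)
print(np.argmin(w), w[0])
```

Under these assumptions the planted outlier receives the smallest weight, so it would contribute very little to any weighted likelihood or weighted covariance built from `w`.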

##### MSC:

62P10: Applications of statistics to biology and medical sciences; meta analysis

92C40: Biochemistry, molecular biology

05C90: Applications of graph theory

92D10: Genetics and epigenetics

65C60: Computational problems in statistics (MSC2010)

##### Keywords:

coordinate descent algorithm; genetic networks; iterative proportional fitting; outliers; penalized likelihood

\textit{H. Sun} and \textit{H. Li}, Biometrics 68, No. 4, 1197--1206 (2012; Zbl 1259.62102)

