zbMATH — the first resource for mathematics

Sparse regulatory networks. (English) Zbl 1194.62116
Summary: In many organisms the expression level of each gene is controlled by the activation levels of known “Transcription Factors” (TFs). A problem of considerable interest is that of estimating the “Transcription Regulation Network” (TRN) relating the TFs and genes. While the expression levels of genes can be observed, the activation levels of the corresponding TFs are usually unknown, which greatly increases the difficulty of the problem. Previous experimental work often provides partial information about the TRN. For example, certain TFs may be known to regulate a given gene, or in other cases a connection may be predicted with a certain probability. In general, the biology of the problem indicates that there will be very few connections between TFs and genes.
Several methods have been proposed for estimating TRNs. However, they all suffer from problems such as unrealistic assumptions about prior knowledge of the network structure or computational limitations. We propose a new approach that directly combines prior information about the network structure with observed gene expression data to estimate the TRN. Our approach uses \(L_{1}\) penalties on the network to ensure a sparse structure. This is computationally efficient and requires far fewer assumptions about the network structure. We use our methodology to construct the TRN for E. coli and show that the estimate is biologically sensible and compares favorably with previous estimates.
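
As a rough illustration of how an \(L_{1}\) penalty yields a sparse set of connections, the sketch below fits a weighted lasso for a single gene by coordinate descent. It is not the authors' algorithm: it assumes the TF activation levels are observed (the summary states they usually are not), and the data, the prior probabilities `prior`, and the rule `lam * (1 - prior)` for turning a prior into a penalty are all hypothetical choices made only for this example.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; return exactly zero when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def weighted_lasso(X, y, penalties, n_iter=200):
    """Minimise (1/2n)||y - X b||^2 + sum_j penalties[j] * |b_j|
    by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # (1/n) * x_j' x_j for each column
    r = y.copy()                        # current residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes b_j
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, penalties[j]) / col_sq[j]
            r += X[:, j] * (b[j] - new_bj)   # keep the residual in sync
            b[j] = new_bj
    return b

# Hypothetical toy data: 40 experiments, 5 TFs, one gene.
rng = np.random.default_rng(0)
tf_activity = rng.normal(size=(40, 5))             # assumed known here
true_strength = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
expression = tf_activity @ true_strength + 0.1 * rng.normal(size=40)

# Hypothetical prior probability that each TF regulates the gene.
prior = np.array([0.9, 0.1, 0.1, 0.5, 0.1])
lam = 0.3
penalties = lam * (1.0 - prior)   # assumed rule: likelier links are penalised less

coef = weighted_lasso(tf_activity, expression, penalties)
print(coef)
```

In this toy setting the penalty sets the coefficients of the three unconnected TFs exactly to zero while the two true connections remain nonzero, which is the sparsity behaviour the summary attributes to the \(L_{1}\) penalty.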

62P10 Applications of statistics to biology and medical sciences; meta analysis
92D10 Genetics and epigenetics
65C60 Computational problems in statistics (MSC2010)
92C40 Biochemistry, molecular biology