Bayesian network marker selection via the thresholded graph Laplacian Gaussian prior. (English) Zbl 1437.62291

Summary: Selecting informative nodes over large-scale networks has become increasingly important in many research areas. Most existing methods focus on the local network structure and incur heavy computational costs on large-scale problems. In this work, we propose a novel prior model for Bayesian network marker selection in the generalized linear model (GLM) framework: the Thresholded Graph Laplacian Gaussian (TGLG) prior, which uses the graph Laplacian matrix to characterize the conditional dependence between neighboring markers while accounting for the global network structure. Under mild conditions, we show that the proposed model achieves posterior consistency when the numbers of edges and nodes in the network diverge. We also develop a Metropolis-adjusted Langevin algorithm (MALA) for efficient posterior computation that scales to large networks. We illustrate the advantages of the proposed method over existing alternatives via extensive simulation studies and an analysis of the breast cancer gene expression dataset in The Cancer Genome Atlas (TCGA).
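
The summary does not give the exact form of the TGLG prior, so the following Python sketch is an illustration under stated assumptions rather than the authors' specification. It assumes two latent Gaussian vectors whose precision matrix is the unnormalized graph Laplacian plus a small ridge term, and a hard threshold on one of them that decides which nodes are selected. The function names and the parameters lam, eps and sigma2 are hypothetical.

```python
import numpy as np


def graph_laplacian(adj):
    """Unnormalized graph Laplacian L = D - A of a symmetric adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj


def sample_thresholded_glg(adj, lam=0.5, eps=0.1, sigma2=1.0, seed=None):
    """Draw one coefficient vector from a thresholded graph Laplacian Gaussian.

    Assumed (hypothetical) construction:
      precision = L + eps * I            (eps > 0 makes it positive definite)
      gamma, alpha ~ N(0, sigma2 * precision^{-1}), independently
      selected_j = 1 if |gamma_j| > lam, else 0
      beta = alpha * selected
    """
    rng = np.random.default_rng(seed)
    adj = np.asarray(adj, dtype=float)
    p = adj.shape[0]

    precision = graph_laplacian(adj) + eps * np.eye(p)
    cov = sigma2 * np.linalg.inv(precision)
    cov = (cov + cov.T) / 2.0  # remove round-off asymmetry before factorizing
    chol = np.linalg.cholesky(cov)

    # Neighboring nodes get correlated draws through the shared covariance.
    gamma = chol @ rng.standard_normal(p)
    alpha = chol @ rng.standard_normal(p)

    selected = np.abs(gamma) > lam  # hard threshold yields exact sparsity
    beta = alpha * selected
    return beta, selected


# Toy example: a chain graph on five nodes.
chain = np.zeros((5, 5))
for i in range(4):
    chain[i, i + 1] = chain[i + 1, i] = 1.0

beta, selected = sample_thresholded_glg(chain, seed=0)
```

Because the Gaussian covariance is the inverse of a Laplacian-based precision, connected nodes tend to cross the threshold together, which is one way a prior of this kind can favor selecting connected groups of markers. A dense inverse is used here only for readability; it would not scale to the large networks the summary has in mind.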


62J12 Generalized linear models (logistic models)
62H22 Probabilistic graphical models
62P10 Applications of statistics to biology and medical sciences; meta analysis


[1] Aebersold, R. and Mann, M. (2003). “Mass spectrometry-based proteomics.” Nature, 422(6928): 198.
[2] Barabási, A.-L. and Albert, R. (1999). “Emergence of scaling in random networks.” Science, 286(5439): 509-512. · Zbl 1226.05223
[3] Barabási, A.-L., Gulbahce, N., and Loscalzo, J. (2011). “Network medicine: a network-based approach to human disease.” Nature reviews genetics, 12(1): 56.
[4] Barbieri, M. M., Berger, J. O., et al. (2004). “Optimal predictive model selection.” The annals of statistics, 32(3): 870-897. · Zbl 1092.62033
[5] Bhattacharya, A., Pati, D., Pillai, N. S., and Dunson, D. B. (2015). “Dirichlet-Laplace priors for optimal shrinkage.” Journal of the American Statistical Association, 110(512): 1479-1490. · Zbl 1373.62368
[6] Burger, R., Bakker, F., Guenther, A., Baum, W., Schmidt-Arras, D., Hideshima, T., Tai, Y.-T., Shringarpure, R., Catley, L., Senaldi, G., Gramatzki, M., and Anderson, K. C. (2003). “Functional significance of novel neurotrophin-1/B cell-stimulating factor-3 (cardiotrophin-like cytokine) for human myeloma cell growth and survival.” British Journal of Haematology, 123(5): 869-78.
[7] Cai, Q., Kang, J., and Yu, T. (2018a). “Supplementary File 1 for “Bayesian Network Marker Selection via the Thresholded Graph Laplacian Gaussian Prior”.” Bayesian Analysis.
[8] Cai, Q., Kang, J., and Yu, T. (2018b). “Supplementary File 2 for “Bayesian Network Marker Selection via the Thresholded Graph Laplacian Gaussian Prior”.” Bayesian Analysis.
[9] Caldon, C. E. (2014). “Estrogen signaling and the DNA damage response in hormone dependent breast cancers.” Frontiers in Oncology, 4: 106.
[10] Chang, C., Kundu, S., and Long, Q. (2016). “Scalable Bayesian variable selection for structured high-dimensional data.” arXiv preprint arXiv:1604.07264.
[11] Chekouo, T., Stingo, F. C., Guindani, M., Do, K.-A., et al. (2016). “A Bayesian predictive model for imaging genetics with application to schizophrenia.” The Annals of Applied Statistics, 10(3): 1547-1571. · Zbl 1391.62205
[12] Chung, F. R. (1997). Spectral graph theory, volume 92. American Mathematical Society. · Zbl 0867.05046
[13] Ciruelos Gil, E. M. (2014). “Targeting the PI3K/AKT/mTOR pathway in estrogen receptor-positive breast cancer.” Cancer Treatment Reviews, 40(7): 862-71.
[14] Clauset, A., Newman, M. E., and Moore, C. (2004). “Finding community structure in very large networks.” Physical review E, 70(6): 066111.
[15] Das, J. and Yu, H. (2012). “HINT: High-quality protein interactomes and their applications in understanding human disease.” BMC Systems Biology, 6: 92.
[16] Dobra, A. (2009). “Variable selection and dependency networks for genomewide data.” Biostatistics, 10(4): 621-639.
[17] Doi, K. (2007). “Computer-aided diagnosis in medical imaging: historical review, current status and future potential.” Computerized medical imaging and graphics, 31(4-5): 198-211.
[18] Falcon, S. and Gentleman, R. (2007). “Using GOstats to test gene lists for GO term association.” Bioinformatics, 23(2): 257-8.
[19] Fan, J. and Li, R. (2001). “Variable selection via nonconcave penalized likelihood and its oracle properties.” Journal of the American statistical Association, 96(456): 1348-1360. · Zbl 1073.62547
[20] Fang, Z. and Luna, E. J. (2013). “Supervillin-mediated suppression of p53 protein enhances cell survival.” Journal of Biological Chemistry, 288(11): 7918-29.
[21] Formosa, R., Borg, J., and Vassallo, J. (2017). “Aryl hydrocarbon receptor (AHR) is a potential tumour suppressor in pituitary adenomas.” Endocrine Related Cancer, 24(8): 445-457.
[22] George, E. I. and McCulloch, R. E. (1993). “Variable selection via Gibbs sampling.” Journal of the American Statistical Association, 88(423): 881-889.
[23] Gilkes, D. M. and Semenza, G. L. (2013). “Role of hypoxia-inducible factors in breast cancer metastasis.” Future Oncology, 9(11): 1623-36.
[24] Goldsmith, J., Huang, L., and Crainiceanu, C. M. (2014). “Smooth scalar-on-image regression via spatial Bayesian variable selection.” Journal of Computational and Graphical Statistics, 23(1): 46-64.
[25] Greicius, M. D., Krasnow, B., Reiss, A. L., and Menon, V. (2003). “Functional connectivity in the resting brain: a network analysis of the default mode hypothesis.” Proceedings of the National Academy of Sciences, 100(1): 253-258.
[26] Hopcroft, J. and Tarjan, R. (1973). “Algorithm 447: efficient algorithms for graph manipulation.” Communications of the ACM, 16(6): 372-378.
[27] Jiang, W. (2007). “Bayesian variable selection for high dimensional generalized linear models: convergence rates of the fitted densities.” The Annals of Statistics, 35(4): 1487-1511. · Zbl 1123.62026
[28] Jin, S.-S. and Song, W.-J. (2017). “Association between MDR1 C3435T polymorphism and colorectal cancer risk: A meta-analysis.” Medicine (Baltimore), 96(51): e9428.
[29] Johnson, V. E. and Rossell, D. (2012). “Bayesian model selection in high-dimensional settings.” Journal of the American Statistical Association, 107(498): 649-660. · Zbl 1261.62024
[30] Kang, J., Reich, B. J., and Staicu, A.-M. (2018). “Scalar-on-image regression via the soft-thresholded Gaussian process.” Biometrika, 105(1): 165-184. · Zbl 07072406
[31] Kim, J., Gao, L., and Tan, K. (2012). “Multi-analyte network markers for tumor prognosis.” PLoS One, 7(12): e52973.
[32] Kim, S., Pan, W., and Shen, X. (2013). “Network-based penalized regression with application to genomic data.” Biometrics, 69(3): 582-593. · Zbl 1429.62294
[33] Kitano, H. (2002). “Systems biology: a brief overview.” Science, 295(5560): 1662-1664.
[34] Kovats, S. (2015). “Estrogen receptors regulate innate immune cells and signaling pathways.” Cellular Immunology, 294(2): 63-9.
[35] Krausz, L. T., Fischer-Fodor, E., Major, Z. Z., and Fetica, B. (2012). “GITR-expressing regulatory T-cell subsets are increased in tumor-positive lymph nodes from advanced breast cancer patients as compared to tumor-negative lymph nodes.” International Journal of Immunopathology and Pharmacology, 25(1): 59-66.
[36] Kundu, S., Shin, M., Cheng, Y., Manyam, G., Mallick, B. K., and Baladandayuthapani, V. (2015). “Bayesian Variable Selection with Structure Learning: Applications in Integrative Genomics.” arXiv preprint arXiv:1508.02803.
[37] Le Rhun, E., Bertrand, N., Dumont, A., Tresch, E., Le Deley, M.-C., Mailliez, A., Preusser, M., Weller, M., Revillion, F., and Bonneterre, J. (2017). “Identification of single nucleotide polymorphisms of the PI3K-AKT-mTOR pathway as a risk factor of central nervous system metastasis in metastatic breast cancer.” European Journal of Cancer, 87: 189-198.
[38] Leu, Y.-W., Yan, P. S., Fan, M., Jin, V. X., Liu, J. C., Curran, E. M., Welshons, W. V., Wei, S. H., Davuluri, R. V., Plass, C., Nephew, K. P., and Huang, T. H.-M. (2004). “Loss of estrogen receptor signaling triggers epigenetic silencing of downstream targets in breast cancer.” Cancer Research, 64(22): 8184-92.
[39] Li, C. and Li, H. (2008). “Network-constrained regularization and variable selection for analysis of genomic data.” Bioinformatics, 24(9): 1175-1182.
[40] Li, C. and Li, H. (2010). “Variable selection and regression analysis for graph-structured covariates with an application to genomics.” The annals of applied statistics, 4(3): 1498. · Zbl 1202.62157
[41] Li, F. and Zhang, N. R. (2010). “Bayesian Variable Selection in Structured High-Dimensional Covariate Spaces With Applications in Genomics.” Journal of the American Statistical Association, 105(491): 1202-1214. · Zbl 1390.62027
[42] Li, F., Zhang, T., Wang, Q., Gonzalez, M. Z., Maresh, E. L., Coan, J. A., et al. (2015). “Spatial Bayesian variable selection and grouping for high-dimensional scalar-on-image regression.” The Annals of Applied Statistics, 9(2): 687-713. · Zbl 1397.62458
[43] Li, Y.-X., Yu, Z.-W., Jiang, T., Shao, L.-W., Liu, Y., Li, N., Wu, Y.-F., Zheng, C., Wu, X.-Y., Zhang, M., Zheng, D.-F., Qi, X.-L., Ding, M., Zhang, J., and Chang, Q. (2018). “SNCA, a novel biomarker for Group 4 medulloblastomas, can inhibit tumor invasion and induce apoptosis.” Cancer Science, 109(4): 1263-1275.
[44] Liu, F., Chakraborty, S., Li, F., Liu, Y., Lozano, A. C., et al. (2014). “Bayesian regularization via graph Laplacian.” Bayesian Analysis, 9(2): 449-474. · Zbl 1327.62152
[45] Liu, X., Chen, L., Ge, J., Yan, C., Huang, Z., Hu, J., Wen, C., Li, M., Huang, D., Qiu, Y., Hao, H., Yuan, R., Lei, J., Yu, X., and Shao, J. (2016). “The Ubiquitin-like Protein FAT10 Stabilizes eEF1A1 Expression to Promote Tumor Proliferation in a Complex Manner.” Cancer Research, 76(16): 4897-907.
[46] Lopez, S. M., Agoulnik, A. I., Zhang, M., Peterson, L. E., Suarez, E., Gandarillas, G. A., Frolov, A., Li, R., Rajapakshe, K., Coarfa, C., Ittmann, M. M., Weigel, N. L., and Agoulnik, I. U. (2016). “Nuclear Receptor Corepressor 1 Expression and Output Declines with Prostate Cancer Progression.” Clinical Cancer Research, 22(15): 3937-49.
[47] Luo, C., Pan, W., and Shen, X. (2012). “A two-step penalized regression method with networked predictors.” Statistics in biosciences, 4(1): 27-46.
[48] Matthews, J. and Gustafsson, J.-A. (2006). “Estrogen receptor and aryl hydrocarbon receptor signaling pathways.” Nuclear Receptor Signaling, 4: e016.
[49] Nakajima, J. and West, M. (2013a). “Bayesian analysis of latent threshold dynamic models.” Journal of Business & Economic Statistics, 31(2): 151-164.
[50] Nakajima, J. and West, M. (2013b). “Bayesian dynamic factor models: Latent threshold approach.” Journal of Financial Econometrics, 11: 116-153.
[51] Nakajima, J., West, M., et al. (2017). “Dynamics & sparsity in latent threshold factor models: A study in multivariate EEG signal processing.” Brazilian Journal of Probability and Statistics, 31(4): 701-731. · Zbl 1385.62025
[52] Ni, Y., Stingo, F. C., and Baladandayuthapani, V. (2017). “Bayesian graphical regression.” Journal of the American Statistical Association, (just-accepted). · Zbl 1418.62088
[53] Osborne, C. K., Shou, J., Massarweh, S., and Schiff, R. (2005). “Crosstalk between estrogen receptor and growth factor receptor pathways as a cause for endocrine therapy resistance in breast cancer.” Clinical Cancer Research, 11(2 Pt 2): 865s-70s.
[54] Pan, W., Xie, B., and Shen, X. (2010). “Incorporating predictor network in penalized regression with application to microarray data.” Biometrics, 66(2): 474-484. · Zbl 1192.62235
[55] Park, T. and Casella, G. (2008). “The Bayesian Lasso.” Journal of the American Statistical Association, 103(482): 681-686. · Zbl 1330.62292
[56] Peng, B., Zhu, D., Ander, B. P., Zhang, X., Xue, F., Sharp, F. R., and Yang, X. (2013). “An integrative framework for Bayesian variable selection with informative priors for identifying genes and pathways.” PloS one, 8(7): e67672.
[57] Peng, S., Eidelberg, D., and Ma, Y. (2014). “Brain network markers of abnormal cerebral glucose metabolism and blood flow in Parkinson’s disease.” Neuroscience bulletin, 30(5): 823-837.
[58] Peterson, C. B., Stingo, F. C., and Vannucci, M. (2016). “Joint Bayesian variable and graph selection for regression models with network-structured predictors.” Statistics in medicine, 35(7): 1017-1031.
[59] Polson, N. G. and Scott, J. G. (2012). “Local shrinkage rules, Lévy processes and regularized regression.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(2): 287-311. · Zbl 1411.62209
[60] Roberts, G. O., Gelman, A., Gilks, W. R., et al. (1997). “Weak convergence and optimal scaling of random walk Metropolis algorithms.” The annals of applied probability, 7(1): 110-120. · Zbl 0876.60015
[61] Roberts, G. O. and Rosenthal, J. S. (1998). “Optimal scaling of discrete approximations to Langevin diffusions.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 60(1): 255-268. · Zbl 0913.60060
[62] Roberts, G. O., Rosenthal, J. S., et al. (2001). “Optimal scaling for various Metropolis-Hastings algorithms.” Statistical science, 16(4): 351-367. · Zbl 1127.65305
[63] Schaer, D. A., Murphy, J. T., and Wolchok, J. D. (2012). “Modulation of GITR for cancer immunotherapy.” Current Opinion in Immunology, 24(2): 217-24.
[64] Schuster, S. C. (2007). “Next-generation sequencing transforms today’s biology.” Nature methods, 5(1): 16.
[65] Shi, R. and Kang, J. (2015). “Thresholded multiscale Gaussian processes with application to Bayesian feature selection for massive neuroimaging data.” arXiv preprint arXiv:1504.06074.
[66] Song, Q. and Liang, F. (2015). “A split-and-merge Bayesian variable selection approach for ultrahigh dimensional regression.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77(5): 947-972. · Zbl 1414.62322
[67] Stingo, F. C., Chen, Y. A., Tadesse, M. G., and Vannucci, M. (2011). “Incorporating biological information into linear models: A Bayesian approach to the selection of pathways and genes.” The annals of applied statistics, 5(3). · Zbl 1228.62150
[68] Stubelius, A., Erlandsson, M. C., Islander, U., and Carlsten, H. (2014). “Immunomodulation by the estrogen metabolite 2-methoxyestradiol.” Clinical Immunology, 153(1): 40-8.
[69] Tibshirani, R. (1996). “Regression shrinkage and selection via the lasso.” Journal of the Royal Statistical Society. Series B (Methodological), 267-288. · Zbl 0850.62538
[70] Wolff, M., Kosyna, F. K., Dunst, J., Jelkmann, W., and Depping, R. (2017). “Impact of hypoxia inducible factors on estrogen receptor expression in breast cancer cells.” Archives of Biochemistry and Biophysics, 613: 23-30.
[71] Wu, S., Mao, L., Li, Y., Yin, Y., Yuan, W., Chen, Y., Ren, W., Lu, X., Li, Y., Chen, L., Chen, B., Xu, W., Tian, T., Lu, Y., Jiang, L., Zhuang, X., Chu, M., and Wu, J. (2018). “RAGE may act as a tumour suppressor to regulate lung cancer development.” Gene, 651: 86-93.
[72] Yin, J., Zhang, Z., Zheng, H., and Xu, L. (2017). “IRS-2 rs1805097 polymorphism is associated with the decreased risk of colorectal cancer.” Oncotarget, 8(15): 25107-25114.
[73] Yuan, X., Chen, J., Lin, Y., Li, Y., Xu, L., Chen, L., Hua, H., and Shen, B. (2017). “Network biomarkers constructed from gene expression and protein-protein interaction data for accurate prediction of Leukemia.” Journal of Cancer, 8(2): 278.
[74] Zhang, C.-H. (2010). “Nearly unbiased variable selection under minimax concave penalty.” The Annals of statistics, 894-942. · Zbl 1183.62120
[75] Zhang, Y., Jiang, C., Li, H., Lv, F., Li, X., Qian, X., Fu, L., Xu, B., and Guo, X. (2015). “Elevated Aurora B expression contributes to chemoresistance and poor prognosis in breast cancer.” International Journal of Clinical and Experimental Pathology, 8(1): 751-7.
[76] Zhe, S., Naqvi, S. A., Yang, Y., and Qi, Y. (2013). “Joint network and node selection for pathway-based genomic data analysis.” Bioinformatics, 29(16): 1987-1996.
[77] Zhou, H. and Zheng, T. (2013). “Bayesian hierarchical graph-structured model for pathway analysis using gene expression data.” Statistical applications in genetics and molecular biology, 12(3): 393-412.
[78] Zou, H. (2006). “The adaptive lasso and its oracle properties.” Journal of the American statistical association, 101(476): 1418-1429. · Zbl 1171.62326
[79] Zou, H. and Hastie, T. (2005). “Regularization and variable selection via the elastic net.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2): 301-320. · Zbl 1069.62054
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.