Bayesian pathway selection. (English) Zbl 07716674

Summary: We propose a Bayesian pathway selection method that selects pathways (sets of genes) directly related to a continuous response variable under a non-parametric hierarchical model framework. The research was motivated by the observation that sets of genes explain the response variable more effectively than individual genes. We use stochastic search variable selection and the kernel machine method to select effective pathways after adjusting for the effects of clinical covariates. Unlike other methods, which analyze pathways separately, the proposed method selects pathways simultaneously. Through simulation studies and a real data application, we show that the proposed model can successfully detect effective pathways associated with outcomes.


62-XX Statistics



