Ruffieux, Hélène; Davison, Anthony C.; Hager, Jörg; Inshaw, Jamie; Fairfax, Benjamin P.; Richardson, Sylvia; Bottolo, Leonardo
A global-local approach for detecting hotspots in multiple-response regression. (English) Zbl 1446.62288
Ann. Appl. Stat. 14, No. 2, 905-928 (2020).

Summary: We tackle modelling and inference for variable selection in regression problems with many predictors and many responses. We focus on detecting hotspots, that is, predictors associated with several responses. Such a task is critical in statistical genetics, as hotspot genetic variants shape the architecture of the genome by controlling the expression of many genes and may initiate decisive functional mechanisms underlying disease endpoints. Existing hierarchical regression approaches designed to model hotspots suffer from two limitations: their discrimination of hotspots is sensitive to the choice of top-level scale parameters for the propensity of predictors to be hotspots, and they do not scale to large predictor and response vectors, for example, of dimensions \(10^3\)–\(10^5\) in genetic applications. We address these shortcomings by introducing a flexible hierarchical regression framework that is tailored to the detection of hotspots and scalable to the above dimensions. Our proposal implements a fully Bayesian model for hotspots based on the horseshoe shrinkage prior. Its global-local formulation shrinks noise globally and, hence, accommodates the highly sparse nature of genetic analyses while being robust to individual signals, thus leaving the effects of hotspots unshrunk. Inference is carried out using a fast variational algorithm coupled with a novel simulated annealing procedure that allows efficient exploration of multimodal distributions.
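The global-local shrinkage idea named in the summary can be illustrated in the simplest possible setting. The sketch below is not the authors' hotspot model: it assumes a generic normal-means problem with unit noise variance and the standard horseshoe construction (a half-Cauchy local scale per coefficient, one shared global scale). All function names are hypothetical and chosen for illustration only.

```python
import math
import random


def half_cauchy(rng, scale=1.0):
    """Draw from a half-Cauchy C+(0, scale) via the inverse CDF."""
    return scale * math.tan(0.5 * math.pi * rng.random())


def shrinkage_weight(local_scale, global_scale):
    """kappa = 1 / (1 + tau^2 * lambda^2).

    kappa near 1 means the coefficient is shrunk to zero,
    kappa near 0 means it is left almost unshrunk.
    """
    return 1.0 / (1.0 + (global_scale * local_scale) ** 2)


def posterior_mean(y, local_scale, global_scale):
    """Conditional posterior mean of beta when y ~ N(beta, 1)
    and beta ~ N(0, tau^2 * lambda^2) (assumed toy model)."""
    return (1.0 - shrinkage_weight(local_scale, global_scale)) * y


# With tau = 1 the prior on kappa is U-shaped: mass piles up near 0 and 1.
rng = random.Random(0)
kappas = [shrinkage_weight(half_cauchy(rng), 1.0) for _ in range(100_000)]
near_zero = sum(k < 0.1 for k in kappas) / len(kappas)
near_one = sum(k > 0.9 for k in kappas) / len(kappas)
middle = sum(0.45 < k < 0.55 for k in kappas) / len(kappas)

# A small global scale shrinks a typical coefficient almost to zero ...
noise_estimate = posterior_mean(5.0, local_scale=1.0, global_scale=0.01)
# ... while a large local scale lets an individual signal escape shrinkage.
signal_estimate = posterior_mean(5.0, local_scale=1000.0, global_scale=0.01)
```

In this toy setting the global scale plays the role of the sparsity level, while the heavy half-Cauchy tail of the local scales is what leaves large effects essentially unshrunk.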
Cited in 1 Document
MSC:
62P10 Applications of statistics to biology and medical sciences; meta analysis
62F07 Statistical ranking and selection procedures
62J15 Paired and multiple comparisons; multiple testing
92D20 Protein sequences, DNA sequences
Keywords: annealed variational inference; hierarchical model; horseshoe prior; molecular quantitative trait locus analyses; multiplicity control; normal scale mixture; regulation hotspot; shrinkage; statistical genetics; variable selection
Software: EMVS; TreeQTL; Matrix eQTL