Statistical inference in two-sample summary-data Mendelian randomization using robust adjusted profile score. (English) Zbl 1465.62050

Summary: Mendelian randomization (MR) is a method of exploiting genetic variation to unbiasedly estimate a causal effect in the presence of unmeasured confounding. MR is being widely used in epidemiology and other related areas of population science. In this paper, we study statistical inference in the increasingly popular two-sample summary-data MR design. We show that a linear model for the observed associations approximately holds in a wide variety of settings when all the genetic variants satisfy the exclusion restriction assumption, or in genetic terms, when there is no pleiotropy. In this scenario, we derive a maximum profile likelihood estimator with provable consistency and asymptotic normality. However, by analyzing real datasets, we find strong evidence of both systematic and idiosyncratic pleiotropy in MR, echoing the omnigenic model of complex traits that was recently proposed in genetics. We model the systematic pleiotropy by a random effects model, where no genetic variant satisfies the exclusion restriction condition exactly. In this case, we propose a consistent and asymptotically normal estimator by adjusting the profile score. We then tackle the idiosyncratic pleiotropy by robustifying the adjusted profile score. We demonstrate the robustness and efficiency of the proposed methods using several simulated and real datasets.
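
The maximum profile likelihood estimator mentioned in the summary can be illustrated with a minimal sketch. The summary does not state the formula, so the following is an assumption: with independent, normally distributed SNP-exposure estimates gamma_hat and SNP-outcome estimates Gamma_hat, known standard errors se_x and se_y, and no pleiotropy, the profile log-likelihood of the causal effect beta is taken to be, up to a constant, -1/2 times the sum over variants of (Gamma_hat - beta * gamma_hat)^2 / (se_y^2 + beta^2 * se_x^2). The function names, the simulated data and the grid-refinement optimizer are all hypothetical choices for illustration. The adjusted and robustified profile scores are not implemented here.

```python
import random


def profile_loglik(beta, gamma_hat, Gamma_hat, se_x, se_y):
    """Profile log-likelihood of beta, up to an additive constant.

    Assumed model (not stated in the summary): independent normal errors
    with known standard errors and no pleiotropy.
    """
    total = 0.0
    for g, G, sx, sy in zip(gamma_hat, Gamma_hat, se_x, se_y):
        total += (G - beta * g) ** 2 / (sy ** 2 + beta ** 2 * sx ** 2)
    return -0.5 * total


def estimate_beta(gamma_hat, Gamma_hat, se_x, se_y,
                  lo=-5.0, hi=5.0, rounds=4, grid=101):
    """Maximize the profile log-likelihood by repeated grid refinement.

    The search interval [lo, hi] is an illustrative assumption; the true
    effect is assumed to lie inside it.
    """
    best = lo
    for _ in range(rounds):
        step = (hi - lo) / (grid - 1)
        candidates = [lo + i * step for i in range(grid)]
        best = max(
            candidates,
            key=lambda b: profile_loglik(b, gamma_hat, Gamma_hat, se_x, se_y),
        )
        # Zoom in on the neighborhood of the current best grid point.
        lo, hi = best - step, best + step
    return best


def simulate(beta=0.5, p=1000, effect_sd=0.05, se=0.01, seed=0):
    """Simulate two-sample summary data with no pleiotropy (hypothetical values)."""
    rng = random.Random(seed)
    gamma = [rng.gauss(0.0, effect_sd) for _ in range(p)]
    gamma_hat = [g + rng.gauss(0.0, se) for g in gamma]
    Gamma_hat = [beta * g + rng.gauss(0.0, se) for g in gamma]
    return gamma_hat, Gamma_hat, [se] * p, [se] * p


if __name__ == "__main__":
    data = simulate()
    print("estimated beta:", estimate_beta(*data))
```

Because the denominator depends on beta, the objective is not a weighted least-squares criterion, which is why a generic one-dimensional search is used in this sketch rather than a closed-form solution.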

MSC:

62F03 Parametric hypothesis testing
62F12 Asymptotic properties of parametric estimators
62F35 Robustness and adaptive procedures (parametric inference)

Software:

PLINK; robustbase

References:

[1] 1000 Genomes Project Consortium, Auton, A., Brooks, L. D., Durbin, R. M., Garrison, E. P., Kang, H. M., Korbel, J. O., Marchini, J. L., McCarthy, S. et al. (2015). A global reference for human genetic variation. Nature 526 68-74.
[2] Anderson, T. W. and Rubin, H. (1949). Estimation of the parameters of a single equation in a complete system of stochastic equations. Ann. Math. Stat. 20 46-63. · Zbl 0033.08002 · doi:10.1214/aoms/1177730090
[3] Baiocchi, M., Cheng, J. and Small, D. S. (2014). Instrumental variable methods for causal inference. Stat. Med. 33 2297-2340.
[4] Barndorff-Nielsen, O. (1983). On a formula for the distribution of the maximum likelihood estimator. Biometrika 70 343-365. · Zbl 0532.62006 · doi:10.1093/biomet/70.2.343
[5] Bekker, P. A. (1994). Alternative approximations to the distributions of instrumental variable estimators. Econometrica 62 657-681. · Zbl 0795.62102 · doi:10.2307/2951662
[6] Bound, J., Jaeger, D. A. and Baker, R. M. (1995). Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. J. Amer. Statist. Assoc. 90 443-450.
[7] Bowden, J., Davey Smith, G. and Burgess, S. (2015). Mendelian randomization with invalid instruments: Effect estimation and bias detection through Egger regression. Int. J. Epidemiol. 44 512-525.
[8] Bowden, J., Davey Smith, G., Haycock, P. C. and Burgess, S. (2016). Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genet. Epidemiol. 40 304-314.
[9] Bowden, J., Del Greco M., F., Minelli, C., Davey Smith, G., Sheehan, N. and Thompson, J. (2017). A framework for the investigation of pleiotropy in two-sample summary data Mendelian randomization. Stat. Med. 36 1783-1802.
[10] Bowden, J., Fabiola Del Greco, M., Minelli, C., Lawlor, D., Sheehan, N., Thompson, J. and Smith, G. D. (2019). Improving the accuracy of two-sample summary-data Mendelian randomization: Moving beyond the NOME assumption. Int. J. Epidemiol. 48 728-742.
[11] Bowden, J., Spiller, W., Del-Greco, F., Sheehan, N., Thompson, J., Minelli, C. and Smith, G. D. (2018). Improving the visualization, interpretation and analysis of two-sample summary data Mendelian randomization via the Radial plot and Radial regression. Int. J. Epidemiol. 47 1264-1278.
[12] Boyle, E. A., Li, Y. I. and Pritchard, J. K. (2017). An expanded view of complex traits: From polygenic to omnigenic. Cell 169 1177-1186.
[13] Burgess, S., Butterworth, A. and Thompson, S. G. (2013). Mendelian randomization analysis with multiple genetic variants using summarized data. Genet. Epidemiol. 37 658-665.
[14] Burgess, S., Scott, R. A., Timpson, N. J., Smith, G. D., Thompson, S. G. and EPIC-InterAct Consortium (2015). Using published data in Mendelian randomization: A blueprint for efficient identification of causal risk factors. Eur. J. Epidemiol. 30 543-552.
[15] Carroll, R. J., Ruppert, D., Stefanski, L. A. and Crainiceanu, C. M. (2006). Measurement Error in Nonlinear Models: A Modern Perspective, 2nd ed. Monographs on Statistics and Applied Probability 105. Chapman & Hall/CRC, Boca Raton, FL. · Zbl 1119.62063
[16] Clarivate Analytics (2017). Web of Science Topic: Mendelian Randomization. Available at http://www.webofknowledge.com.
[17] Cox, D. R. and Reid, N. (1987). Parameter orthogonality and approximate conditional inference. J. Roy. Statist. Soc. Ser. B 49 1-39. · Zbl 0616.62006 · doi:10.1111/j.2517-6161.1987.tb01422.x
[18] Davey Smith, G. and Ebrahim, S. (2003). “Mendelian randomization”: Can genetic epidemiology contribute to understanding environmental determinants of disease? Int. J. Epidemiol. 32 1-22.
[19] Davey Smith, G. and Hemani, G. (2014). Mendelian randomization: Genetic anchors for causal inference in epidemiological studies. Hum. Mol. Genet. 23 R89-R98.
[20] Davey Smith, G., Lawlor, D. A., Harbord, R., Timpson, N., Day, I. and Ebrahim, S. (2007). Clustered environments and randomized genes: A fundamental distinction between conventional and genetic epidemiology. PLoS Med. 4 e352.
[21] Didelez, V. and Sheehan, N. (2007). Mendelian randomization as an instrumental variable approach to causal inference. Stat. Methods Med. Res. 16 309-330. · Zbl 1122.62343 · doi:10.1177/0962280206077743
[22] Fisher, R. A. (1930). The Genetical Theory of Natural Selection. Oxford Univ. Press, Oxford. · JFM 56.1106.13
[23] Guo, Z., Kang, H., Cai, T. T. and Small, D. S. (2018). Confidence intervals for causal effects with invalid instruments by using two-stage hard thresholding with voting. J. R. Stat. Soc. Ser. B. Stat. Methodol. 80 793-815. · Zbl 1398.62114 · doi:10.1111/rssb.12275
[24] Hampel, F. R. (1974). The influence curve and its role in robust estimation. J. Amer. Statist. Assoc. 69 383-393. · Zbl 0305.62031 · doi:10.1080/01621459.1974.10482962
[25] Hansen, C., Hausman, J. and Newey, W. (2008). Estimation with many instrumental variables. J. Bus. Econom. Statist. 26 398-422.
[26] Hartwig, F. P., Davey Smith, G. and Bowden, J. (2017). Robust inference in summary data Mendelian randomization via the zero modal pleiotropy assumption. Int. J. Epidemiol. 46 1985-1998.
[27] Haycock, P. C., Burgess, S., Wade, K. H., Bowden, J., Relton, C. and Smith, G. D. (2016). Best (but oft-forgotten) practices: The design, analysis, and interpretation of Mendelian randomization studies. Am. J. Clin. Nutr. 103 965-978.
[28] Hemani, G., Zheng, J., Elsworth, B., Wade, K. H., Haberland, V., Baird, D., Laurin, C., Burgess, S., Bowden, J. et al. (2018). The MR-Base platform supports systematic causal inference across the human phenome. eLife 7 e34408.
[29] Hernán, M. A. and Robins, J. M. (2006). Instruments for causal inference: An epidemiologist’s dream? Epidemiology 17 360-372.
[30] Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Stat. 35 73-101. · Zbl 0136.39805 · doi:10.1214/aoms/1177703732
[31] Ioannidis, J. P., Trikalinos, T. A. and Khoury, M. J. (2006). Implications of small effect sizes of individual genetic variants on the design and interpretation of genetic association studies of complex diseases. Am. J. Epidemiol. 164 609-614.
[32] Kang, H., Zhang, A., Cai, T. T. and Small, D. S. (2016). Instrumental variables estimation with some invalid instruments and its application to Mendelian randomization. J. Amer. Statist. Assoc. 111 132-144.
[33] Katan, M. (1986). Apolipoprotein E isoforms, serum cholesterol, and cancer. Lancet 327 507-508.
[34] Li, S. (2017). Mendelian randomization when many instruments are invalid: Hierarchical empirical Bayes estimation. Available at arXiv:1706.01389.
[35] Locke, A. E., Kahali, B., Berndt, S. I., Justice, A. E., Pers, T. H., Day, F. R., Powell, C., Vedantam, S., Buchkovich, M. L. et al. (2015). Genetic studies of body mass index yield new insights for obesity biology. Nature 518 197-206.
[36] Maronna, R. A., Martin, R. D. and Yohai, V. J. (2006). Robust Statistics: Theory and Methods. Wiley Series in Probability and Statistics. Wiley, Chichester. · Zbl 1094.62040
[37] McCullagh, P. and Tibshirani, R. (1990). A simple method for the adjustment of profile likelihoods. J. Roy. Statist. Soc. Ser. B 52 325-344. · Zbl 0716.62039 · doi:10.1111/j.2517-6161.1990.tb01790.x
[38] Murphy, S. A. and Van Der Vaart, A. W. (1996). Likelihood inference in the errors-in-variables model. J. Multivariate Anal. 59 81-108. · Zbl 0865.62032 · doi:10.1006/jmva.1996.0055
[39] Neyman, J. and Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica 16 1-32. · Zbl 0034.07602 · doi:10.2307/1914288
[40] Pacini, D. and Windmeijer, F. (2016). Robust inference for the two-sample 2SLS estimator. Econom. Lett. 146 50-54. · Zbl 1398.62183 · doi:10.1016/j.econlet.2016.06.033
[41] Park, J.-H., Wacholder, S., Gail, M. H., Peters, U., Jacobs, K. B., Chanock, S. J. and Chatterjee, N. (2010). Estimation of effect size distribution from genome-wide association studies and implications for future discoveries. Nat. Genet. 42 570-575.
[42] Pearl, J. (2009). Causality: Models, Reasoning, and Inference, 2nd ed. Cambridge Univ. Press, Cambridge. · Zbl 1188.68291
[43] Purcell, S. PLINK (software V1.07). http://pngu.mgh.harvard.edu/purcell/plink/.
[44] Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M. A., Bender, D., Maller, J., Sklar, P., De Bakker, P. I. et al. (2007). PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81 559-575.
[45] Rothenberg, T. J. (1971). Identification in parametric models. Econometrica 39 577-591. · Zbl 0231.62081 · doi:10.2307/1913267
[46] Shi, H., Kichaev, G. and Pasaniuc, B. (2016). Contrasting the genetic architecture of 30 complex traits from summary association data. Am. J. Hum. Genet. 99 139-153.
[47] Solovieff, N., Cotsapas, C., Lee, P. H., Purcell, S. M. and Smoller, J. W. (2013). Pleiotropy in complex traits: Challenges and strategies. Nat. Rev. Genet. 14 483-495.
[48] Stearns, F. W. (2010). One hundred years of pleiotropy: A retrospective. Genetics 186 767-773.
[49] Stock, J. H. and Yogo, M. (2005). Asymptotic distributions of instrumental variables statistics with many instruments. In Identification and Inference for Econometric Models 109-120. Cambridge Univ. Press, Cambridge. · Zbl 1119.62015 · doi:10.1017/CBO9780511614491
[50] Tchetgen Tchetgen, E. J., Sun, B. and Walter, S. (2017). The GENIUS approach to robust Mendelian randomization inference. Available at arXiv:1709.07779.
[51] van Kippersluis, H. and Rietveld, C. A. (2017). Pleiotropy-robust Mendelian randomization. Int. J. Epidemiol. In press.
[52] Verbanck, M., Chen, C.-Y., Neale, B. and Do, R. (2018). Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat. Genet. 50 693.
[53] Wald, A. (1940). The fitting of straight lines if both variables are subject to error. Ann. Math. Stat. 11 285-300. · JFM 66.0638.03 · doi:10.1214/aoms/1177731868
[54] Wright, P. G. (1928). Tariff on Animal and Vegetable Oils. MacMillan, New York.
[55] Wright, S. (1968). Evolution and the Genetics of Populations, Volume 1: Genetic and Biometric Foundations 1. Univ. Chicago Press, Chicago.
[56] Yohai, V. J. (1987). High breakdown-point and high efficiency robust estimates for regression. Ann. Statist. 15 642-656. · Zbl 0624.62037 · doi:10.1214/aos/1176350366
[57] Zhao, Q.