On the use of Cauchy prior distributions for Bayesian logistic regression. (English) Zbl 1407.62276

Summary: In logistic regression, separation occurs when a linear combination of the predictors can perfectly classify part or all of the observations in the sample, and as a result, finite maximum likelihood estimates of the regression coefficients do not exist. A. Gelman et al. [Ann. Appl. Stat. 2, No. 4, 1360–1383 (2008; Zbl 1156.62017)] recommended independent Cauchy distributions as default priors for the regression coefficients in logistic regression, even in the case of separation, and reported posterior modes in their analyses. As the mean does not exist for the Cauchy prior, a natural question is whether the posterior means of the regression coefficients exist under separation. We prove theorems that provide necessary and sufficient conditions for the existence of posterior means under independent Cauchy priors for the logit link and a general family of link functions, including the probit link. We also study the existence of posterior means under multivariate Cauchy priors. For full Bayesian inference, we develop a Gibbs sampler based on Pólya-Gamma data augmentation to sample from the posterior distribution under independent Student-\(t\) priors, including Cauchy priors, and provide a companion R package {tglm}, available at CRAN. We demonstrate empirically that even when the posterior means of the regression coefficients exist under separation, the magnitude of the posterior samples for Cauchy priors may be unusually large, and the corresponding Gibbs sampler shows extremely slow mixing. While alternative algorithms such as the No-U-Turn Sampler (NUTS) in Stan can greatly improve mixing, in order to resolve the issue of extremely heavy-tailed posteriors for Cauchy priors under separation, one would need to consider lighter-tailed priors such as normal priors or Student-\(t\) priors with degrees of freedom larger than one.
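To illustrate the separation phenomenon the summary describes, the following minimal NumPy sketch (not from the paper; the data and step size are illustrative assumptions) fits a one-predictor logistic regression by gradient ascent on completely separated data. Because every negative predictor value has response 0 and every positive value has response 1, the log-likelihood increases monotonically in the coefficient and no finite maximizer exists, so the iterate grows without bound while the log-likelihood creeps toward its supremum of 0.

```python
import numpy as np

# Completely separated toy data: x < 0 always has y = 0, x > 0 always has
# y = 1, so a finite MLE does not exist (Albert and Anderson, 1984).
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

def log_likelihood(beta):
    """Bernoulli log-likelihood for a no-intercept logit model."""
    p = 1.0 / (1.0 + np.exp(-beta * x))
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# Gradient ascent; the score sum((y - p) * x) stays strictly positive here,
# so beta keeps increasing (roughly logarithmically in the iteration count).
beta = 0.0
for _ in range(20000):
    p = 1.0 / (1.0 + np.exp(-beta * x))
    beta += 0.1 * np.sum((y - p) * x)

print(beta)                  # large and still growing with more iterations
print(log_likelihood(beta))  # negative but very close to its supremum, 0
```

Under a flat or very heavy-tailed prior this divergence carries over to the posterior, which is why the paper's conditions on the existence of posterior means under Cauchy priors matter in practice.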

MSC:

62J12 Generalized linear models (logistic models)
62F15 Bayesian inference
62P10 Applications of statistics to biology and medical sciences; meta analysis

Citations:

Zbl 1156.62017

References:

[1] Albert, A. and Anderson, J. A. (1984). “On the Existence of Maximum Likelihood Estimates in Logistic Regression Models.” Biometrika, 71(1): 1–10. · Zbl 0543.62020
[2] Bardenet, R., Doucet, A., and Holmes, C. (2014). “Towards Scaling up Markov Chain Monte Carlo: An Adaptive Subsampling Approach.” Proceedings of the 31st International Conference on Machine Learning (ICML-14), 405–413.
[3] Bickel, P. J. and Doksum, K. A. (2001). Mathematical Statistics, volume I. Prentice Hall Englewood Cliffs, NJ. · Zbl 1380.62002
[4] Blattenberger, G. and Lad, F. (1985). “Separating the Brier Score into Calibration and Refinement Components: A Graphical Exposition.” The American Statistician, 39(1): 26–32.
[5] Brier, G. W. (1950). “Verification of Forecasts Expressed in Terms of Probability.” Monthly Weather Review, 78: 1–3.
[6] Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., and Riddell, A. (2016). “Stan: A Probabilistic Programming Language.” Journal of Statistical Software, in press.
[7] Casella, G. and Berger, R. L. (1990). Statistical Inference. Duxbury Press. · Zbl 0699.62001
[8] Chen, M.-H. and Shao, Q.-M. (2001). “Propriety of Posterior Distribution for Dichotomous Quantal Response Models.” Proceedings of the American Mathematical Society, 129(1): 293–302. · Zbl 1008.62027
[9] Choi, H. M. and Hobert, J. P. (2013). “The Polya-Gamma Gibbs Sampler for Bayesian Logistic Regression is Uniformly Ergodic.” Electronic Journal of Statistics, 7: 2054–2064. · Zbl 1349.60123
[10] Chopin, N. and Ridgway, J. (2015). “Leave Pima Indians Alone: Binary Regression as a Benchmark for Bayesian Computation.” arXiv preprint. · Zbl 1442.62007
[11] Clogg, C. C., Rubin, D. B., Schenker, N., Schultz, B., and Weidman, L. (1991). “Multiple Imputation of Industry and Occupation Codes in Census Public-Use Samples Using Bayesian Logistic Regression.” Journal of the American Statistical Association, 86(413): 68–78.
[12] Dawid, A. P. (1973). “Posterior Expectations for Large Observations.” Biometrika, 60: 664–666. · Zbl 0268.62014
[13] Fernández, C. and Steel, M. F. (2000). “Bayesian Regression Analysis with Scale Mixtures of Normals.” Econometric Theory, 16: 80–101. · Zbl 0945.62031
[14] Firth, D. (1993). “Bias Reduction of Maximum Likelihood Estimates.” Biometrika, 80(1): 27–38. · Zbl 0769.62021
[15] Fouskakis, D., Ntzoufras, I., and Draper, D. (2009). “Bayesian Variable Selection Using Cost-Adjusted BIC, with Application to Cost-Effective Measurement of Quality of Health Care.” The Annals of Applied Statistics, 3(2): 663–690. · Zbl 1166.62082
[16] Gelman, A., Jakulin, A., Pittau, M., and Su, Y. (2008). “A Weakly Informative Default Prior Distribution for Logistic and Other Regression Models.” The Annals of Applied Statistics, 2(4): 1360–1383. · Zbl 1156.62017
[17] Gelman, A., Su, Y.-S., Yajima, M., Hill, J., Pittau, M. G., Kerman, J., Zheng, T., and Dorie, V. (2015). arm: Data Analysis Using Regression and Multilevel/Hierarchical Models. R package version 1.8-5. URL http://CRAN.R-project.org/package=arm
[18] Ghosh, J. and Clyde, M. A. (2011). “Rao-Blackwellization for Bayesian Variable Selection and Model Averaging in Linear and Binary Regression: A Novel Data Augmentation Approach.” Journal of the American Statistical Association, 106(495): 1041–1052. · Zbl 1229.62029
[19] Ghosh, J., Herring, A. H., and Siega-Riz, A. M. (2011). “Bayesian Variable Selection for Latent Class Models.” Biometrics, 67: 917–925. · Zbl 1226.62022
[20] Ghosh, J., Li, Y., and Mitra, R. (2017). “Supplementary Material for “On the Use of Cauchy Prior Distributions for Bayesian Logistic Regression”.” Bayesian Analysis.
[21] Ghosh, J. and Reiter, J. P. (2013). “Secure Bayesian Model Averaging for Horizontally Partitioned Data.” Statistics and Computing, 23: 311–322. · Zbl 1322.62101
[22] Gramacy, R. B. and Polson, N. G. (2012). “Simulation-Based Regularized Logistic Regression.” Bayesian Analysis, 7(3): 567–590. · Zbl 1330.62301
[23] Hanson, T. E., Branscum, A. J., and Johnson, W. O. (2014). “Informative g-Priors for Logistic Regression.” Bayesian Analysis, 9(3): 597–612. · Zbl 1327.62395
[24] Heinze, G. (2006). “A Comparative Investigation of Methods for Logistic Regression with Separated or Nearly Separated Data.” Statistics in Medicine, 25: 4216–4226.
[25] Heinze, G. and Schemper, M. (2002). “A Solution to the Problem of Separation in Logistic Regression.” Statistics in Medicine, 21: 2409–2419.
[26] Hoffman, M. D. and Gelman, A. (2014). “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.” The Journal of Machine Learning Research, 15(1): 1593–1623. · Zbl 1319.60150
[27] Holmes, C. C. and Held, L. (2006). “Bayesian Auxiliary Variable Models for Binary and Multinomial Regression.” Bayesian Analysis, 1(1): 145–168. · Zbl 1331.62142
[28] Ibrahim, J. G. and Laud, P. W. (1991). “On Bayesian Analysis of Generalized Linear Models using Jeffreys’s Prior.” Journal of the American Statistical Association, 86(416): 981–986. · Zbl 0850.62292
[29] Jeffreys, H. (1961). Theory of Probability. Oxford Univ. Press. · Zbl 0116.34904
[30] Kurgan, L., Cios, K., Tadeusiewicz, R., Ogiela, M., and Goodenday, L. (2001). “Knowledge Discovery Approach to Automated Cardiac SPECT Diagnosis.” Artificial Intelligence in Medicine, 23(2): 149–169.
[31] Li, Y. and Clyde, M. A. (2015). “Mixtures of \(g\)-Priors in Generalized Linear Models.” arXiv preprint.
[32] Liu, C. (2004). “Robit Regression: A Simple Robust Alternative to Logistic and Probit Regression.” In Gelman, A. and Meng, X. (eds.), Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives, 227–238. Wiley, London. · Zbl 05274820
[33] McCullagh, P. and Nelder, J. (1989). Generalized Linear Models. Chapman and Hall. · Zbl 0744.62098
[34] Mitra, R. and Dunson, D. B. (2010). “Two Level Stochastic Search Variable Selection in GLMs with Missing Predictors.” International Journal of Biostatistics, 6(1): Article 33.
[35] Neal, R. M. (2003). “Slice Sampling.” The Annals of Statistics, 31(3): 705–767. · Zbl 1051.65007
[36] Neal, R. M. (2011). “MCMC using Hamiltonian Dynamics.” In Brooks, S., Gelman, A., Jones, G., and Meng, X.-L. (eds.), Handbook of Markov Chain Monte Carlo. Chapman & Hall / CRC Press.
[37] O’Brien, S. M. and Dunson, D. B. (2004). “Bayesian Multivariate Logistic Regression.” Biometrics, 60(3): 739–746. · Zbl 1274.62375
[38] Polson, N. G., Scott, J. G., and Windle, J. (2013). “Bayesian Inference for Logistic Models Using Pólya-Gamma Latent Variables.” Journal of the American Statistical Association, 108(504): 1339–1349. · Zbl 1283.62055
[39] Rousseeuw, P. J. and Christmann, A. (2003). “Robustness Against Separation and Outliers in Logistic Regression.” Computational Statistics and Data Analysis, 42: 315–332. · Zbl 1429.62325
[40] Sabanés Bové, D. and Held, L. (2011). “Hyper-\(g\) Priors for Generalized Linear Models.” Bayesian Analysis, 6(3): 387–410. · Zbl 1330.62058
[41] Speckman, P. L., Lee, J., and Sun, D. (2009). “Existence of the MLE and Propriety of Posteriors for a General Multinomial Choice Model.” Statistica Sinica, 19: 731–748. · Zbl 1168.62019
[42] Yang, R. and Berger, J. O. (1996). “A Catalog of Noninformative Priors.” Institute of Statistics and Decision Sciences, Duke University.
[43] Zellner, A. and Siow, A. (1980). “Posterior Odds Ratios for Selected Regression Hypotheses.” In Bayesian Statistics: Proceedings of the First International Meeting Held in Valencia (Spain), 585–603. Valencia, Spain: University of Valencia Press. · Zbl 0457.62004
[44] Zorn, C. (2005). “A Solution to Separation in Binary Response Models.” Political Analysis, 13(2): 157–170.