A weakly informative default prior distribution for logistic and other regression models. (English) Zbl 1156.62017
Summary: We propose a new prior distribution for classical (nonhierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5, and then placing independent Student-\(t\) prior distributions on the coefficients. As a default choice, we recommend the Cauchy distribution with center 0 and scale 2.5, which in the simplest setting is a longer-tailed version of the distribution attained by assuming one-half additional success and one-half additional failure in a logistic regression. Cross-validation on a corpus of datasets shows the Cauchy class of prior distributions to outperform existing implementations of Gaussian and Laplace priors.
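The preprocessing and default prior described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names `rescale_column` and `cauchy_log_prior` are hypothetical, the population standard deviation is an assumed choice, and leaving binary columns unchanged is a simplification, since the summary only specifies the treatment of nonbinary variables.

```python
import math
from statistics import mean, pstdev


def rescale_column(values):
    """Center a nonbinary column at 0 and scale it to standard deviation 0.5.

    Assumption: columns with at most two distinct values (binary or constant)
    are returned unchanged, because the summary only describes nonbinary inputs.
    """
    if len(set(values)) <= 2:
        return list(values)
    m = mean(values)
    s = pstdev(values)  # population standard deviation (an assumed choice)
    return [0.5 * (v - m) / s for v in values]


def cauchy_log_prior(beta, center=0.0, scale=2.5):
    """Log density of a Cauchy(center, scale) prior for a single coefficient."""
    z = (beta - center) / scale
    return -math.log(math.pi * scale) - math.log1p(z * z)


# Usage: rescale a made-up predictor, then score a candidate coefficient vector
# under independent Cauchy(0, 2.5) priors.
age = [23.0, 35.0, 41.0, 58.0, 62.0]
age_scaled = rescale_column(age)
log_prior = sum(cauchy_log_prior(b) for b in [0.3, -1.2])
```

Because every nonbinary input ends up on the same scale, a single prior scale of 2.5 means the same thing for each coefficient, which is what makes one default usable across models.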
We recommend this prior distribution as a default choice for routine applied use. It has the advantage of always giving answers, even when there is complete separation in logistic regression (a common problem, even when the sample size is large and the number of predictors is small), and also automatically applying more shrinkage to higher-order interactions. This can be useful in routine data analysis as well as in automated procedures such as chained equations for missing-data imputation. We implement a procedure to fit generalized linear models in R with the Student-\(t\) prior distribution by incorporating an approximate EM algorithm into the usual iteratively weighted least squares. We illustrate with several applications, including a series of logistic regressions predicting voting preferences, a small bioassay experiment, and an imputation model for a public health data set.
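To illustrate how an approximate EM step can be folded into iteratively weighted least squares, the sketch below fits a logistic regression with independent Student-\(t\) priors centered at 0. It rests on several assumptions that the summary does not spell out: the \(t\) prior is treated as a normal prior with an unknown variance per coefficient, that variance is updated from the current estimate and its approximate posterior variance, the same scale is used for every coefficient, and the names `invert` and `fit_logistic_t_prior` are hypothetical. It is written in plain Python rather than R so that it has no dependencies.

```python
import math


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_logistic_t_prior(X, y, scale=2.5, df=1.0, n_iter=200):
    """Approximate posterior mode under independent t(df, 0, scale) priors.

    df=1 gives the Cauchy prior. Assumed variance update for each coefficient:
        sigma2_j = (beta_j**2 + V_jj + df * scale**2) / (1 + df)
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    sigma2 = [scale ** 2] * k
    for _ in range(n_iter):
        # Standard IWLS working weights and working response.
        eta = [sum(x * b for x, b in zip(row, beta)) for row in X]
        prob = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        w = [max(p * (1.0 - p), 1e-10) for p in prob]
        z = [e + (yi - p) / wi for e, yi, p, wi in zip(eta, y, prob, w)]
        # Weighted least squares with the prior precisions on the diagonal.
        A = [[sum(w[i] * X[i][a] * X[i][b] for i in range(n))
              + (1.0 / sigma2[a] if a == b else 0.0)
              for b in range(k)] for a in range(k)]
        rhs = [sum(w[i] * X[i][a] * z[i] for i in range(n)) for a in range(k)]
        V = invert(A)
        beta = [sum(V[a][b] * rhs[b] for b in range(k)) for a in range(k)]
        # Approximate E-step: refresh the per-coefficient prior variances.
        sigma2 = [(beta[j] ** 2 + V[j][j] + df * scale ** 2) / (1.0 + df)
                  for j in range(k)]
    return beta


# Completely separated toy data: y is 1 exactly when x > 0, so the maximum
# likelihood estimate is infinite, yet the prior keeps the estimate finite.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
beta_hat = fit_logistic_t_prior(X, y)
```

The toy data mirror the separation problem mentioned in the summary: without the prior term on the diagonal, the weights shrink toward zero and the least squares step grows without bound.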

62F15 Bayesian inference
62J12 Generalized linear models (logistic models)
65C60 Computational problems in statistics (MSC2010)
[1] Agresti, A. and Coull, B. A. (1998). Approximate is better than exact for interval estimation of binomial proportions. Amer. Statist. 52 119-126.
[2] Albert, A. and Anderson, J. A. (1984). On the existence of maximum likelihood estimates in logistic regression models. Biometrika 71 1-10. · Zbl 0543.62020 · doi:10.1093/biomet/71.1.1
[3] Asuncion, A. and Newman, D. J. (2007). UCI Machine Learning Repository. Dept. of Information and Computer Sciences, Univ. California, Irvine. Available at www.ics.uci.edu/~mlearn/MLRepository.html.
[4] Bedrick, E. J., Christensen, R. and Johnson, W. (1996). A new perspective on priors for generalized linear models. J. Amer. Statist. Assoc. 91 1450-1460. · Zbl 0882.62057 · doi:10.2307/2291571
[5] Berger, J. O. and Berliner, L. M. (1986). Robust Bayes and empirical Bayes analysis with epsilon-contaminated priors. Ann. Statist. 14 461-486. · Zbl 0602.62004 · doi:10.1214/aos/1176349933
[6] Bernardo, J. M. (1979). Reference posterior distributions for Bayesian inference (with discussion). J. Roy. Statist. Soc. Ser. B 41 113-147. · Zbl 0428.62004
[7] Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review 78 1-3.
[8] Carlin, B. P. and Louis, T. A. (2001). Bayes and Empirical Bayes Methods for Data Analysis, 2nd ed. CRC Press, London. · Zbl 0871.62012
[9] Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. Ser. B 39 1-38. · Zbl 0364.62022
[10] Dunson, D. B., Herring, A. H. and Engel, S. M. (2006). Bayesian selection and clustering of polymorphisms in functionally-related genes. J. Amer. Statist. Assoc. · Zbl 05564508 · doi:10.1198/016214507000000554
[11] Fayyad, U. M. and Irani, K. B. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the International Joint Conference on Artificial Intelligence IJCAI-93. Morgan Kaufmann, Chambery, France.
[12] Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika 80 27-38. · Zbl 0769.62021 · doi:10.1093/biomet/80.1.27
[13] Gelman, A. (2008). Scaling regression inputs by dividing by two standard deviations. Statist. Med. · doi:10.1002/sim.3107
[14] Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (2003). Bayesian Data Analysis, 2nd ed. CRC Press, London. · Zbl 1279.62004
[15] Gelman, A. and Jakulin, A. (2007). Bayes: Liberal, radical, or conservative? Statist. Sinica 17 422-426.
[16] Gelman, A. and Pardoe, I. (2007). Average predictive comparisons for models with nonlinearity, interactions, and variance components. Sociological Methodology.
[17] Gelman, A., Pittau, M. G., Yajima, M. and Su, Y. S. (2008). An approximate EM algorithm for multilevel generalized linear models. Technical report, Dept. of Statistics, Columbia Univ.
[18] Genkin, A., Lewis, D. D. and Madigan, D. (2007). Large-scale Bayesian logistic regression for text categorization. Technometrics 49 291-304. · doi:10.1198/004017007000000245
[19] Greenland, S. (2001). Putting background information about relative risks into conjugate prior distributions. Biometrics 57 663-670. · Zbl 1209.62033 · doi:10.1111/j.0006-341X.2001.00663.x
[20] Greenland, S., Schlesselman, J. J. and Criqui, M. H. (2002). The fallacy of employing standardized regression coefficients and correlations as measures of effect. American Journal of Epidemiology 123 203-208.
[21] Hartigan, J. (1964). Invariant prior distributions. Ann. Math. Statist. 35 836-845. · Zbl 0151.23003 · doi:10.1214/aoms/1177703583
[22] Heinze, G. (2006). A comparative investigation of methods for logistic regression with separated or nearly separated data. Statist. Med. 25 4216-4226. · doi:10.1002/sim.2687
[23] Heinze, G. and Schemper, M. (2003). A solution to the problem of separation in logistic regression. Statist. Med. 12 2409-2419.
[24] Jakulin, A. and Bratko, I. (2003). Analyzing attribute dependencies. In Knowledge Discovery in Databases: PKDD 2003 229-240.
[25] Jeffreys, H. (1961). Theory of Probability, 3rd ed. Oxford Univ. Press. · Zbl 0116.34904
[26] Kass, R. E. and Wasserman, L. (1996). The selection of prior distributions by formal rules. J. Amer. Statist. Assoc. 91 1343-1370. · Zbl 0884.62007 · doi:10.2307/2291752
[27] Kosmidis, I. (2007). Bias reduction in exponential family nonlinear models. Ph.D. thesis, Dept. of Statistics, Univ. Warwick, England. · Zbl 1179.62096
[28] Lesaffre, E. and Albert, A. (1989). Partial separation in logistic discrimination. J. Roy. Statist. Soc. Ser. B 51 109-116. · Zbl 0669.62044
[29] Lange, K. L., Little, R. J. A. and Taylor, J. M. G. (1989). Robust statistical modeling using the t distribution. J. Amer. Statist. Assoc. 84 881-896.
[30] Liu, C. (2004). Robit regression: A simple robust alternative to logistic and probit regression. In Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives (A. Gelman and X. L. Meng, eds.) 227-238. Wiley, London. · Zbl 05274820
[31] MacLehose, R. F., Dunson, D. B., Herring, A. H. and Hoppin, J. A. (2006). Bayesian methods for highly correlated exposure data. Epidemiology.
[32] Martin, A. D. and Quinn, K. M. (2002). MCMCpack. Available at mcmcpack.wustl.edu.
[33] McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman and Hall, London. · Zbl 0744.62098
[34] Miller, M. E., Hui, S. L. and Tierney, W. M. (1990). Validation techniques for logistic regression models. Statist. Med. 10 1213-1226.
[35] Newman, D. J., Hettich, S., Blake, C. L. and Merz, C. J. (1998). UCI Repository of machine learning databases. Dept. of Information and Computer Sciences, Univ. California, Irvine.
[36] Racine, A., Grieve, A. P., Fluhler, H. and Smith, A. F. M. (1986). Bayesian methods in practice: Experiences in the pharmaceutical industry (with discussion). Appl. Statist. 35 93-150. · Zbl 0635.62106 · doi:10.2307/2347264
[37] Raftery, A. E. (1996). Approximate Bayes factors and accounting for model uncertainty in generalised linear models. Biometrika 83 251-266. · Zbl 0864.62049 · doi:10.1093/biomet/83.2.251 · www3.oup.co.uk
[38] Raghunathan, T. E., Van Hoewyk, J. and Solenberger, P. W. (2001). A multivariate technique for multiply imputing missing values using a sequence of regression models. Surv. Methodol. 27 85-95.
[39] Rubin, D. B. (1978). Multiple imputations in sample surveys: A phenomenological Bayesian approach to nonresponse (with discussion). In Proc. Amer. Statist. Assoc., Survey Research Methods Section 20-34.
[40] Rubin, D. B. (1996). Multiple imputation after 18+ years (with discussion). J. Amer. Statist. Assoc. 91 473-520. · Zbl 0869.62014 · doi:10.2307/2291635
[41] Spiegelhalter, D. J. and Smith, A. F. M. (1982). Bayes factors for linear and log-linear models with vague prior information. J. Roy. Statist. Soc. Ser. B 44 377-387. · Zbl 0502.62032
[42] Stigler, S. M. (1977). Do robust estimators work with real data? Ann. Statist. 5 1055-1098. · Zbl 0374.62050 · doi:10.1214/aos/1176343997
[43] Van Buuren, S. and Oudshoorn, C. G. M. (2000). MICE: Multivariate imputation by chained equations (S software for missing-data imputation). Available at web.inter.nl.net/users/S.van.Buuren/mi/.
[44] Vilalta, R. and Drissi, Y. (2002). A perspective view and survey of metalearning. Artificial Intelligence Review 18 77-95.
[45] Winkler, R. L. (1969). Scoring rules and the evaluation of probability assessors. J. Amer. Statist. Assoc. 64 1073-1078.
[46] Witte, J. S., Greenland, S. and Kim, L. L. (1998). Software for hierarchical modeling of epidemiologic data. Epidemiology 9 563-566.
[47] Zhang, T. and Oles, F. J. (2001). Text categorization based on regularized linear classification methods. Information Retrieval 4 5-31. · Zbl 1030.68910 · doi:10.1023/A:1011441423217
[48] Yang, R. and Berger, J. O. (1994). Estimation of a covariance matrix using reference prior. Ann. Statist. 22 1195-1211. · Zbl 0819.62013 · doi:10.1214/aos/1176325625
[49] Zorn, C. (2005). A solution to separation in binary response models. Political Analysis 13 157-170.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.