Understanding predictive information criteria for Bayesian models. (English) Zbl 1332.62090

Summary: We review the Akaike, deviance, and Watanabe-Akaike information criteria from a Bayesian perspective, where the goal is to estimate expected out-of-sample-prediction error using a bias-corrected adjustment of within-sample error. We focus on the choices involved in setting up these measures, and we compare them in three simple examples, one theoretical and two applied. The contribution of this paper is to put all these information criteria into a Bayesian predictive context and to better understand, through small examples, how these methods can apply in practice.
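
As an illustration of the "bias-corrected adjustment of within-sample error" mentioned in the summary, the following is a minimal sketch of one common way to compute the Watanabe-Akaike criterion from posterior draws. It assumes the variance-based form of the penalty (the sum over data points of the posterior variance of the pointwise log-likelihood) and the deviance scale (multiplication by -2). The function names `log_mean_exp` and `waic` and the `log_lik[s][i]` layout are hypothetical choices made for this sketch, not part of the software listed in this record.

```python
import math
from statistics import variance


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def waic(log_lik):
    """Compute WAIC from pointwise log-likelihoods.

    log_lik[s][i] is assumed to hold log p(y_i | theta_s) for posterior
    draw s and data point i (at least two draws are required).
    """
    n_draws = len(log_lik)
    n_points = len(log_lik[0])
    lppd = 0.0    # within-sample fit: log pointwise predictive density
    p_waic = 0.0  # bias correction: effective number of parameters
    for i in range(n_points):
        column = [log_lik[s][i] for s in range(n_draws)]
        lppd += log_mean_exp(column)
        p_waic += variance(column)  # sample variance across draws
    elpd = lppd - p_waic            # estimated out-of-sample fit
    return {"lppd": lppd, "p_waic": p_waic,
            "elpd_waic": elpd, "waic": -2.0 * elpd}
```

When every posterior draw gives the same pointwise log-likelihoods, the penalty is zero and the criterion reduces to -2 times the within-sample log-likelihood; the more the draws disagree about a data point, the larger the correction.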

MSC:

62F15 Bayesian inference
62B10 Statistical aspects of information-theoretic topics

Software:

BUGS; BayesDA; bootstrap

References:

[1] Aitkin, M.: Statistical Inference: an Integrated Bayesian/Likelihood Approach. Chapman & Hall, London (2010) · Zbl 1267.62040 · doi:10.1201/EBK1420093438
[2] Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Csaki, F. (eds.), pp. 267-281. Budapest (1973)
[3] Ando, T., Tsay, R.: Predictive likelihood for Bayesian model selection and averaging. Int. J. Forecast. 26, 744-763 (2010) · doi:10.1016/j.ijforecast.2009.08.001
[4] Bernardo, J.M.: Expected information as expected utility. Ann. Stat. 7, 686-690 (1979) · Zbl 0407.62002 · doi:10.1214/aos/1176344689
[5] Bernardo, J.M., Smith, A.F.M.: Bayesian Theory. Wiley, New York (1994) · Zbl 0796.62002 · doi:10.1002/9780470316870
[6] Burman, P.: A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods. Biometrika 76, 503-514 (1989) · Zbl 0677.62065 · doi:10.1093/biomet/76.3.503
[7] Burman, P., Chow, E., Nolan, D.: A cross-validatory method for dependent data. Biometrika 81, 351-358 (1994) · Zbl 0825.62669 · doi:10.1093/biomet/81.2.351
[8] Burnham, K.P., Anderson, D.R.: Model Selection and Multimodel Inference: a Practical Information Theoretic Approach. Springer, New York (2002) · Zbl 1005.62007
[9] Celeux, G., Forbes, F., Robert, C., Titterington, D.: Deviance information criteria for missing data models. Bayesian Anal. 1, 651-706 (2006) · Zbl 1331.62329 · doi:10.1214/06-BA122
[10] DeGroot, M.H.: Optimal Statistical Decisions. McGraw-Hill, New York (1970) · Zbl 0225.62006
[11] Dempster, A.P.: The direct use of likelihood for significance testing. Department of Theoretical Statistics, University of Aarhus · Zbl 0367.62004
[12] Draper, D.: Model uncertainty yes, discrete model averaging maybe. Stat. Sci. 14, 405-409 (1999)
[13] Efron, B., Tibshirani, R.: An Introduction to the Bootstrap. Chapman & Hall, New York (1993) · Zbl 0835.62038 · doi:10.1007/978-1-4899-4541-9
[14] Geisser, S., Eddy, W.: A predictive approach to model selection. J. Am. Stat. Assoc. 74, 153-160 (1979) · Zbl 0401.62036 · doi:10.1080/01621459.1979.10481632
[15] Gelfand, A., Dey, D.: Bayesian model choice: asymptotics and exact calculations. J. R. Stat. Soc. B 56, 501-514 (1994) · Zbl 0800.62170
[16] Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B.: Bayesian Data Analysis, 2nd edn. CRC Press, London (2003)
[17] Gelman, A., Meng, X.L., Stern, H.S.: Posterior predictive assessment of model fitness via realized discrepancies (with discussion). Stat. Sin. 6, 733-807 (1996) · Zbl 0859.62028
[18] Gelman, A., Shalizi, C.: Philosophy and the practice of Bayesian statistics (with discussion). Br. J. Math. Stat. Psychol. 66, 8-80 (2013) · Zbl 1410.62009 · doi:10.1111/j.2044-8317.2011.02037.x
[19] Gneiting, T.: Making and evaluating point forecasts. J. Am. Stat. Assoc. 106, 746-762 (2011) · Zbl 1232.62028 · doi:10.1198/jasa.2011.r10138
[20] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. J. Am. Stat. Assoc. 102, 359-378 (2007) · Zbl 1284.62093 · doi:10.1198/016214506000001437
[21] Hibbs, D.: Implications of the ‘bread and peace’ model for the 2008 U.S. presidential election. Public Choice 137, 1-10 (2008) · doi:10.1007/s11127-008-9333-7
[22] Hoeting, J., Madigan, D., Raftery, A.E., Volinsky, C.: Bayesian model averaging (with discussion). Stat. Sci. 14, 382-417 (1999) · Zbl 1059.62525 · doi:10.1214/ss/1009212519
[23] Jones, H.E., Spiegelhalter, D.J.: Improved probabilistic prediction of healthcare performance indicators using bidirectional smoothing models. J. R. Stat. Soc. A 175, 729-747 (2012) · doi:10.1111/j.1467-985X.2011.01019.x
[24] McCulloch, R.E.: Local model influence. J. Am. Stat. Assoc. 84, 473-478 (1989) · doi:10.1080/01621459.1989.10478793
[25] Plummer, M.: Penalized loss functions for Bayesian model comparison. Biostatistics 9, 523-539 (2008) · Zbl 1143.62003 · doi:10.1093/biostatistics/kxm049
[26] Ripley, B.D.: Statistical Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge (1996) · Zbl 0853.62046 · doi:10.1017/CBO9780511812651
[27] Robert, C.P.: Intrinsic losses. Theory Decis. 40, 191-214 (1996) · Zbl 0848.90010 · doi:10.1007/BF00133173
[28] Rubin, D.B.: Estimation in parallel randomized experiments. J. Educ. Stat. 6, 377-401 (1981)
[29] Rubin, D.B.: Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann. Stat. 12, 1151-1172 (1984) · Zbl 0555.62010 · doi:10.1214/aos/1176346785
[30] Schwarz, G.E.: Estimating the dimension of a model. Ann. Stat. 6, 461-464 (1978) · Zbl 0379.62005 · doi:10.1214/aos/1176344136
[31] Shibata, R.: Statistical aspects of model selection. In: Willems, J.C. (ed.), pp. 215-240. Berlin (1989) · doi:10.1007/978-3-642-75007-6_5
[32] Spiegelhalter, D.J., Best, N.G., Carlin, B.P., van der Linde, A.: Bayesian measures of model complexity and fit (with discussion). J. R. Stat. Soc. B (2002) · Zbl 1067.62010
[33] Spiegelhalter, D., Thomas, A., Best, N., Gilks, W., Lunn, D.: BUGS: Bayesian inference using Gibbs sampling. MRC Biostatistics Unit, Cambridge, England (1994, 2003). http://www.mrc-bsu.cam.ac.uk/bugs/
[34] Stone, M.: An asymptotic equivalence of choice of model by cross-validation and Akaike’s criterion. J. R. Stat. Soc. B 39, 44-47 (1977) · Zbl 0355.62002
[35] van der Linde, A.: DIC in variable selection. Stat. Neerl. 59, 45-56 (2005) · Zbl 1069.62005 · doi:10.1111/j.1467-9574.2005.00278.x
[36] Vehtari, A., Lampinen, J.: Bayesian model assessment and comparison using cross-validation predictive densities. Neural Comput. 14, 2439-2468 (2002) · Zbl 1002.62029 · doi:10.1162/08997660260293292
[37] Vehtari, A., Ojanen, J.: A survey of Bayesian predictive methods for model assessment, selection and comparison. Stat. Surv. 6, 142-228 (2012) · Zbl 1302.62011 · doi:10.1214/12-SS102
[38] Watanabe, S.: Algebraic Geometry and Statistical Learning Theory. Cambridge University Press, Cambridge (2009) · Zbl 1180.93108 · doi:10.1017/CBO9780511800474
[39] Watanabe, S.: Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. J. Mach. Learn. Res. 11, 3571-3594 (2010) · Zbl 1242.62024
[40] Watanabe, S.: A widely applicable Bayesian information criterion. J. Mach. Learn. Res. 14, 867-897 (2013) · Zbl 1320.62058
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.