Predictive model assessment for count data. (English) Zbl 1180.62162

Summary: We discuss tools for the evaluation of probabilistic forecasts and the critique of statistical models for count data. Our proposals include a nonrandomized version of the probability integral transform, marginal calibration diagrams, and proper scoring rules, such as the predictive deviance. In case studies, we critique count regression models for patent data, and assess the predictive performance of Bayesian age-period-cohort models for larynx cancer counts in Germany. The toolbox applies in Bayesian or classical and parametric or nonparametric settings and to any type of ordered discrete outcomes.
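
As a rough illustration of two of the tools named in the summary, the sketch below computes a nonrandomized probability integral transform (PIT) histogram and a mean logarithmic score for Poisson predictive distributions. It is a minimal sketch under stated assumptions, not the authors' implementation: the Poisson forecast family, the function names, and the toy data are all hypothetical choices made here, and the PIT follows one common formulation in which the conditional PIT for a count x rises linearly from 0 to 1 between the predictive CDF values at x - 1 and x.

```python
import math

# Assumption: each forecast is a Poisson distribution with mean mu > 0.
# All names below are illustrative, not taken from the paper.

def poisson_pmf(k, mu):
    """Poisson probability mass at count k, computed on the log scale."""
    if k < 0:
        return 0.0
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def poisson_cdf(k, mu):
    """Poisson cumulative probability P(X <= k); zero for k < 0."""
    if k < 0:
        return 0.0
    return min(1.0, sum(poisson_pmf(i, mu) for i in range(k + 1)))

def conditional_pit(u, x, mu):
    """Nonrandomized PIT for observed count x, evaluated at u in [0, 1].

    Piecewise linear in u between P(X <= x - 1) and P(X <= x).
    """
    lower = poisson_cdf(x - 1, mu)
    upper = poisson_cdf(x, mu)
    if u >= upper:
        return 1.0
    if u <= lower:
        return 0.0
    return (u - lower) / (upper - lower)

def pit_histogram(obs, means, bins=10):
    """Bin heights of the mean PIT; they sum to 1.

    Roughly equal heights suggest calibration, a U shape suggests
    underdispersed forecasts, a hump suggests overdispersed forecasts.
    """
    n = len(obs)

    def mean_pit(u):
        return sum(conditional_pit(u, x, mu) for x, mu in zip(obs, means)) / n

    edges = [j / bins for j in range(bins + 1)]
    return [mean_pit(edges[j + 1]) - mean_pit(edges[j]) for j in range(bins)]

def mean_log_score(obs, means):
    """Mean logarithmic score, -log p(x); smaller is better."""
    return -sum(math.log(poisson_pmf(x, mu)) for x, mu in zip(obs, means)) / len(obs)

if __name__ == "__main__":
    # Toy counts and forecast means, invented purely for illustration.
    observed = [0, 2, 1, 4, 3, 0, 5, 2]
    forecast_means = [1.0, 2.0, 1.5, 3.0, 2.5, 0.8, 4.0, 2.2]
    print(pit_histogram(observed, forecast_means, bins=5))
    print(mean_log_score(observed, forecast_means))
```

Other discrete forecast families could be substituted by replacing the two Poisson helper functions with the corresponding probability mass and cumulative distribution functions.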

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62F15 Bayesian inference

Software:

R; reldist

References:

[1] Baker, Bayesian projections: What are the effects of excluding data from younger age groups?, American Journal of Epidemiology 162 pp 798– (2005) · doi:10.1093/aje/kwi273
[2] Besag, Bayesian computation and stochastic systems (with discussion), Statistical Science 10 pp 3– (1995) · Zbl 0955.62552 · doi:10.1214/ss/1177010123
[3] Bröcker, Scoring probabilistic forecasts: The importance of being proper, Weather and Forecasting 22 pp 382– (2007) · doi:10.1175/WAF966.1
[4] Brockwell, Universal residuals: A multivariate transformation, Statistics and Probability Letters 77 pp 1473– (2007) · Zbl 1128.62064 · doi:10.1016/j.spl.2007.02.008
[5] Candille, Evaluation of probabilistic prediction systems of a scalar variable, Quarterly Journal of the Royal Meteorological Society 131 pp 2131– (2005) · doi:10.1256/qj.04.71
[6] Carroll, Spatial modeling of snow water equivalent using covariances estimated from spatial and geomorphic attributes, Journal of Hydrology 190 pp 42– (1997) · doi:10.1016/S0022-1694(96)03062-4
[7] Christensen, Bayesian prediction of spatial count data using generalized linear mixed models, Biometrics 58 pp 280– (2002) · Zbl 1209.62156 · doi:10.1111/j.0006-341X.2002.00280.x
[8] Clements, Evaluating Econometric Forecasts of Economic and Financial Variables (2005) · doi:10.1057/9780230596146
[9] Clements, Lung cancer rate predictions using generalized additive models, Biostatistics 6 pp 576– (2005) · Zbl 1169.62369 · doi:10.1093/biostatistics/kxi028
[10] Clements, Re: Bayesian projections: What are the effects of excluding data from younger age groups?, American Journal of Epidemiology 164 pp 292– (2006) · doi:10.1093/aje/kwj221
[11] Czado, Zero-inflated generalized Poisson models with regression effects on the mean, dispersion and zero-inflation level applied to patent outsourcing rates, Statistical Modelling 7 pp 125– (2007) · doi:10.1177/1471082X0700700202
[12] Dawid, Statistical theory: The prequential approach, Journal of the Royal Statistical Society, Series A, General 147 pp 278– (1984) · Zbl 0557.62080 · doi:10.2307/2981683
[13] Dawid, Coherent dispersion criteria for optimal experimental design, Annals of Statistics 27 pp 65– (1999) · Zbl 0948.62057
[14] Diebold, Evaluating density forecasts with applications to financial risk management, International Economic Review 39 pp 863– (1998) · doi:10.2307/2527342
[15] Elsner, Prediction models for annual U.S. hurricane counts, Journal of Climate 19 pp 2935– (2006) · doi:10.1175/JCLI3729.1
[16] Epstein, A scoring system for probability forecasts of ranked categories, Journal of Applied Meteorology 8 pp 985– (1969) · doi:10.1175/1520-0450(1969)008<0985:ASSFPF>2.0.CO;2
[17] Frühwirth-Schnatter, Recursive residuals and model diagnostics for normal and non-normal state space models, Environmental and Ecological Statistics 3 pp 291– (1996) · doi:10.1007/BF00539368
[18] Frühwirth-Schnatter, Auxiliary mixture sampling for parameter-driven models of time series of counts with applications to state space modelling, Biometrika 93 pp 827– (2006) · Zbl 1436.62421 · doi:10.1093/biomet/93.4.827
[19] Frühwirth-Schnatter, Improved auxiliary mixture sampling for hierarchical models of non-Gaussian data, Statistics and Computing (2009) · doi:10.1007/s11222-008-9109-4
[20] Gelfand, Model choice: A minimum posterior predictive loss approach, Biometrika 85 pp 1– (1998) · Zbl 0904.62036 · doi:10.1093/biomet/85.1.1
[21] Gneiting, Strictly proper scoring rules, prediction, and estimation, Journal of the American Statistical Association 102 pp 359– (2007) · Zbl 1284.62093 · doi:10.1198/016214506000001437
[22] Gneiting, Probabilistic forecasts, calibration and sharpness, Journal of the Royal Statistical Society, Series B, Statistical Methodology 69 pp 243– (2007) · Zbl 1120.62074 · doi:10.1111/j.1467-9868.2007.00587.x
[23] Good, Rational decisions, Journal of the Royal Statistical Society, Series B, Methodological 14 pp 107– (1952)
[24] Gotway, Spatial prediction of counts and rates, Statistics in Medicine 22 pp 1415– (2003) · doi:10.1002/sim.1523
[25] Grammig, A new marked point process model for the federal funds rate target: Methodology and forecast evaluation, Journal of Economic Dynamics and Control 32 pp 2370– (2008) · Zbl 1181.91344 · doi:10.1016/j.jedc.2008.02.007
[26] Gschlößl, Spatial modelling of claim frequency and claim size in non-life insurance, Scandinavian Actuarial Journal 107 pp 202– (2007) · Zbl 1150.91026 · doi:10.1080/03461230701414764
[27] Gschlößl, Modelling count data with overdispersion and spatial effects, Statistical Papers 49 pp 531– (2008) · Zbl 1310.62083 · doi:10.1007/s00362-006-0031-6
[28] Hamill, Interpretation of rank histograms for verifying ensemble forecasts, Monthly Weather Review 129 pp 550– (2001) · doi:10.1175/1520-0493(2001)129<0550:IORHFV>2.0.CO;2
[29] Handcock, Relative Distribution Methods in the Social Sciences (1999)
[30] Ihaka, R: A language for data analysis and graphics, Journal of Computational and Graphical Statistics 5 pp 299– (1996) · doi:10.2307/1390807
[31] Jolliffe, Uncertainty and inference for verification measures, Weather and Forecasting 22 pp 637– (2007) · doi:10.1175/WAF989.1
[32] Jolliffe, Forecast Verification: A Practitioner’s Guide in Atmospheric Science (2003)
[33] Knorr-Held, Projections of lung cancer mortality in West Germany: A case study in Bayesian prediction, Biostatistics 2 pp 109– (2001) · doi:10.1093/biostatistics/2.1.109
[34] Lawless, Negative binomial and mixed Poisson regression, Canadian Journal of Statistics 15 pp 209– (1987) · Zbl 0632.62060 · doi:10.2307/3314912
[35] Liesenfeld, Modeling financial transaction price movements: A dynamic integer count data model, Empirical Economics 30 pp 795– (2006) · doi:10.1007/s00181-005-0001-1
[36] McCabe, Bayesian predictions of low count time series, International Journal of Forecasting 21 pp 315– (2005) · doi:10.1016/j.ijforecast.2004.11.001
[37] McCullagh, Generalized Linear Models (1989) · Zbl 0588.62104 · doi:10.1007/978-1-4899-3242-6
[38] Nelson, Statistical models for autocorrelated count data, Statistics in Medicine 25 pp 1413– (2006) · doi:10.1002/sim.2274
[39] O’Hagan, Highly Structured Stochastic Systems pp 423– (2003)
[40] Pepe, The Statistical Evaluation of Medical Tests for Classification and Prediction (2003) · Zbl 1039.62105
[41] Rubin, Bayesianly justifiable and relevant frequency calculations for the applied statistician, Annals of Statistics 12 pp 1151– (1984) · Zbl 0555.62010 · doi:10.1214/aos/1176346785
[42] Smith, Diagnostic checks of non-standard time series models, Journal of Forecasting 4 pp 283– (1985) · doi:10.1002/for.3980040305
[43] Spiegelhalter, Bayesian measures of model complexity and fit (with discussion), Journal of the Royal Statistical Society, Series B, Statistical Methodology 64 pp 583– (2002) · Zbl 1067.62010 · doi:10.1111/1467-9868.00353
[44] Stone, An asymptotic equivalence of choice of model by cross-validation and Akaike’s criterion, Journal of the Royal Statistical Society, Series B, Methodological 39 pp 44– (1977) · Zbl 0355.62002
[45] Vuong, Likelihood tests for model selection and non-nested hypotheses, Econometrica 57 pp 307– (1989) · Zbl 0701.62106 · doi:10.2307/1912557
[46] Wang, Analysis of patent data - a mixed-Poisson-regression-model approach, Journal of Business and Economic Statistics 16 pp 27– (1998) · doi:10.2307/1392013
[47] Wecker, Assessing the accuracy of time series model forecasts of count observations, Journal of Business and Economic Statistics 7 pp 418– (1989) · doi:10.2307/1391640
[48] Winkelmann, Econometric Analysis of Count Data (2008) · Zbl 1032.62108
[49] Winkler, Scoring rules and the evaluation of probabilities, Test 5 pp 1– (1996) · Zbl 0848.62001 · doi:10.1007/BF02562681
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.