Probability aggregation in time-series: dynamic hierarchical modeling of sparse expert beliefs. (English) Zbl 1429.62706

Summary: Most subjective probability aggregation procedures use a single probability judgment from each expert, even though it is common for experts studying real problems to update their probability estimates over time. This paper advances into unexplored areas of probability aggregation by considering a dynamic context in which experts can update their beliefs at random intervals. The updates occur very infrequently, resulting in a sparse data set that cannot be modeled by standard time-series procedures. In response to the lack of appropriate methodology, this paper presents a hierarchical model that takes into account the expert’s level of self-reported expertise and produces aggregate probabilities that are sharp and well calibrated both in- and out-of-sample. The model is demonstrated on a real-world data set that includes over 2300 experts making multiple probability forecasts over two years on different subsets of 166 international political events.


62P25 Applications of statistics to social sciences
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)


