A Bayesian approach for predicting the popularity of tweets. (English) Zbl 1304.62147

Summary: We predict the popularity of short messages, called tweets, created on the micro-blogging site Twitter. We measure the popularity of a tweet by the time-series path of its retweets, which occur when people forward the tweet to others. We develop a probabilistic model for the evolution of the retweets using a Bayesian approach, and form predictions using only observations on the retweet times and the local network or “graph” structure of the retweeters. We obtain good step-ahead forecasts and predictions of the final total number of retweets even when only a small fraction (i.e., less than one tenth) of the retweet path is observed. This translates to good predictions within a few minutes of a tweet being posted, and has potential implications for understanding the spread of broader ideas, memes or trends in social networks.


62P25 Applications of statistics to social sciences
91D30 Social networks; opinion dynamics
91B84 Economic time series analysis
62A09 Graphical methods in statistics
62F15 Bayesian inference
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62M20 Inference from stochastic processes and prediction

