Marked self-exciting point process modelling of information diffusion on Twitter. (English) Zbl 1411.62357

Summary: Information diffusion occurs on microblogging platforms like Twitter as retweet cascades. When a tweet is posted, it may be retweeted and henceforth further retweeted, and the retweeting process continues iteratively and indefinitely. A natural measure of the popularity of a tweet is the number of retweets it generates. Accurate predictions of tweet popularity can assist Twitter to rank contents more effectively and facilitate the assessment of potential for marketing and campaigning strategies. In this paper, we propose a model called the Marked Self-Exciting Process with Time-Dependent Excitation Function, or MaSEPTiDE for short, to model the retweeting dynamics and to predict the tweet popularity. Our model does not require expensive feature engineering but is capable of leveraging the observed dynamics to accurately predict the future evolution of retweet cascades. We apply our proposed methodology on a large amount of Twitter data and report substantial improvement in prediction performance over existing approaches in the literature.
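
As a rough illustration of the self-exciting mechanism the summary describes, the sketch below simulates a single retweet cascade in which every tweet or retweet raises the rate of further retweets. It is not the MaSEPTiDE model: this record does not give the paper's excitation function or mark distribution, so the exponential decay kernel, the parameters `a` and `b`, the uniform marks, and the function names are all assumptions made for illustration. Event times are generated by thinning, in the spirit of the method listed as [18].

```python
import math
import random


def intensity(t, events, a, b):
    """Retweet rate at time t given all events so far.

    Assumed form: each event (s, m) contributes m * a * b * exp(-b * (t - s)).
    """
    return sum(m * a * b * math.exp(-b * (t - s)) for s, m in events)


def simulate_cascade(a, b, horizon, mark_sampler, rng):
    """Simulate one cascade on [0, horizon) by thinning.

    The cascade starts with the original tweet at time 0. Between events the
    assumed intensity only decays, so its current value is a valid upper
    bound for proposing the next candidate time.
    """
    events = [(0.0, mark_sampler(rng))]  # (time, mark) of the original tweet
    t = 0.0
    while True:
        bound = intensity(t, events, a, b)
        if bound <= 0.0:  # excitation has numerically died out
            break
        t += rng.expovariate(bound)  # candidate waiting time
        if t >= horizon:
            break
        # Accept the candidate with probability intensity(t) / bound.
        if rng.random() * bound <= intensity(t, events, a, b):
            events.append((t, mark_sampler(rng)))
    return events


# Hypothetical settings: marks uniform on [0.5, 1.5] (mean 1), so the
# expected number of direct retweets per event is a = 0.8 < 1 and the
# cascade dies out with probability one.
rng = random.Random(1)
cascade = simulate_cascade(0.8, 1.0, 50.0, lambda r: r.uniform(0.5, 1.5), rng)
print("retweets observed:", len(cascade) - 1)
```

In this toy setting, the popularity measure mentioned in the summary corresponds to the number of events after the original tweet. A fitted model of this kind would predict it from the events observed up to some censoring time.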

MSC:

62P25 Applications of statistics to social sciences
60G55 Point processes (e.g., Poisson, Cox, Hawkes processes)
62M20 Inference from stochastic processes and prediction

Software:

R

References:

[1] Agarwal, D., Chen, B.-C. and Elango, P. (2009). Spatio-temporal models for estimating click-through rate. In Proceedings of the 18th International Conference on World Wide Web 21-30. ACM, New York.
[2] Ahmed, M., Spagna, S., Huici, F. and Niccolini, S. (2013). A peek into the future: Predicting the evolution of popularity in user generated content. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining 607-616. ACM, New York.
[3] Alves, R. A., Assunção, R. and de Melo, P. O. (2016). Burstiness scale: A highly parsimonious model for characterizing random series of events. Preprint. Available at arXiv:1602.06431.
[4] Bakshy, E., Hofman, J. M., Mason, W. A. and Watts, D. J. (2011). Everyone’s an influencer: Quantifying influence on Twitter. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining 65-74. ACM, New York.
[5] Barabasi, A.-L. (2005). The origin of bursts and heavy tails in human dynamics. Nature 435 207-211.
[6] Cha, M., Haddadi, H., Benevenuto, F. and Gummadi, P. K. (2010). Measuring user influence in Twitter: The million follower fallacy. In Proceedings of the Fourth International Conference on Weblogs and Social Media (ICWSM-2010) 10-17. AAAI Press, Palo Alto, CA.
[7] Chen, F. and Hall, P. (2013). Inference for a nonstationary self-exciting point process with an application in ultra-high frequency financial data modeling. J. Appl. Probab. 50 1006-1024. · Zbl 1411.60074 · doi:10.1239/jap/1389370096
[8] Chen, F. and Hall, P. (2016). Nonparametric estimation for self-exciting point processes—A parsimonious approach. J. Comput. Graph. Statist. 25 209-224.
[9] Chen, F. and Stindl, T. (2018). Direct likelihood evaluation for the renewal Hawkes process. J. Comput. Graph. Statist. 27 119-131. · Zbl 1469.62146 · doi:10.1016/j.csda.2018.01.021
[10] Crane, R. and Sornette, D. (2008). Robust dynamic classes revealed by measuring the response function of a social system. Proc. Natl. Acad. Sci. USA 105 15649-15653.
[11] Daley, D. J. and Vere-Jones, D. (2003). An Introduction to the Theory of Point Processes, Vol. I: Elementary Theory and Methods, 2nd ed. Springer, New York. · Zbl 1026.60061
[12] Fox, E. W., Short, M. B., Schoenberg, F. P., Coronges, K. D. and Bertozzi, A. L. (2016). Modeling e-mail networks and inferring leadership using self-exciting point processes. J. Amer. Statist. Assoc. 111 564-584.
[13] Gao, S., Ma, J. and Chen, Z. (2015). Modeling and predicting retweeting dynamics on microblogging platforms. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining 107-116. ACM, New York.
[14] Gneiting, T. (2011). Making and evaluating point forecasts. J. Amer. Statist. Assoc. 106 746-762. · Zbl 1232.62028 · doi:10.1198/jasa.2011.r10138
[15] Hawkes, A. G. (1971). Spectra of some self-exciting and mutually exciting point processes. Biometrika 58 83-90. · Zbl 0219.60029 · doi:10.1093/biomet/58.1.83
[16] Kobayashi, R. and Lambiotte, R. (2016). TiDeH: Time-dependent Hawkes process for predicting retweet dynamics. In Proceedings of the Tenth International AAAI Conference on Web and Social Media (ICWSM-2016) 191-200. The AAAI Press, Palo Alto, CA.
[17] Kwak, H., Lee, C., Park, H. and Moon, S. (2010). What is Twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web 591-600. ACM, New York.
[18] Lewis, P. A. W. and Shedler, G. S. (1979). Simulation of nonhomogeneous Poisson processes by thinning. Nav. Res. Logist. Q. 26 403-413. · Zbl 0497.60003 · doi:10.1002/nav.3800260304
[19] Li, C.-T., Shan, M.-K., Jheng, S.-H. and Chou, K.-C. (2016). Exploiting concept drift to predict popularity of social multimedia in microblogs. Inform. Sci. 339 310-331.
[20] Lymperopoulos, I. N. (2016). Predicting the popularity growth of online content: Model and algorithm. Inform. Sci. 369 585-613.
[21] Matsubara, Y., Sakurai, Y., Prakash, B. A., Li, L. and Faloutsos, C. (2012). Rise and fall patterns of information diffusion: Model and implications. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 6-14. ACM, New York.
[22] Mishra, S., Rizoiu, M.-A. and Xie, L. (2016). Feature driven and point process approaches for popularity prediction. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management 1069-1078. ACM, New York.
[23] Naveed, N., Gottron, T., Kunegis, J. and Alhadi, A. C. (2011). Bad news travel fast: A content-based analysis of interestingness on Twitter. In Proceedings of the 3rd International Web Science Conference Art. ID 8. ACM, New York.
[24] Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. Comput. J. 7 308-313. · Zbl 0229.65053 · doi:10.1093/comjnl/7.4.308
[25] Ogata, Y. (1988). Statistical models for earthquake occurrences and residual analysis for point processes. J. Amer. Statist. Assoc. 83 9-27.
[26] R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
[27] Szabo, G. and Huberman, B. A. (2010). Predicting the popularity of online content. Commun. ACM 53 80-88.
[28] Tumasjan, A., Sprenger, T. O., Sandner, P. G. and Welpe, I. M. (2010). Predicting elections with Twitter: What 140 characters reveal about political sentiment. In Proceedings of the Fourth International Conference on Weblogs and Social Media (ICWSM-2010) 178-185. AAAI Press, Palo Alto, CA.
[29] Wu, B., Cheng, W.-H., Zhang, Y. and Mei, T. (2016). Time matters: Multi-scale temporalization of social media popularity. In Proceedings of the 2016 ACM on Multimedia Conference 1336-1344. ACM, New York.
[30] Yan, Y., Tan, Z., Gao, X., Tang, S. and Chen, G. (2016). STH-Bass: A spatial-temporal heterogeneous bass model to predict single-tweet popularity. In International Conference on Database Systems for Advanced Applications 18-32. Springer, Cham.
[31] Zaman, T., Fox, E. B. and Bradlow, E. T. (2014). A Bayesian approach for predicting the popularity of tweets. Ann. Appl. Stat. 8 1583-1611. · Zbl 1304.62147 · doi:10.1214/14-AOAS741
[32] Zhao, Q., Erdogdu, M. A., He, H. Y., Rajaraman, A. and Leskovec, J. (2015). SEISMIC: A self-exciting point process model for predicting tweet popularity. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 1513-1522. ACM, New York.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.