Predicting social response to infectious disease outbreaks from Internet-based news streams. (English) Zbl 1392.92099

Summary: Infectious disease outbreaks often have consequences beyond human health, including public concern, economic instability, and sometimes violence. A warning system capable of anticipating social disruptions resulting from disease outbreaks is urgently needed to help decision makers prepare appropriately. We designed a system that operates in near real-time to identify and predict social response. HealthMap provided over 150,000 Internet-based news articles related to outbreaks of 16 diseases in 72 countries and territories. These articles were automatically tagged with indicators of disease activity and population reaction. An anomaly detection algorithm was applied to the population reaction indicators to identify periods of unusually severe social response. A model was then developed to predict the probability that such a period begins one, two, or three weeks ahead. This model exhibited strong performance for diseases with substantial media coverage. For country-disease pairs with a median of 20 or more articles per year, the onset of unusually severe social response in the following week was correctly predicted more than 60% of the time, and 87% of all weeks were correctly predicted. Performance was weaker for diseases with little media coverage; for these diseases, the main utility of our system lies in identifying social response when it occurs rather than predicting when it will happen. Overall, the near real-time prediction approach is a promising step toward predictive models that inform responders of the likely social consequences of disease spread.
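The summary does not name the anomaly detection algorithm or define its two performance measures precisely. The following is a minimal sketch under stated assumptions, not the authors' method: weeks are flagged with an upper-sided exponentially weighted moving average (EWMA) control chart in the style of Roberts (1959), and forecasts are scored by an onset hit rate and overall weekly accuracy. The function names, the parameters `lam`, `width` and `baseline`, and the definition of an "onset" (a flagged week preceded by an unflagged week) are all hypothetical choices made for illustration.

```python
import math
from statistics import mean, stdev


def ewma_flags(counts, lam=0.3, width=3.0, baseline=8):
    """Flag weeks whose EWMA statistic exceeds an upper control limit.

    counts: weekly values of a hypothetical population-reaction indicator.
    lam, width and baseline are illustrative assumptions, not values from
    the paper; the first `baseline` weeks estimate the in-control mean/sd.
    """
    mu = mean(counts[:baseline])
    sigma = stdev(counts[:baseline])
    z = mu  # EWMA statistic starts at the in-control mean
    flags = []
    for t, x in enumerate(counts, start=1):
        z = lam * x + (1 - lam) * z
        # Standard time-varying EWMA limit; only the upper side is used
        # because we look for unusually severe (high) social response.
        spread = sigma * math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        flags.append(z > mu + width * spread)
    return flags


def onsets(flags):
    """Assumed definition: an onset is a flagged week after an unflagged one."""
    return [f and (i == 0 or not flags[i - 1]) for i, f in enumerate(flags)]


def evaluate(actual, predicted):
    """Return (onset hit rate, weekly accuracy) for aligned boolean series."""
    true_onsets = [i for i, is_onset in enumerate(onsets(actual)) if is_onset]
    hits = sum(predicted[i] for i in true_onsets)
    onset_rate = hits / len(true_onsets) if true_onsets else float("nan")
    accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
    return onset_rate, accuracy


if __name__ == "__main__":
    # Made-up weekly indicator with a burst starting in week 11.
    counts = [5, 6, 4, 5, 6, 5, 4, 5, 5, 6, 30, 35, 8, 5]
    actual = ewma_flags(counts)
    predicted = [False] * 10 + [True, True, False, False]
    print(actual)
    print(evaluate(actual, predicted))
```

The two measures differ in what they reward: the onset hit rate only looks at the weeks where a severe period begins, while weekly accuracy counts every week and is therefore dominated by the many quiet weeks, which is why the two figures reported in the summary need not be close.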


92D30 Epidemiology
91D30 Social networks; opinion dynamics


SMOTE; BioCaster