
Modeling latent topics in social media using dynamic exploratory graph analysis: the case of the right-wing and left-wing trolls in the 2016 US elections. (English) Zbl 1486.62304

Summary: The past few years were marked by increased online offensive strategies perpetrated by state and non-state actors to promote their political agenda, sow discord, and question the legitimacy of democratic institutions in the US and Western Europe. In 2016, the US Congress identified a list of Russian state-sponsored Twitter accounts that were used to try to divide voters on a wide range of issues. Previous research used latent Dirichlet allocation (LDA) to estimate latent topics in data extracted from these accounts. However, LDA has characteristics that may limit its effectiveness on social media data: the number of latent topics must be specified by the user, interpretability of the topics can be difficult to achieve, and it does not model short-term temporal dynamics. In the current paper, we propose a new method to estimate latent topics in texts from social media, termed Dynamic Exploratory Graph Analysis (DynEGA). In a Monte Carlo simulation, we compared the ability of DynEGA and LDA to estimate the number of simulated latent topics. The results show that DynEGA is substantially more accurate than several different LDA algorithms when estimating the number of simulated topics. In an applied example, we applied DynEGA to a large dataset of Twitter posts from state-sponsored right- and left-wing trolls during the 2016 US presidential election. DynEGA revealed topics that were pertinent to several consequential events in the election cycle, demonstrating the coordinated effort of trolls capitalizing on current events in the USA. This example demonstrates the potential power of our approach for revealing temporally relevant information from qualitative text data.

MSC:

62P15 Applications of statistics to psychology
91F10 History, political science

References:

[1] Ananiadou, S.; McNaught, J., Text mining for biology and biomedicine (2006), Boston: Artech House Publishers, Boston
[2] Anderson, T. W., & Rubin, H. (1958). Statistical inference in factor analysis. In Proceedings of the 3rd Berkeley symposium on mathematics, statistics, and probability (Vol. 5, pp. 111-150). · Zbl 0070.14703
[3] Arun, R., Suresh, V., Veni Madhavan, C. E., & Narasimha Murthy, M. N. (2010). On finding the natural number of topics with latent Dirichlet allocation: Some observations. In M. J. Zaki, J. X. Yu, & B. Ravindran (Eds.), Advances in knowledge discovery and data mining (Vol. 6118, pp. 391-402). Springer, Berlin. doi:10.1007/978-3-642-13657-3_43
[4] Baumert, A.; Schmitt, M.; Perugini, M.; Johnson, W.; Blum, G.; Borkenau, P.; Wrzus, C., Integrating personality structure, personality process, and personality development, European Journal of Personality, 31, 503-528 (2017) · doi:10.1002/per.2115
[5] Blei, DM; Ng, AY; Jordan, MI, Latent Dirichlet allocation, Journal of Machine Learning Research, 3, Jan, 993-1022 (2003) · Zbl 1112.68379
[6] Bleidorn, W.; Hopwood, CJ, Using machine learning to advance personality assessment and theory, Personality and Social Psychology Review, 23, 190-203 (2019) · doi:10.1177/1088868318772990
[7] Boker, S. M. (2018). Longitudinal multivariate psychology. In E. Ferrer, S. M. Boker, & K. J. Grimm (Eds.) (pp. 126-141). Routledge.
[8] Boker, SM; Deboeck, PR; Edler, C.; Keel, P.; Chow, SM; Ferrer, E.; Hsieh, F., Generalized local linear approximation of derivatives from time series, The Notre Dame series on quantitative methodology. Statistical methods for modeling human dynamics: An interdisciplinary dialogue, 161-178 (2010), UK: Routledge/Taylor & Francis Group, UK
[9] Boker, S. M., Tiberio, S. S., & Moulder, R. G. (2018). Robustness of time delay embedding to sampling interval misspecification. In Continuous time modeling in the behavioral and related sciences (pp. 239-258). Springer.
[10] Borsboom, D., Psychometric perspectives on diagnostic systems, Journal of Clinical Psychology, 64, 9, 1089-1108 (2008) · doi:10.1002/jclp.20503
[11] Borsboom, D.; Cramer, AO, Network analysis: an integrative approach to the structure of psychopathology, Annual Review of Clinical Psychology, 9, 91-121 (2013) · doi:10.1146/annurev-clinpsy-050212-185608
[12] Cao, J.; Xia, T.; Li, J.; Zhang, Y.; Tang, S., A density-based method for adaptive LDA model selection, Neurocomputing, 72, 7-9, 1775-1781 (2009) · doi:10.1016/j.neucom.2008.06.011
[13] Cattell, R. B. (1965). Studies in psychology. In C. Banks & P. L. Broadhurst (Eds.) (pp. 223-266). University of London Press London.
[14] Chaney, A. J. B., & Blei, D. M. (2012). Visualizing topic models. In Proceedings of the sixth international aaai conference on weblogs and social media.
[15] Chen, J., & Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3), 759-771. Retrieved from https://www.jstor.org/stable/20441500 · Zbl 1437.62415
[16] Christensen, AP, Towards a network psychometrics approach to assessment: simulations for redundancy, dimensionality, and loadings (Unpublished doctoral dissertation) (2020), Greensboro, NC, USA: University of North Carolina at Greensboro, Greensboro, NC, USA
[17] Christensen, A. P., & Golino, H. (2021). On the equivalency of factor and network loadings. Behavior Research Methods, 1-18. doi:10.3758/s13428-020-01500-6
[18] Christensen, A. P., Kenett, Y. N., Aste, T., Silvia, P. J., & Kwapil, T. R. (2018). Network structure of the wisconsin schizotypy scales-short forms: Examining psychometric network filtering approaches. Behavior Research Methods, 50(6), 2531-2550. doi:10.3758/s13428-018-1032-9
[19] Clark, AT; Ye, H.; Isbell, F.; Deyle, ER; Cowles, J.; Tilman, GD; Sugihara, G., Spatial convergent cross mapping to detect causal relationships from short time series, Ecology, 96, 5, 1174-1181 (2015) · doi:10.1890/14-1479.1
[20] Comrey, AL; Lee, HB, A first course in factor analysis (2016), New York: Routledge, New York
[21] Cramer, A.; Waldorp, LJ; Van Der Maas, HL; Borsboom, D., Comorbidity: a network perspective, Behavioral and Brain Sciences, 33, 2-3, 137-150 (2010) · doi:10.1017/S0140525X09991567
[22] Dalege, J.; Borsboom, D.; Harreveld, F.; Waldorp, LJ; Maas, HL, Network structure explains the impact of attitudes on voting decisions, Scientific Reports, 7, 1, 4909 (2017) · doi:10.1038/s41598-017-05048-y
[23] Deboeck, PR; Montpetit, MA; Bergeman, C.; Boker, SM, Using derivative estimates to describe intraindividual variability at multiple time scales, Psychological Methods, 14, 4, 367-386 (2009) · doi:10.1037/a0016622
[24] Deveaud, R.; SanJuan, E.; Bellot, P., Accurate and effective latent concept modeling for ad hoc information retrieval, Document Numérique, 17, 1, 61-84 (2014) · doi:10.3166/dn.17.1.61-84
[25] Engle, R.; Watson, M., A one-factor multivariate time series model of metropolitan wage rates, Journal of the American Statistical Association, 76, 376, 774-781 (1981) · doi:10.1080/01621459.1981.10477720
[26] Epskamp, MS; Irwing Paul, B., Network psychometrics, The Wiley handbook of psychometric testing. A multidisciplinary reference on survey, scale and test development, 953-986 (2018), New York: Wiley, New York · doi:10.1002/9781118489772.ch30
[27] Epskamp, S.; Borsboom, D.; Fried, EI, Estimating psychological networks and their accuracy: a tutorial paper, Behavior Research Methods, 50, 1, 195-212 (2018) · doi:10.3758/s13428-017-0862-1
[28] Epskamp, S.; Fried, E., A tutorial on regularized partial correlation networks, Psychological Methods, 23, 4, 617-634 (2018) · doi:10.1037/met0000167
[29] Epskamp, S.; Rhemtulla, M.; Borsboom, D., Generalized network psychometrics: combining network and latent variable models, Psychometrika, 82, 4, 904-927 (2017) · Zbl 1402.62307 · doi:10.1007/s11336-017-9557-x
[30] Epskamp, S.; Waldorp, LJ; Mõttus, R.; Borsboom, D., The Gaussian graphical model in cross-sectional and time-series data, Multivariate Behavioral Research, 53, 4, 453-480 (2018) · doi:10.1080/00273171.2018.1454823
[31] Feinerer, I., Hornik, K., & Meyer, D. (2008). Text mining infrastructure in R. Journal of Statistical Software, 25(5), 1-54. Retrieved from http://www.jstatsoft.org/v25/i05/
[32] Fenton, N. (2016). The internet of radical politics and social change. In Misunderstanding the internet (pp. 173-202). Routledge.
[33] Foygel, R., & Drton, M. (2010). Extended Bayesian information criteria for Gaussian graphical models. In Proceedings of the 23rd international conference on neural information processing systems - volume 1 (Vol. 1, pp. 604-612). Vancouver, Canada.
[34] Fried, E.; van Borkulo, CD; Cramer, AOJ; Boschloo, L.; Schoevers, RA; Borsboom, D., Mental disorders as networks of problems: a review of recent insights, Social Psychiatry and Psychiatric Epidemiology, 52, 1, 1-10 (2017) · doi:10.1007/s00127-016-1319-z
[35] Friedman, J.; Hastie, T.; Tibshirani, R., Sparse inverse covariance estimation with the graphical lasso, Biostatistics, 9, 3, 432-441 (2008) · Zbl 1143.62076 · doi:10.1093/biostatistics/kxm045
[36] Garrido, L. E., Abad, F. J., & Ponsoda, V. (2013). A new look at Horn’s parallel analysis with ordinal variables. Psychological Methods, 18(4), 454-474. doi:10.1037/a0030005
[37] Gates, KM; Henry, T.; Steinley, D.; Fair, DA, A monte carlo evaluation of weighted community detection algorithms, Frontiers in Neuroinformatics, 10, 45 (2016) · doi:10.3389/fninf.2016.00045
[38] Ghanem, B., Buscaldi, D., & Rosso, P. (2019). TexTrolls: Identifying Russian trolls on Twitter from a textual perspective. arXiv, (1910.01340). Retrieved from arXiv:1910.01340
[39] Golino, H., & Christensen, A. P. (2019). EGAnet: Exploratory graph analysis: A framework for estimating the number of dimensions in multivariate data using network psychometrics. Retrieved from https://CRAN.R-project.org/package=EGAnet
[40] Golino, HF; Demetriou, A., Estimating the dimensionality of intelligence like data using exploratory graph analysis, Intelligence, 62, 54-70 (2017) · doi:10.1016/j.intell.2017.02.007
[41] Golino, HF; Epskamp, S., Exploratory graph analysis: a new approach for estimating the number of dimensions in psychological research, PloS One, 12, 6, e0174035 (2017) · doi:10.1371/journal.pone.0174035
[42] Golino, H., Moulder, R., Shi, D., Christensen, A., Garrido, L., Nieto, M., & Boker, S. (2020a). Entropy fit indices: New fit measures for assessing the structure and dimensionality of multiple latent variables. Multivariate Behavioral Research. doi:10.1080/00273171.2020.1779642
[43] Golino, H., Shi, D., Garrido, L. E., Christensen, A. P., Nieto, M. D., Sadana, R., & Martinez-Molina, A. (2020). Investigating the performance of exploratory graph analysis and traditional techniques to identify the number of latent factors: a simulation and tutorial. Psychological Methods, 25(3), 292-320. doi:10.1037/met0000255
[44] Grover, P.; Kar, AK; Dwivedi, YK; Janssen, M., Polarization and acculturation in US election 2016 outcomes - can Twitter analytics predict changes in voting preferences, Technological Forecasting and Social Change, 145, 438-460 (2019) · doi:10.1016/j.techfore.2018.09.009
[45] Guttman, L., Image theory for the structure of quantitative variates, Psychometrika, 18, 4, 277-296 (1953) · Zbl 0053.27703 · doi:10.1007/BF02289264
[46] Hallquist, M. N., Wright, A. G., & Molenaar, P. C. (2019). Problems with centrality measures in psychopathology symptom networks: why network psychometrics cannot escape psychometric theory. Multivariate Behavioral Research, 1-25. doi:10.1080/00273171.2019.1640103
[47] Hernandez-Suarez, A.; Sanchez-Perez, G.; Toscano-Medina, K.; Martinez-Hernandez, V.; Perez-Meana, H.; Olivares-Mercado, J.; Sanchez, V., Social sentiment sensor in Twitter for predicting cyber-attacks using l1 regularization, Sensors, 18, 5, 1380 (2018) · doi:10.3390/s18051380
[48] Horibe, Y., Entropy and correlation, IEEE Transactions on Systems, Man, and Cybernetics, 5, 641-642 (1985) · Zbl 0585.62008 · doi:10.1109/TSMC.1985.6313441
[49] Hornik, K.; Grün, B., Topicmodels: an R package for fitting topic models, Journal of Statistical Software, 40, 13, 1-30 (2011)
[50] Hou-Liu, J. (2018). Benchmarking and improving recovery of number of topics in latent Dirichlet allocation models. viXra. Retrieved from https://vixra.org/abs/1801.0045
[51] Kjellström, S.; Golino, H., Mining concepts of health responsibility using text mining and exploratory graph analysis, Scandinavian Journal of Occupational Therapy, 26, 6, 395-410 (2019) · doi:10.1080/11038128.2018.1455896
[52] Lauritzen, SL, Graphical models (1996), Oxford: Clarendon Press, Oxford · Zbl 0907.62001
[53] Libicki, M. C. (1995). What is information warfare? The Center for Advanced Command Concepts; Technology, National Defense University. Retrieved from https://apps.dtic.mil/dtic/tr/fulltext/u2/a367662.pdf
[54] Linvill, D. L., Boatwright, B. C., Grant, W. J., & Warren, P. L. (2019). “The Russians are hacking my brain!” Investigating Russia’s Internet Research Agency Twitter tactics during the 2016 United States presidential campaign. Computers in Human Behavior, 99, 292-300.
[55] Linvill, D. L., & Warren, P. L. (2018). Troll factories: The internet research agency and state-sponsored agenda building. Clemson University. Retrieved from https://pwarren.people.clemson.edu/Linvill_Warren_TrollFactory.pdf
[56] Llewellyn, C., Cram, L., Favero, A., & Hill, R. L. (2018). Russian troll hunting in a Brexit Twitter archive. In Proceedings of the 18th ACM/IEEE on joint conference on digital libraries (pp. 361-362).
[57] Massara, GP; Di Matteo, T.; Aste, T., Network filtering for big data: triangulated maximally filtered graph, Journal of Complex Networks, 5, 2, 161-178 (2016) · doi:10.1093/comnet/cnw015
[58] Nesselroade, J. R., McArdle, J. J., Aggen, S. H., & Meyers, J. M. (2002). Dynamic factor analysis models for representing process in multivariate time-series. In D. S. Moskowitz & S. L. Hershberger (Eds.), Multivariate applications book series. Modeling intraindividual variability with repeated measures data: Methods and applications (pp. 235-265). Lawrence Erlbaum Associates Publishers.
[59] Nikita, M. (2016). Ldatuning: Tuning of the latent Dirichlet allocation models parameters (R package version 1.0.0). https://CRAN.
[60] Nikita, M. (2019). Ldatuning: Tuning of the latent Dirichlet allocation models parameters. Retrieved from https://CRAN.R-project.org/package=ldatuning
[61] Phan, X.-H., Nguyen, L.-M., & Horiguchi, S. (2008). Learning to classify short and sparse text & web with hidden topics from large-scale data collections. In Proceedings of the 17th international conference on world wide web (pp. 91-100).
[62] Pons, P., & Latapy, M. (2005). Computing communities in large networks using random walks. In P. Yolum, T. Güngör, F. Gürgen, & C. Özturan (Eds.), Computer and information sciences - iscis 2005 (pp. 284-293). Berlin, Heidelberg: Springer Berlin Heidelberg. doi:10.1007/11569596_31 · Zbl 1161.68694
[63] Porter, MF, An algorithm for suffix stripping, Program, 14, 3, 130-137 (1980) · doi:10.1108/eb046814
[64] Rajadesingan, A., & Liu, H. (2014). Identifying users with opposing opinions in Twitter debates. In International conference on social computing, behavioral-cultural modeling, and prediction (pp. 153-160). Springer.
[65] R Core Team. (2018). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/
[66] Roeder, O. (2018). Why we’re sharing 3 million Russian troll tweets. FiveThirtyEight. Retrieved from https://fivethirtyeight.com/features/why-were-sharing-3-million-russian-troll-tweets/
[67] Rosenstein, MT; Collins, JJ; De Luca, CJ, A practical method for calculating largest Lyapunov exponents from small data sets, Physica D: Nonlinear Phenomena, 65, 1-2, 117-134 (1993) · Zbl 0779.58030 · doi:10.1016/0167-2789(93)90009-P
[68] Singh, N.; Hu, C.; Roehl, WS, Text mining a decade of progress in hospitality human resource management research: identifying emerging thematic development, International Journal of Hospitality Management, 26, 1, 131-147 (2007) · doi:10.1016/j.ijhm.2005.10.002
[69] Stewart, L. G., Arif, A., & Starbird, K. (2018). Examining trolls and polarization with a retweet network. In Proc. ACM WSDM, workshop on misinformation and misbehavior mining on the web.
[70] Sugihara, G.; May, R.; Ye, H.; Hsieh, C-H; Deyle, E.; Fogarty, M.; Munch, S., Detecting causality in complex ecosystems, Science, 338, 6106, 496-500 (2012) · Zbl 1355.92144 · doi:10.1126/science.1227079
[71] Szafranski, R. (1995). A theory of information warfare: Preparing for 2020. Air University Maxwell Airforce Base. Retrieved from https://apps.dtic.mil/dtic/tr/fulltext/u2/a328193.pdf
[72] Taddeo, M., Cyber conflicts and political power in information societies, Minds and Machines, 27, 2, 265-268 (2017) · doi:10.1007/s11023-017-9436-3
[73] Takens, F. (1981). Detecting strange attractors in turbulence. In Lecture notes in mathematics (vol. 898, pp. 366-381). Springer. doi:10.1007/BFb0091924 · Zbl 0513.58032
[74] Tibshirani, R., Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society: Series B (Methodological), 58, 1, 267-288 (1996) · Zbl 0850.62538 · doi:10.1111/j.2517-6161.1996.tb02080.x
[75] van Bork, R., van Borkulo, C. D., Waldorp, L. J., Cramer, A. O., & Borsboom, D. (2018). Network models for clinical psychology. Stevens’ Handbook of Experimental Psychology and Cognitive Neuroscience, 5, 1-35.
[76] Van Der Maas, HL; Dolan, CV; Grasman, RP; Wicherts, JM; Huizenga, HM; Raijmakers, ME, A dynamical model of general intelligence: the positive manifold of intelligence by mutualism, Psychological Review, 113, 4, 842-861 (2006) · doi:10.1037/0033-295X.113.4.842
[77] Van Der Maas, HL; Kan, K-J; Marsman, M.; Stevenson, CE, Network models for cognitive development and intelligence, Journal of Intelligence, 5, 2, 16 (2017) · doi:10.3390/jintelligence5020016
[78] Velicer, WF, Determining the number of components from the matrix of partial correlations, Psychometrika, 41, 3, 321-327 (1976) · Zbl 0336.62041 · doi:10.1007/BF02293557
[79] Whitney, H., Differentiable manifolds, The Annals of Mathematics, 37, 3, 645-680 (1936) · JFM 62.1454.01 · doi:10.2307/1968482
[80] Widaman, KF, Common factor analysis versus principal component analysis: differential bias in representing model parameters?, Multivariate Behavioral Research, 28, 3, 263-311 (1993) · doi:10.1207/s15327906mbr2803_1
[81] Williams, D. R., & Rast, P. (2019). Back to the basics: Rethinking partial correlation network methodology. British Journal of Mathematical and Statistical Psychology, 1-25. doi:10.1111/bmsp.12173
[82] Yardi, S.; Boyd, D., Dynamic debates: an analysis of group polarization over time on Twitter, Bulletin of Science, Technology & Society, 30, 5, 316-327 (2010) · doi:10.1177/0270467610380011
[83] Zannettou, S., Caulfield, T., De Cristofaro, E., Sirivianos, M., Stringhini, G., & Blackburn, J. (2019). Disinformation warfare: Understanding state-sponsored trolls on Twitter and their influence on the web. In Companion proceedings of the 2019 world wide web conference (pp. 218-226).
[84] Zannettou, S., Caulfield, T., Setzer, W., Sirivianos, M., Stringhini, G., & Blackburn, J. (2019). Who let the trolls out? Towards understanding state-sponsored trolls. In Proceedings of the 10th ACM conference on web science (pp. 353-362).
[85] Zhang, Z.; Hamaker, EL; Nesselroade, JR, Comparisons of four methods for estimating a dynamic factor model, Structural Equation Modeling: A Multidisciplinary Journal, 15, 3, 377-402 (2008) · doi:10.1080/10705510802154281
[86] Ziegler, CE, International dimensions of electoral processes: Russia, the USA, and the 2016 elections, International Politics, 55, 5, 557-574 (2018) · doi:10.1057/s41311-017-0113-1
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.