
A polarity analysis framework for Twitter messages. (English) Zbl 1410.91397

Summary: Social media such as Twitter and Facebook allow people, companies and brands to create, share and exchange information. This information can be used for several purposes, for example to understand consumers and their preferences. In this context, sentiment analysis can serve as a feedback mechanism. Sentiment analysis consists of classifying a text according to the sentiment that its writer intended to convey. A basic sentiment classifier determines the sentiment polarity (negative, neutral or positive) of a given text at the document, sentence, or feature/aspect level. More advanced classifiers may consider other elements, such as emotional states (e.g. angry, sad, happy), affective states (e.g. pleasure and pain), motivational states (e.g. hunger and curiosity) and temperament, among others. In general, there are two main approaches to assigning sentiment to tweets: knowledge-based approaches and machine learning approaches. In the latter case, the learning algorithm requires a pre-classified data sample to determine the class of new data. Typically, this sample is classified manually, which makes the process time-consuming and limits its real-time applicability to big data. This paper proposes a polarity analysis framework for Twitter messages that combines both approaches with an automatic contextual module. To assess the performance of the proposed framework, four text datasets from the literature were used. Five different types of classifiers were considered, including Naïve Bayes (NB), Support Vector Machines (SVM), Decision Trees (J48) and Nearest Neighbors (KNN). The results show that the proposal is a suitable framework for automating the whole polarity analysis process, providing high accuracy and low false positive rates.
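The summary describes letting a knowledge-based step supply the labels that a machine learning classifier would otherwise need from manual annotation. The Python sketch below illustrates only that general idea; it is not the authors' framework and omits their automatic contextual module. Assumptions: the `POSITIVE`/`NEGATIVE` word sets are a hypothetical toy lexicon, `tokenize`, `lexicon_polarity` and `NaiveBayes` are illustrative names, the classifier is a from-scratch multinomial Naïve Bayes with add-one smoothing, and the six example tweets are invented for demonstration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy lexicon (assumption, for illustration only); a real
# knowledge-based step would rely on a curated lexical resource.
POSITIVE = {"love", "great", "good", "happy", "excellent"}
NEGATIVE = {"hate", "terrible", "awful", "bad", "sad"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_polarity(text):
    """Knowledge-based step: net count of lexicon hits mapped to a polarity label."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior of the class
            denom = self.total_words[label] + len(self.vocab)
            for token in tokenize(text):
                if token in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented, unlabeled example tweets (assumption).
TWEETS = [
    "Love the new camera, great battery life",
    "Great game tonight, love it",
    "Terrible update, I hate the new layout",
    "Awful traffic and a bad day",
    "Meeting at noon in the office",
    "The train leaves at noon",
]

# 1) Label the sample automatically with the lexicon instead of by hand.
auto_labels = [lexicon_polarity(t) for t in TWEETS]
# 2) Train the machine learning classifier on those automatic labels.
model = NaiveBayes().fit(TWEETS, auto_labels)

print(lexicon_polarity("camera battery life"))  # neutral
print(model.predict("camera battery life"))     # positive
```

Here the lexicon labels "camera battery life" as neutral because none of its words appear in the lexicon, while the Naïve Bayes model trained on the automatically labeled tweets assigns it to the positive class, since those words occurred only in a tweet the lexicon had marked positive. Skipping words unseen during training is a common simplification, not the only possible choice.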

MSC:

91D30 Social networks; opinion dynamics
68M11 Internet topics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
