zbMATH — the first resource for mathematics

Nonparametric Bayesian topic modelling with the hierarchical Pitman-Yor processes. (English) Zbl 06639853
Summary: The Dirichlet process and its extension, the Pitman-Yor process, are stochastic processes that take probability distributions as a parameter. These processes can be stacked up to form a hierarchical nonparametric Bayesian model. In this article, we present efficient methods for the use of these processes in this hierarchical context, and apply them to latent variable models for text analytics. In particular, we propose a general framework for designing these Bayesian models, which are called topic models in the computer science community. We then propose a specific nonparametric Bayesian topic model for modelling text from social media. We focus on tweets (posts on Twitter) in this article due to their ease of access. We find that our nonparametric model performs better than existing parametric models in both goodness of fit and real world applications.
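As a rough illustration of the Pitman-Yor process named in the summary, the sketch below simulates its standard Chinese restaurant process representation. The record itself does not describe this construction or the authors' inference methods, so the function name `pitman_yor_crp`, its parameters, and the restriction to a positive concentration are assumptions made for this example only. Setting the discount to 0 recovers the Dirichlet process case.

```python
import random


def pitman_yor_crp(n_customers, discount, concentration, seed=0):
    """Seat customers one by one under the two-parameter Chinese restaurant process.

    Returns a list with the number of customers at each table.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= 0:
        # Assumption: positive concentration only, to keep every weight positive.
        raise ValueError("concentration must be positive in this sketch")

    rng = random.Random(seed)
    table_sizes = []
    for _ in range(n_customers):
        # An existing table k is chosen with weight (size_k - discount).
        weights = [size - discount for size in table_sizes]
        # A new table is opened with weight (concentration + discount * K),
        # where K is the current number of tables.
        weights.append(concentration + discount * len(table_sizes))
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[choice] += 1
    return table_sizes


if __name__ == "__main__":
    sizes = pitman_yor_crp(1000, discount=0.5, concentration=1.0)
    print(len(sizes), "tables; largest sizes:", sorted(sizes, reverse=True)[:5])
```

A larger discount tends to produce more tables with a heavier tail of small ones, which is the power-law behaviour that motivates using the Pitman-Yor process for word frequencies in text.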

MSC:
62G05 Nonparametric estimation
60G20 Generalized stochastic processes