
Latent tree models for hierarchical topic detection. (English) Zbl 1419.68087

Summary: We present a novel method for hierarchical topic detection where topics are obtained by clustering documents in multiple ways. Specifically, we model document collections using a class of graphical models called hierarchical latent tree models (HLTMs). The variables at the bottom level of an HLTM are observed binary variables that represent the presence/absence of words in a document. The variables at the other levels are binary latent variables that represent word co-occurrence patterns or co-occurrences of such patterns. Each latent variable gives a soft partition of the documents, and the document clusters in the partitions are interpreted as topics. Latent variables at high levels of the hierarchy capture long-range word co-occurrence patterns and hence give thematically more general topics, while those at low levels capture short-range word co-occurrence patterns and give thematically more specific topics. In comparison with LDA-based methods, a key advantage of the new method is that it represents co-occurrence patterns explicitly using model structures. Extensive empirical results show that the new method significantly outperforms LDA-based methods in terms of model quality and the meaningfulness of topics and topic hierarchies.
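To make the clustering interpretation concrete, the following is a minimal sketch (not the authors' implementation; all parameter values are illustrative assumptions) of the core idea: in a latent tree, the word variables attached to a binary latent variable Z are conditionally independent given Z, so the posterior P(Z = 1 | document) factorizes over words and gives each document a soft assignment to one of two clusters, which is then read as a topic.

```python
# Sketch: a soft document partition induced by one binary latent variable
# of an HLTM. Parameters below are made up for illustration, not learned.
import numpy as np

# Observed binary word variables attached to one latent variable Z.
words = ["algorithm", "complexity", "protein"]

# P(Z = 1): prior probability of the latent state.
p_z1 = 0.4

# P(word present | Z); rows are words, columns are Z = 0 and Z = 1.
# Under Z = 1 the co-occurrence pattern {algorithm, complexity} is likely.
p_w_given_z = np.array([
    [0.05, 0.80],   # algorithm
    [0.05, 0.70],   # complexity
    [0.30, 0.10],   # protein
])

def posterior_z1(doc):
    """P(Z = 1 | doc) for a binary presence/absence vector `doc`.

    Z is the parent of the word variables in the tree, so the words
    are conditionally independent given Z and the likelihood factorizes.
    """
    lik = np.ones(2)
    for w, present in enumerate(doc):
        p = p_w_given_z[w]
        lik *= p if present else (1.0 - p)
    joint = lik * np.array([1.0 - p_z1, p_z1])
    return joint[1] / joint.sum()

# Each document's posterior is its soft membership in the Z = 1 cluster.
for doc in [(1, 1, 0), (0, 0, 1), (1, 0, 1)]:
    print(doc, f"P(Z=1 | doc) = {posterior_z1(doc):.3f}")
```

In a full HLTM the same computation is carried out for every latent variable in the tree, yielding one soft partition, and hence one set of candidate topics, per latent variable.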

MSC:

68T10 Pattern recognition, speech recognition
68T50 Natural language processing

Software:

PMTK; word2vec

References:

[1] Blei, D. M.; Griffiths, T.; Jordan, M.; Tenenbaum, J., Hierarchical topic models and the nested Chinese restaurant process, (NIPS, vol. 16 (2004)), 106-114
[2] Blei, D. M.; Griffiths, T.; Jordan, M., The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies, J. ACM, 57, 2, 7:1-7:30 (2010) · Zbl 1327.68187
[3] Li, W.; McCallum, A., Pachinko Allocation: DAG-structured mixture models of topic correlations, (ICML (2006)), 577-584
[4] Mimno, D.; Li, W.; McCallum, A., Mixtures of hierarchical topics with pachinko allocation, (ICML (2007)), 633-640
[5] Paisley, J.; Wang, C.; Blei, D. M.; Jordan, M., Nested hierarchical Dirichlet processes, IEEE Trans. Pattern Anal. Mach. Intell., 37, 2, 256-270 (2015)
[6] Blei, D. M.; Ng, A. Y.; Jordan, M. I., Latent Dirichlet allocation, J. Mach. Learn. Res., 3, 993-1022 (2003) · Zbl 1112.68379
[7] Lafferty, J.; Blei, D. M., Correlated topic models, (NIPS (2006)), 147-155
[8] Blei, D. M.; Lafferty, J. D., Dynamic topic models, (ICML (2006))
[9] Wang, X.; McCallum, A., Topics over time: a non-Markov continuous-time model of topical trends, (SIGKDD (2006)), 424-433
[10] Ahmed, A.; Xing, E. P., Dynamic non-parametric mixture models and the recurrent Chinese restaurant process, (ICDM (2007))
[11] Teh, Y. W.; Jordan, M. I.; Beal, M. J.; Blei, D. M., Hierarchical Dirichlet processes, J. Am. Stat. Assoc., 101, 476, 1566-1581 (2006) · Zbl 1171.62349
[12] Andrzejewski, D.; Zhu, X.; Craven, M., Incorporating domain knowledge into topic modeling via Dirichlet forest priors, (ICML (2009)), 25-32
[13] Jagarlamudi, J.; Daumé, H.; Udupa, R., Incorporating lexical priors into topic models, (EACL (2012), Association for Computational Linguistics), 204-213
[14] Mcauliffe, J. D.; Blei, D. M., Supervised topic models, (NIPS (2008)), 121-128
[15] Perotte, A. J.; Wood, F.; Elhadad, N.; Bartlett, N., Hierarchically supervised latent Dirichlet allocation, (NIPS (2011)), 2609-2617
[16] Zhang, N. L., Hierarchical latent class models for cluster analysis, (Proceedings of the 18th National Conference on Artificial Intelligence (2002))
[17] Zhang, N. L., Hierarchical latent class models for cluster analysis, J. Mach. Learn. Res., 5, 697-723 (2004) · Zbl 1222.68343
[18] Zhang, N. L.; Yuan, S.; Chen, T.; Wang, Y., Latent tree models and diagnosis in traditional Chinese medicine, Artif. Intell. Med., 42, 3, 229-245 (2008)
[19] Wang, Y.; Zhang, N. L.; Chen, T., Latent tree models and approximate inference in Bayesian networks, (AAAI (2008)) · Zbl 1183.68621
[20] Lazarsfeld, P. F.; Henry, N. W., Latent Structure Analysis (1968), Houghton Mifflin, Boston · Zbl 0182.52201
[21] Bartholomew, D. J.; Knott, M., Latent Variable Models and Factor Analysis, vol. 7 (1999), Edward Arnold · Zbl 1066.62528
[22] Durbin, R.; Eddy, S. R.; Krogh, A.; Mitchison, G., Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (1998), Cambridge University Press · Zbl 0929.92010
[23] Mourad, R.; Sinoquet, C.; Zhang, N. L.; Liu, T.; Leray, P., A survey on latent tree models and applications, J. Artif. Intell. Res., 47, 157-203 (2013) · Zbl 1270.68097
[24] Choi, M. J.; Tan, V. Y.F.; Anandkumar, A.; Willsky, A. S., Learning latent tree graphical models, J. Mach. Learn. Res., 12, 1771-1812 (2011) · Zbl 1280.68160
[25] Chen, T.; Zhang, N. L.; Liu, T.; Poon, K. M.; Wang, Y., Model-based multidimensional clustering of categorical data, Artif. Intell., 176, 2246-2269 (2012) · Zbl 1238.68119
[26] Liu, T.; Zhang, N. L.; Chen, P.; Liu, A. H.; Poon, K. M.; Wang, Y., Greedy learning of latent tree models for multidimensional clustering, Mach. Learn., 98, 1-2, 301-330 (2013) · Zbl 1321.68408
[27] Liu, T.; Zhang, N. L.; Chen, P., Hierarchical latent tree analysis for topic detection, (ECML/PKDD (2014), Springer), 256-272
[28] Chen, P.; Zhang, N. L.; Poon, L. K.; Chen, Z., Progressive EM for latent tree models and hierarchical topic detection, (AAAI (2016))
[29] Ver Steeg, G.; Galstyan, A., Discovering structure in high-dimensional data through correlation explanation, (NIPS, vol. 27 (2014)), 577-585
[30] Steinbach, M.; Karypis, G.; Kumar, V., A comparison of document clustering techniques, (KDD Workshop on Text Mining. KDD Workshop on Text Mining, Boston, vol. 400 (2000)), 525-526
[31] Dhillon, I. S., Co-clustering documents and words using bipartite spectral graph partitioning, (SIGKDD (2001), ACM), 269-274
[32] Pearl, J., Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (1988), Morgan Kaufmann
[33] Poon, L.; Zhang, N. L.; Chen, T.; Wang, Y., Variable selection in model-based clustering: to do or to facilitate, (ICML-10 (2010)), 887-894
[34] Kirshner, S., Latent tree copulas, (European Workshop on Probabilistic Graphical Models (2012))
[35] Song, L.; Liu, H.; Parikh, A.; Xing, E., Nonparametric latent tree graphical models: inference, estimation, and structure learning, J. Mach. Learn. Res., 18, 1-45 (2017)
[36] Cover, T. M.; Thomas, J. A., Elements of Information Theory (2006), Wiley · Zbl 1140.94001
[37] Dempster, A.; Laird, N.; Rubin, D., Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. B, 39, 1, 1-38 (1977) · Zbl 0364.62022
[38] Schwarz, G., Estimating the dimension of a model, Ann. Stat., 6, 461-464 (1978) · Zbl 0379.62005
[39] Harmeling, S.; Williams, C. K.I., Greedy learning of binary latent trees, IEEE Trans. Pattern Anal. Mach. Intell., 33, 6, 1087-1097 (2011)
[40] Chow, C. K.; Liu, C. N., Approximating discrete probability distributions with dependence trees, IEEE Trans. Inf. Theory, 14, 3, 462-467 (1968) · Zbl 0165.22305
[41] Raftery, A. E., Bayesian model selection in social research, Sociol. Method., 25, 111-163 (1995)
[42] Chow, C. K.; Liu, C. N., Approximating discrete probability distributions with dependence trees, IEEE Trans. Inf. Theory, 14, 3, 462-467 (1968) · Zbl 0165.22305
[43] Murphy, K. P., Machine Learning: A Probabilistic Perspective (2012), The MIT Press · Zbl 1295.68003
[44] Sato, M.-A.; Ishii, S., On-line EM algorithm for the normalized Gaussian network, Neural Comput., 12, 2, 407-432 (2000) · Zbl 1473.68164
[45] Cappé, O.; Moulines, E., On-line expectation-maximization algorithm for latent data models, J. R. Stat. Soc., Ser. B, Stat. Methodol., 71, 3, 593-613 (2009) · Zbl 1250.62015
[46] Liang, P.; Klein, D., Online EM for unsupervised models, (Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (2009)), 611-619
[47] Bousquet, O.; Bottou, L., The tradeoffs of large scale learning, (NIPS (2008)), 161-168
[48] Chowdhury, G. G., Introduction to Modern Information Retrieval (2010), Facet Publishing
[49] Choueka, Y., Looking for needles in a haystack or locating interesting collocational expressions in large textual databases, (RIAO 88 (Recherche d’Information Assistée par Ordinateur) Conference (1988)), 609-623
[50] Lau, J. H.; Baldwin, T.; Newman, D., On collocations and topic models, ACM Trans. Speech Lang. Process. (TSLP), 10, 3, 10 (2013)
[51] Poon, L. K.; Zhang, N. L., Topic browsing for research papers with hierarchical latent tree analysis, preprint
[52] Chang, J.; Gerrish, S.; Wang, C.; Boyd-graber, J. L.; Blei, D. M., Reading tea leaves: how humans interpret topic models, (NIPS (2009)), 288-296
[53] Mimno, D.; Wallach, H. M.; Talley, E.; Leenders, M.; McCallum, A., Optimizing semantic coherence in topic models, (Proceedings of the Conference on Empirical Methods in Natural Language Processing (2011)), 262-272
[54] Chen, Z.; Zhang, N.; Yeung, D.-Y.; Chen, P., Sparse Boltzmann machines with structure learning as applied to text analysis, (AAAI (2017))
[55] Mikolov, T.; Chen, K.; Corrado, G.; Dean, J., Efficient estimation of word representations in vector space, (ICLR (2013))
[56] Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; Dean, J., Distributed representations of words and phrases and their compositionality, (NIPS (2013)), 3111-3119