Garza, Sara E.; Brena, Ramon F.
Structural topic mining in web collections. (English) Zbl 06287331
Appl. Comput. Math. 11, No. 2, 271-285 (2012).

Summary: This paper introduces structural topic mining: an approach for discovering and describing thematically related document groups in large document collections. A collection is viewed as a directed graph where vertices represent documents and arcs represent connections among these. Because a document is likely to have more connections to documents of the same theme, we have assumed that topics have the structure of a graph cluster, i.e. a group of vertices with more arcs to the inside of the group and fewer arcs to the outside. So, topics could be discovered by clustering the document graph; a local approach is used for scalability. We also extract properties (keywords and representative documents) from clusters. This approach was tested over Wikipedia, and the resulting clusters in fact correspond to topics; this shows that topic mining can be treated as a graph clustering problem. Comparative results suggest considerable quality at a low cost.

MSC:
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68W25 Approximation algorithms

Keywords: topic mining; graph clustering; structure; wikipedia
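
The summary's notion of a graph cluster (more arcs inside a group than leaving it) and its mention of a local approach can be illustrated with a minimal sketch. This is not the authors' algorithm, which the summary does not specify. It assumes a simple score (the fraction of arcs touching a group that stay inside it) and a greedy expansion from one seed document. The names `relative_density` and `grow_cluster` are hypothetical.

```python
from collections import defaultdict


def relative_density(arcs, group):
    """Fraction of arcs touching `group` whose endpoints are both inside it."""
    group = set(group)
    internal = external = 0
    for src, dst in arcs:
        src_in, dst_in = src in group, dst in group
        if src_in and dst_in:
            internal += 1
        elif src_in or dst_in:
            external += 1
    total = internal + external
    return internal / total if total else 0.0


def grow_cluster(arcs, seed):
    """Greedy local expansion (an assumed heuristic, not the paper's method).

    Starting from `seed`, repeatedly add the neighbouring vertex that most
    improves the relative density; stop when no neighbour improves it.
    Only the neighbourhood of the current cluster is examined at each step.
    """
    neighbours = defaultdict(set)
    for src, dst in arcs:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    cluster = {seed}
    score = relative_density(arcs, cluster)
    while True:
        frontier = set().union(*(neighbours[v] for v in cluster)) - cluster
        best, best_score = None, score
        for candidate in sorted(frontier):
            s = relative_density(arcs, cluster | {candidate})
            if s > best_score:
                best, best_score = candidate, s
        if best is None:
            return cluster
        cluster.add(best)
        score = best_score


# Toy document graph: two tightly linked groups joined by a single arc.
arcs = [("a", "b"), ("b", "c"), ("c", "a"),
        ("d", "e"), ("e", "f"), ("f", "d"),
        ("c", "d")]
topic = grow_cluster(arcs, "a")
```

On this toy graph, the expansion from `"a"` stops at the first triangle, because adding `"d"` would pull in more outgoing arcs than internal ones.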