
Structural topic mining in web collections. (English) Zbl 06287331

Summary: This paper introduces structural topic mining: an approach for discovering and describing thematically related document groups in large document collections. A collection is viewed as a directed graph where vertices represent documents and arcs represent connections among these. Because a document is likely to have more connections to documents of the same theme, we have assumed that topics have the structure of a graph cluster, i.e. a group of vertices with more arcs to the inside of the group and fewer arcs to the outside. So, topics could be discovered by clustering the document graph; a local approach is used for scalability. We also extract properties (keywords and representative documents) from clusters. This approach was tested over Wikipedia, and the resulting clusters in fact correspond to topics; this shows that topic mining can be treated as a graph clustering problem. Comparative results suggest considerable quality at a low cost.
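
The notion of a graph cluster described in the summary (more arcs inside the group than leaving it) can be illustrated with a small Python sketch. This is not the paper's algorithm: the scoring function `internal_ratio`, the greedy expansion in `local_cluster`, the `max_size` parameter, and the toy arc list are all assumptions made here for illustration. The sketch grows a cluster from a single seed document and only inspects the neighbourhood of the current cluster, which is one plausible reading of a "local approach".

```python
from collections import defaultdict


def arc_counts(arcs, group):
    """Return (internal, boundary) arc counts for a group of vertices.

    An arc is internal if both endpoints are in the group, and a boundary
    arc if exactly one endpoint is in the group.
    """
    group = set(group)
    internal = sum(1 for u, v in arcs if u in group and v in group)
    boundary = sum(1 for u, v in arcs if (u in group) != (v in group))
    return internal, boundary


def internal_ratio(arcs, group):
    """Fraction of the group's arcs that stay inside it (hypothetical score)."""
    internal, boundary = arc_counts(arcs, group)
    total = internal + boundary
    return internal / total if total else 0.0


def local_cluster(arcs, seed, max_size=50):
    """Greedily grow a cluster around `seed`.

    At each step, add the neighbouring vertex that most improves
    internal_ratio; stop when no neighbour improves it or when the
    cluster reaches max_size.
    """
    # Neighbourhoods ignore arc direction, so both in- and out-links count.
    neighbours = defaultdict(set)
    for u, v in arcs:
        neighbours[u].add(v)
        neighbours[v].add(u)

    cluster = {seed}
    score = internal_ratio(arcs, cluster)
    while len(cluster) < max_size:
        frontier = set().union(*(neighbours[u] for u in cluster)) - cluster
        best, best_score = None, score
        for candidate in sorted(frontier):  # sorted for deterministic ties
            s = internal_ratio(arcs, cluster | {candidate})
            if s > best_score:
                best, best_score = candidate, s
        if best is None:
            break
        cluster.add(best)
        score = best_score
    return cluster


# Toy document graph: two tightly linked triangles joined by a single arc.
example_arcs = [
    ("a", "b"), ("b", "c"), ("c", "a"),
    ("x", "y"), ("y", "z"), ("z", "x"),
    ("c", "x"),
]
print(sorted(local_cluster(example_arcs, "a")))  # ['a', 'b', 'c']
```

On the toy graph, the cluster grown from `a` stops at the first triangle, because adding `x` would lower the share of arcs that stay inside the group. For brevity the score rescans the whole arc list; a scalable local method would update the counts incrementally from the frontier alone.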

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
68W25 Approximation algorithms