Geraci, Filippo; Pellegrini, Marco; Maggini, Marco; Sebastiani, Fabrizio
Cluster generation and labeling for web snippets: a fast, accurate hierarchical solution. (English) Zbl 1147.68349
Internet Math. 3, No. 4, 413-443 (2006).

Summary: This paper describes Armil, a meta-search engine that groups into disjoint labelled clusters the Web snippets returned by auxiliary search engines. The cluster labels generated by Armil provide the user with a compact guide to assessing the relevance of each cluster to her information need. Striking the right balance between running time and cluster well-formedness was a key point in the design of our system. Both the clustering and the labelling tasks are performed on the fly by processing only the snippets provided by the auxiliary search engines, and use no external sources of knowledge. Clustering is performed by means of a fast version of the furthest-point-first algorithm for metric \(k\)-center clustering. Cluster labelling is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure. We have tested the clustering effectiveness of Armil against Vivisimo, the de facto industrial standard in Web snippet clustering, using as benchmark a comprehensive set of snippets obtained from the Open Directory Project hierarchy. According to two widely accepted "external" metrics of clustering quality, Armil achieves better performance levels by 10%. We also report the results of a thorough user evaluation of both the clustering and the cluster labelling algorithms. On a standard 1 GHz machine, Armil performs clustering and labelling altogether in less than one second.

MSC:
68M10 Network design and communication in computer systems
68N99 Theory of software

Keywords: Armil; Vivisimo; Web snippet clustering
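
The furthest-point-first strategy named in the summary can be illustrated with a minimal sketch of the basic greedy heuristic for metric \(k\)-center clustering. This is not Armil's "fast version", whose details the summary does not give. The function name `furthest_point_first`, the choice of index 0 as the first center, and the use of Euclidean distance on toy 2-D points are all assumptions made for illustration; real snippets would need a metric defined on their term vectors.

```python
import math


def furthest_point_first(points, k, dist):
    """Basic greedy k-center sketch (hypothetical helper, not Armil's code).

    Returns (centers, assign): centers holds indices into points, and
    assign[i] is the position in centers of the center closest to point i.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centers = [0]  # assumption: the first center is chosen arbitrarily
    # nearest[i] = distance from point i to its closest chosen center
    nearest = [dist(points[0], p) for p in points]
    assign = [0] * len(points)
    while len(centers) < k:
        # The next center is the point furthest from all current centers.
        far = max(range(len(points)), key=lambda i: nearest[i])
        centers.append(far)
        # Re-assign every point that is strictly closer to the new center.
        for i, p in enumerate(points):
            d = dist(points[far], p)
            if d < nearest[i]:
                nearest[i] = d
                assign[i] = len(centers) - 1
    return centers, assign


# Toy usage: three visually separate groups in the plane.
toy_points = [(0, 0), (0, 1), (10, 10), (10, 11), (30, 0)]
toy_centers, toy_assign = furthest_point_first(toy_points, 3, math.dist)
```

Each round costs one pass over the points, so this plain version needs on the order of n·k distance evaluations for n points; any speed-up over that would have to come from the paper's own variant.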