A Personalized Search Engine Based on Web-Snippet Hierarchical Clustering

Paolo Ferragina
Dipartimento di Informatica, Pisa
firstname.lastname@example.org

Antonio Gulli
Dipartimento di Informatica, Pisa
email@example.com

ABSTRACT
In this paper we propose a hierarchical clustering engine, called SnakeT, that is able to organize on-the-fly the search results drawn from 16 commodity search engines into a hierarchy of labeled folders. The hierarchy offers a complementary view to the flat-ranked list of results returned by current search engines. Users can navigate through the hierarchy driven by their search needs. This is especially useful for informative, polysemous and poor queries.

SnakeT is the first complete and open-source system in the literature that offers both hierarchical clustering and folder labeling with variable-length sentences. We extensively test SnakeT against all available web-snippet clustering engines, and show that it achieves efficiency and efficacy performance close to the best known engine Vivisimo.com.

Recently, personalized search engines have been introduced with the aim of improving search results by focusing on the users, rather than on their submitted queries. We show how to plug SnakeT on top of any (un-personalized) search engine in order to obtain a form of personalization that is fully adaptive, privacy preserving, scalable, and non-intrusive for underlying search engines.

SnakeT is available at http://snaket.di.unipi.it/.

Categories and Subject Descriptors
H.3 [Information Storage And Retrieval]: Content Analysis and Indexing, Information Search and Retrieval, Online Information Services; I.5.3 [Text Processing]: Clustering

General Terms
Algorithms, Design, Experimentation, Measurement

Keywords
Web Snippets Clustering, Search Engines, Information Extraction, New Search Applications and Interfaces, Personalized Web Ranking

Copyright is held by the International World Wide Web Conference Committee (IW3C2). Distribution of these papers is limited to classroom use, and personal use by others. WWW 2005, May 10-14, 2005, Chiba, Japan. ACM 1-59593-051-5/05/0005.

1. INTRODUCTION
Web-snippet clustering is an innovative approach to help users in searching the web. It consists of clustering the snippets returned by a (meta-)search engine into a hierarchy of folders which are labeled with variable-length sentences. The labels should capture the theme of the snippets (and thus, of the corresponding web pages) contained in their associated folders. This labeled hierarchy offers a complementary view to the flat-ranked list of results returned by current search engines. Users can exploit it by navigating through the hierarchy of labeled folders, driven by their search needs. This technique is useful for informative, polysemous or poor queries....