Topic-Sensitive PageRank

Taher H. Haveliwala *
Stanford University
Computer Science Department
Stanford, CA 94305
[email protected]
(650) 723-9273

ABSTRACT
In the original PageRank algorithm for improving the ranking of search-query results, a single PageRank vector is computed, using the link structure of the Web, to capture the relative “importance” of Web pages, independent of any particular search query. To yield more accurate search results, we propose computing a set of PageRank vectors, biased using a set of representative topics, to capture more accurately the notion of importance with respect to a particular topic. By using these (precomputed) biased PageRank vectors to generate query-specific importance scores for pages at query time, we show that we can generate more accurate rankings than with a single, generic PageRank vector. For ordinary keyword search queries, we compute the topic-sensitive PageRank scores for pages satisfying the query using the topic of the query keywords. For searches done in context (e.g., when the search query is performed by highlighting words in a Web page), we compute the topic-sensitive PageRank scores using the topic of the context in which the query appeared.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval—search process, information filtering, retrieval models; H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing—linguistic processing

General Terms
Algorithms, Experimentation

Keywords
search, Web graph, link structure, PageRank, search in context, personalized search

* Supported by NSF Grant IIS-0085896 and an NSF Graduate Research Fellowship.

Copyright is held by the author/owner(s).
WWW2002, May 7–11, 2002, Honolulu, Hawaii, USA.
ACM 1-58113-449-5/02/0005.

1. INTRODUCTION
Various link-based ranking strategies have been developed recently for improving Web-search query results. The HITS algorithm proposed in  relies on query-time processing to deduce the hubs and authorities that exist in a subgraph of the Web consisting of both the results to a query and the local neighborhood of these results.  augments the HITS algorithm with content analysis to improve precision for the task of retrieving documents related to a query topic (as opposed to retrieving documents that exactly satisfy the user’s information need).  makes use of HITS for automatically compiling resource lists for general topics.

The PageRank algorithm discussed in [7, 16] precomputes a rank vector that provides a-priori “importance” estimates for all of the pages on the Web. This vector is computed once, offline, and is independent of the search query. At query time, these importance scores are used in conjunction with query-specific IR scores to rank the query results....
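The HITS procedure described above (expand the query results with their local link neighborhood, then deduce hubs and authorities on that subgraph) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the cited authors' implementation: the helper names `base_set` and `hits`, the toy graph, and the iteration count are hypothetical, and the neighborhood expansion is simplified to add every in-linking page with no limit.

```python
import math
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # hypothetical representation: page -> pages it links to


def base_set(web: Graph, root: Set[str]) -> Graph:
    """Query results plus the pages they link to and the pages linking to them.

    Simplification (assumption): every in-linking page is added, with no cap,
    and only edges whose endpoints both lie in the expanded set are kept.
    """
    nodes = set(root)
    for p in root:
        nodes.update(web.get(p, []))
    for p, outs in web.items():
        if any(q in root for q in outs):
            nodes.add(p)
    return {p: [q for q in web.get(p, []) if q in nodes] for p in nodes}


def hits(graph: Graph, iters: int = 50) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Alternate authority and hub updates, L2-normalizing after each round."""
    nodes = set(graph) | {q for outs in graph.values() for q in outs}
    hub = {n: 1.0 for n in nodes}
    auth: Dict[str, float] = {}
    for _ in range(iters):
        # A good authority is pointed to by good hubs.
        auth = {n: 0.0 for n in nodes}
        for p, outs in graph.items():
            for q in outs:
                auth[q] += hub[p]
        # A good hub points to good authorities.
        hub = {n: sum(auth[q] for q in graph.get(n, [])) for n in nodes}
        # Normalize both vectors so the scores stay bounded.
        for vec in (auth, hub):
            norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
            for n in vec:
                vec[n] /= norm
    return hub, auth


# Toy usage (hypothetical pages): "a1" and "a2" stand in for the query results.
web: Graph = {
    "h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"],
    "a1": [], "a2": [], "x": ["y"], "y": [],
}
sub = base_set(web, {"a1", "a2"})
hub, auth = hits(sub)
```

Because the subgraph depends on the query's results, all of this work happens at query time, which is the cost the text contrasts with PageRank's single offline computation.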
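The contrast between one generic PageRank vector and a set of topic-biased vectors can be sketched as PageRank with a configurable teleport (random-jump) distribution: a uniform jump gives the generic, query-independent vector, while a jump restricted to pages of one topic gives a topic-biased vector. This is a hedged sketch, not the paper's exact procedure. The teleport probability `alpha`, the toy graph, the topic page sets, the helper names (`pagerank`, `uniform_teleport`, `topic_teleport`, `query_scores`), and the per-topic query weights are all illustrative assumptions; in particular, the weights simply stand in for how strongly a query relates to each topic.

```python
from typing import Dict, List, Set

# Hypothetical toy representation: page -> pages it links to.
# Every link target is assumed to also be a key in the dict.
Graph = Dict[str, List[str]]
Vector = Dict[str, float]


def pagerank(graph: Graph, teleport: Vector, alpha: float = 0.15,
             iters: int = 100) -> Vector:
    """Power iteration with a (possibly topic-biased) teleport vector.

    With probability alpha the surfer jumps to a page drawn from `teleport`;
    otherwise it follows a random out-link. Rank held by pages with no
    out-links is redistributed according to `teleport`.
    """
    pages = list(graph)
    rank = {p: teleport.get(p, 0.0) for p in pages}
    for _ in range(iters):
        nxt = {p: alpha * teleport.get(p, 0.0) for p in pages}
        dangling = 0.0
        for p in pages:
            out = graph[p]
            if out:
                share = (1 - alpha) * rank[p] / len(out)
                for q in out:
                    nxt[q] += share
            else:
                dangling += (1 - alpha) * rank[p]
        for p in pages:
            nxt[p] += dangling * teleport.get(p, 0.0)
        rank = nxt
    return rank


def uniform_teleport(graph: Graph) -> Vector:
    """Generic (topic-independent) PageRank: jump to any page uniformly."""
    return {p: 1.0 / len(graph) for p in graph}


def topic_teleport(graph: Graph, topic_pages: Set[str]) -> Vector:
    """Topic bias (assumption): jump only to pages labeled with one topic."""
    return {p: (1.0 / len(topic_pages) if p in topic_pages else 0.0)
            for p in graph}


def query_scores(topic_ranks: Dict[str, Vector],
                 topic_weights: Dict[str, float]) -> Vector:
    """Blend precomputed topic vectors: sum over topics of weight * rank."""
    pages = next(iter(topic_ranks.values()))
    return {p: sum(w * topic_ranks[t][p] for t, w in topic_weights.items())
            for p in pages}


# Toy usage (all data hypothetical).
graph: Graph = {
    "sports1": ["sports2"],
    "sports2": ["sports1", "news"],
    "news": ["sci"],
    "sci": ["news"],
}
topics = {"sports": {"sports1", "sports2"}, "science": {"sci"}}

generic = pagerank(graph, uniform_teleport(graph))  # computed once, offline
topic_ranks = {t: pagerank(graph, topic_teleport(graph, s))  # offline, per topic
               for t, s in topics.items()}
weights = {"sports": 0.8, "science": 0.2}  # hypothetical topic weights for one query
scores = query_scores(topic_ranks, weights)  # the only step done at query time
```

Each topic vector is an ordinary PageRank computation with a different teleport distribution, so all of them can be precomputed offline, leaving only a cheap weighted sum for query time. How the weights are obtained from the query keywords or from the surrounding context is left open in this sketch.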
This note was uploaded on 08/06/2008 for the course CSE 450 taught by Professor Davison during the Spring '08 term at Lehigh University.