lmpagerank

PageRank without hyperlinks: Structural re-ranking using links induced by language models

Oren Kurland (1, 3), kurland@cs.cornell.edu
Lillian Lee (1, 2, 3), llee@cs.cornell.edu

1. Computer Science Department, Cornell University, Ithaca NY 14853, U.S.A.
2. Language Technologies Institute, Carnegie Mellon University, Pittsburgh PA 15213, U.S.A.
3. Computer Science Department, Carnegie Mellon University, Pittsburgh PA 15213, U.S.A.

ABSTRACT

Inspired by the PageRank and HITS (hubs and authorities) algorithms for Web search, we propose a structural re-ranking approach to ad hoc information retrieval: we reorder the documents in an initially retrieved set by exploiting asymmetric relationships between them. Specifically, we consider generation links, which indicate that the language model induced from one document assigns high probability to the text of another; in doing so, we take care to prevent bias against long documents. We study a number of re-ranking criteria based on measures of centrality in the graphs formed by generation links, and show that integrating centrality into standard language-model-based retrieval is quite effective at improving precision at top ranks.

Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: Retrieval models

General Terms: Algorithms, Experimentation

Keywords: language modeling, PageRank, HITS, hubs, authorities, social networks, high-accuracy retrieval, graph-based retrieval, structural re-ranking

1. INTRODUCTION

Information retrieval systems capable of achieving high precision at the top ranks of the returned results would be of obvious benefit to human users, and could also aid pseudo-feedback approaches, question-answering systems, and other applications that use IR engines for pre-processing purposes [31, 35, 32]. But crafting such systems remains a key research challenge.
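The idea summarized in the abstract (induce a language model from each document, link documents by how well one model generates another's text, then score centrality) can be illustrated with a rough sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Jelinek-Mercer smoothed unigram models, the per-token averaging of log-probability as a crude guard against length bias, the choice to point each edge from a document to its `top_k` best generators, and the hypothetical names `unigram_lm`, `generation_graph`, and `pagerank`.

```python
import math
from collections import Counter


def unigram_lm(tokens, background, total_bg, lam=0.5):
    """Return p(w) for a Jelinek-Mercer smoothed unigram model (assumed smoothing)."""
    counts = Counter(tokens)
    n = len(tokens)  # documents are assumed to be non-empty

    def prob(word):
        return (1 - lam) * counts[word] / n + lam * background[word] / total_bg

    return prob


def generation_graph(docs, top_k=2, lam=0.5):
    """Link each document o to the top_k documents whose models best generate o."""
    background = Counter(w for d in docs for w in d)
    total_bg = sum(background.values())
    models = [unigram_lm(d, background, total_bg, lam) for d in docs]
    edges = {}
    for o, text in enumerate(docs):
        scores = []
        for g, model in enumerate(models):
            if g == o:
                continue  # no self-links
            # Average log-probability per token: an assumed length normalization.
            avg_ll = sum(math.log(model(w)) for w in text) / len(text)
            scores.append((avg_ll, g))
        scores.sort(reverse=True)
        # Edge weight is the geometric-mean token probability.
        edges[o] = {g: math.exp(s) for s, g in scores[:top_k]}
    return edges


def pagerank(edges, n, damping=0.85, iters=100):
    """Weighted PageRank by power iteration over the generation graph."""
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - damping) / n] * n
        for o in range(n):
            out = edges.get(o, {})
            total = sum(out.values())
            if total == 0:
                # Dangling node: spread its mass uniformly.
                for j in range(n):
                    new[j] += damping * rank[o] / n
            else:
                for g, w in out.items():
                    new[g] += damping * rank[o] * w / total
        rank = new
    return rank
```

Given an initially retrieved list of tokenized documents `docs`, calling `pagerank(generation_graph(docs), len(docs))` yields one centrality score per document; under these assumptions, documents whose models generate many others well receive higher scores and could be moved up in a re-ranking.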
The PageRank Web-search algorithm [1] uses explicitly-indicated inter-document relationships as an additional source of information beyond textual content, computing which documents are the most central. Here, we consider adapting this idea to corpora in which explicit links between documents do not exist.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGIR'05, August 15–19, 2005, Salvador, Brazil. Copyright 2005 ACM 1-59593-034-5/05/0008...$5.00.

How should we form links in a non-hypertext setting? While previous work in summarization has applied PageRank to cosine-based links [4], we draw on research demonstrating the success of using language models to improve IR performance in general [30, 2] and to model inter-document relationships in particular [16]. Specifically, we employ generation links, which are based on the probability assigned by