PageRank without hyperlinks: Structural re-ranking using
links induced by language models
Computer Science Department, Cornell University, Ithaca NY 14853, U.S.A.
Language Technologies Institute, Carnegie Mellon University, Pittsburgh PA 15213, U.S.A.
Computer Science Department, Carnegie Mellon University, Pittsburgh PA 15213, U.S.A.
Inspired by the PageRank and HITS (hubs and authorities)
algorithms for Web search, we propose a structural re-ranking
approach to ad hoc information retrieval: we reorder the
documents in an initially retrieved set by exploiting asymmetric
relationships between them. Specifically, we consider
generation links, which indicate that the language model induced
from one document assigns high probability to the
text of another; in doing so, we take care to prevent bias
against long documents. We study a number of re-ranking
criteria based on measures of centrality in the graphs formed
by generation links, and show that integrating centrality into
standard language-model-based retrieval is quite effective at
improving precision at top ranks.
Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: Retrieval models
Keywords: language modeling, PageRank, HITS, hubs,
authorities, social networks, high-accuracy retrieval,
graph-based retrieval, structural re-ranking
Information retrieval systems capable of achieving high
precision at the top ranks of the returned results would be
of obvious benefit to human users, and could also aid
pseudo-feedback approaches, question-answering systems, and other
applications that use IR engines for pre-processing purposes
[31, 35, 32]. But crafting such systems remains a key research challenge.
The PageRank Web-search algorithm uses explicitly indicated
inter-document relationships as an additional source
of information beyond textual content, computing which
documents are the most central. Here, we consider adapting
this idea to corpora in which explicit links between documents
do not exist.
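To make the idea of scoring documents by centrality concrete, the following is a minimal sketch of a PageRank-style power iteration over a weighted directed graph. It is an illustration under stated assumptions, not the re-ranking method studied in this paper: the function name `pagerank`, the damping value 0.85, the fixed iteration count, and the edge weights in `weights` are all hypothetical choices made for the example.

```python
def pagerank(weights, damping=0.85, iterations=50):
    """Power-iteration PageRank over a weighted directed graph.

    weights[i][j] is the non-negative weight of the edge i -> j.
    The damping value and iteration count are illustrative assumptions.
    """
    n = len(weights)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Every node receives a uniform "teleport" share.
        new_scores = [(1.0 - damping) / n] * n
        for i, row in enumerate(weights):
            total = sum(row)
            for j in range(n):
                # A node with no outgoing weight spreads its score uniformly.
                share = row[j] / total if total > 0 else 1.0 / n
                new_scores[j] += damping * scores[i] * share
        scores = new_scores
    return scores


# Hypothetical edge weights among three documents (illustrative only).
weights = [
    [0.0, 0.9, 0.1],
    [0.2, 0.0, 0.8],
    [0.1, 0.9, 0.0],
]
scores = pagerank(weights)
# Document indices ordered from most to least central.
ranking = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)
```

In this toy graph, document 1 receives most of the outgoing weight from the other two documents and therefore ends up ranked first. In a corpus without hyperlinks, the entries of `weights` would have to be induced from the documents themselves rather than read off explicit links.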
August 15–19, 2005, Salvador, Brazil.
Copyright 2005 ACM 1-59593-034-5/05/0008.
How should we form links in a non-hypertext setting?
While previous work in summarization has applied PageRank
to cosine-based links, we draw on research demonstrating
the success of using language models to improve IR
performance in general [30, 2] and to model inter-document
relationships in particular. Specifically, we employ
generation links, which are based on the probability assigned by