TopicSpecificPageRank

CS345 Data Mining Page Rank Variants

Review: Page Rank
- Web graph encoded by matrix $M$
  - $N \times N$ matrix ($N$ = number of web pages)
  - $M_{ij} = 1/|O(j)|$ iff there is a link from $j$ to $i$
  - $M_{ij} = 0$ otherwise
  - $O(j)$ = set of pages that node $j$ links to
- Define matrix $A$ as follows
  - $A_{ij} = \beta M_{ij} + (1-\beta)/N$, where $0 < \beta < 1$
  - $1-\beta$ is the "tax" discussed in prior lecture
- Page rank $r$ is the first (principal) eigenvector of $A$
  - $Ar = r$

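
The fixed point $Ar = r$ can be approximated by repeatedly applying $A$ to a starting vector (power iteration). The following is a minimal sketch, not the course's own implementation. It assumes every page has at least one out-link (no dead ends), and the function name `pagerank`, the dictionary representation of $O(j)$, and the three-page graph are all hypothetical choices made for illustration.

```python
def pagerank(out_links, beta=0.8, iterations=100):
    """Power iteration for r = A r, with A_ij = beta * M_ij + (1 - beta) / N.

    out_links maps each page j to the list O(j) of pages it links to.
    Assumption: every page has at least one out-link (no dead ends).
    """
    pages = sorted(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform vector
    for _ in range(iterations):
        # Teleport ("tax") share: sum_j (1 - beta) / N * r_j = (1 - beta) / N
        new_rank = {p: (1.0 - beta) / n for p in pages}
        for j in pages:
            # Page j passes beta * r_j / |O(j)| to each page it links to
            share = beta * rank[j] / len(out_links[j])
            for i in out_links[j]:
                new_rank[i] += share
        rank = new_rank
    return rank


# Hypothetical three-page graph: 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 1
graph = {1: [2, 3], 2: [3], 3: [1]}
ranks = pagerank(graph)
for page in sorted(ranks):
    print(page, round(ranks[page], 4))
```

The sketch never builds $A$ explicitly; it works from the out-link lists, which keeps the cost per iteration proportional to the number of links rather than $N^2$.
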
Random walk interpretation
- At time 0, pick a page on the web uniformly at random to start the walk
- Suppose at time $t$, we are at page $j$
- At time $t+1$
  - With probability $\beta$, pick a page uniformly at random from $O(j)$ and walk to it
  - With probability $1-\beta$, pick a page on the web uniformly at random and teleport into it
- Page rank of page $p$ = "steady state" probability that at any given time, the random walker is at page $p$
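
The walk described above can be simulated directly, with visit frequencies serving as an estimate of the steady-state probabilities. This is a rough sketch under stated assumptions: the function name `simulate_walk`, the step count, the seed, and the three-page cycle are hypothetical, and every page is assumed to have at least one out-link.

```python
import random


def simulate_walk(out_links, beta=0.8, steps=200_000, seed=0):
    """Estimate page rank as the fraction of time one walker spends on each page."""
    rng = random.Random(seed)
    pages = sorted(out_links)
    visits = {p: 0 for p in pages}
    current = rng.choice(pages)  # time 0: start at a uniformly random page
    for _ in range(steps):
        visits[current] += 1
        if rng.random() < beta:
            # With probability beta, follow a uniformly random out-link
            current = rng.choice(out_links[current])
        else:
            # With probability 1 - beta, teleport to a uniformly random page
            current = rng.choice(pages)
    return {p: visits[p] / steps for p in pages}


# Hypothetical symmetric cycle 1 -> 2 -> 3 -> 1; by symmetry each page
# has steady-state probability 1/3.
cycle = {1: [2], 2: [3], 3: [1]}
freq = simulate_walk(cycle)
print(freq)
```

The estimate is noisy and only approaches the steady-state values as the number of steps grows.
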

Many random walkers
- Alternative, equivalent model
- Imagine a large number $M$ of independent, identical random walkers ($M \gg N$)
- At any point in time, let $M(p)$ be the number of random walkers at page $p$
- The page rank of $p$ is the fraction of random walkers that are expected to be at page $p$, i.e., $E[M(p)]/M$

Problems with page rank
- Measures generic popularity of a page
  - Biased against topic-specific authorities
  - Ambiguous queries, e.g., jaguar
  - This lecture
- Link spam
  - Creating artificial link topographies in order to boost page rank
  - Next lecture

Topic-Specific Page Rank
- Instead of generic popularity, can we measure popularity within a topic?
  - E.g., computer science, health
- Bias the random walk
  - When the random walker teleports, he picks a page from a set $S$ of web pages
  - $S$ contains only pages that are relevant to the topic
  - E.g., Open Directory (DMOZ) pages for a given topic (www.dmoz.org)
- Corresponding to each teleport set $S$, we get a different rank vector $r_S$

Matrix formulation
- $A_{ij} = \beta M_{ij} + (1-\beta)/|S|$ if $i \in S$
- $A_{ij} = \beta M_{ij}$ otherwise
- Show that $A$ is stochastic
- We have weighted all pages in the teleport set $S$ equally
  - Could also assign different weights to them
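
One way to approach the "show that $A$ is stochastic" exercise is to check that every column of $A$ sums to 1 (all entries are clearly non-negative). The derivation below is a sketch that assumes every page $j$ has at least one out-link, so that column $j$ of $M$ has $|O(j)|$ entries equal to $1/|O(j)|$ and therefore sums to 1.

```latex
\sum_{i} A_{ij}
  = \beta \sum_{i} M_{ij} + \sum_{i \in S} \frac{1-\beta}{|S|}
  = \beta \cdot 1 + |S| \cdot \frac{1-\beta}{|S|}
  = \beta + (1-\beta)
  = 1 .
```

If some page has no out-links, its column of $M$ sums to 0 and the argument breaks down, which is why the no-dead-ends assumption is needed here.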

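Example
- Four-node graph with nodes 1, 2, 3, 4 (figure not included in this preview)
- Suppose $S = \{1\}$, $\beta = 0.8$
- Table of rank by node and iteration (values not included in this preview)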
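
Since the graph and the iteration table for this example are not available, the following sketch runs topic-specific page rank with $S = \{1\}$ and $\beta = 0.8$ on a hypothetical four-node graph. The edges below are an assumption chosen for illustration and are not the graph from the slide. The function name `topic_pagerank` is likewise hypothetical, and every page is assumed to have at least one out-link.

```python
def topic_pagerank(out_links, teleport_set, beta=0.8, iterations=100):
    """Power iteration for r_S = A r_S, where
    A_ij = beta * M_ij + (1 - beta) / |S| if i is in S, and beta * M_ij otherwise.

    Assumption: every page has at least one out-link (no dead ends).
    """
    pages = sorted(out_links)
    rank = {p: 1.0 / len(pages) for p in pages}  # start from the uniform vector
    for _ in range(iterations):
        new_rank = {p: 0.0 for p in pages}
        for j in pages:
            # Page j passes beta * r_j / |O(j)| to each page it links to
            share = beta * rank[j] / len(out_links[j])
            for i in out_links[j]:
                new_rank[i] += share
        # Teleport mass (1 - beta) is split equally among pages in S only
        for i in teleport_set:
            new_rank[i] += (1.0 - beta) / len(teleport_set)
        rank = new_rank
    return rank


# Hypothetical edges (NOT the slide's graph): 1 -> 2, 1 -> 3, 2 -> 1, 3 -> 4, 4 -> 3
graph = {1: [2, 3], 2: [1], 3: [4], 4: [3]}
r_S = topic_pagerank(graph, teleport_set={1}, beta=0.8)
for page in sorted(r_S):
    print(page, round(r_S[page], 4))
```

Changing `teleport_set` to a different set of pages yields a different rank vector, matching the point that each teleport set $S$ has its own $r_S$.
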
This note was uploaded on 01/31/2011 for the course CS 345 taught by Professor Dunbar,a during the Fall '07 term at UC Davis.
