TopicSpecificPageRank

CS345 Data Mining Page Rank Variants

Review: Page Rank
- Web graph encoded by matrix M
  - N × N matrix (N = number of web pages)
  - M_ij = 1/|O(j)| iff there is a link from j to i
  - M_ij = 0 otherwise
  - O(j) = set of pages node j links to
- Define matrix A as follows
  - A_ij = β M_ij + (1-β)/N, where 0 < β < 1
  - 1-β is the "tax" discussed in prior lecture
- Page rank r is the principal eigenvector of A (eigenvalue 1)
  - Ar = r
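
The fixed point Ar = r can be approximated by repeatedly applying A to a starting vector (power iteration). The following is a minimal sketch rather than the course's implementation: the three-page graph LINKS, the function name pagerank, and the iteration count are all assumptions made for illustration, and every page is assumed to have at least one out-link.

```python
def pagerank(links, beta=0.8, iterations=100):
    """Approximate r with Ar = r, where A_ij = beta * M_ij + (1 - beta) / N.

    links maps each page j to the list O(j) of pages it links to.
    Assumes every page has at least one out-link (no dead ends).
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform vector
    for _ in range(iterations):
        # Teleport ("tax") share: every page receives (1 - beta) / N.
        new_rank = {p: (1.0 - beta) / n for p in pages}
        # Link share: page j splits beta * rank[j] evenly over O(j).
        for j in pages:
            share = beta * rank[j] / len(links[j])
            for i in links[j]:
                new_rank[i] += share
        rank = new_rank
    return rank


# Hypothetical graph (not from the lecture): 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
LINKS = {0: [1, 2], 1: [2], 2: [0]}
r = pagerank(LINKS)
print({p: round(v, 4) for p, v in r.items()})
```

The matrix A is never built explicitly; each pass distributes rank along the links and adds the uniform teleport term, which is the same product Ar computed sparsely.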

Random walk interpretation
- At time 0, pick a page on the web uniformly at random to start the walk
- Suppose at time t, we are at page j
- At time t+1
  - With probability β, pick a page uniformly at random from O(j) and walk to it
  - With probability 1-β, pick a page on the web uniformly at random and teleport into it
- Page rank of page p = "steady state" probability that at any given time, the random walker is at page p
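
The walk described above can be simulated directly, using the fraction of time spent at each page as an estimate of the steady-state probability. This is only a sketch: the graph LINKS, the function name simulate_walk, the step count, and the fixed seed are illustrative assumptions, not part of the lecture.

```python
import random


def simulate_walk(links, beta=0.8, steps=200_000, seed=0):
    """Estimate page rank as the fraction of steps a single walker spends on each page."""
    rng = random.Random(seed)
    pages = sorted(links)
    visits = {p: 0 for p in pages}
    current = rng.choice(pages)  # time 0: uniform random start
    for _ in range(steps):
        visits[current] += 1
        if rng.random() < beta:
            # With probability beta, follow a uniformly random out-link.
            current = rng.choice(links[current])
        else:
            # With probability 1 - beta, teleport to a uniformly random page.
            current = rng.choice(pages)
    return {p: visits[p] / steps for p in pages}


# Hypothetical graph (not from the lecture): 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
LINKS = {0: [1, 2], 1: [2], 2: [0]}
freq = simulate_walk(LINKS)
print({p: round(v, 3) for p, v in freq.items()})
```

For a graph this small the estimates should settle close to the eigenvector of A, though the simulation converges far more slowly than iterating the matrix.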

Many random walkers
- Alternative, equivalent model
- Imagine a large number M of independent, identical random walkers (M ≫ N)
- At any point in time, let M(p) be the number of random walkers at page p
- The page rank of p is the fraction of random walkers that are expected to be at page p, i.e., E[M(p)]/M
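
The many-walkers model can be illustrated by moving a population of independent walkers for a fixed number of steps and then counting where they stand. In this sketch the graph LINKS, the function name simulate_walkers, the walker count, the step count, and the seed are all assumptions chosen for illustration; the toy graph has only three pages, so M ≫ N holds trivially.

```python
import random


def simulate_walkers(links, num_walkers=10_000, steps=30, beta=0.8, seed=0):
    """Return M(p)/M for each page p after moving all walkers for `steps` steps."""
    rng = random.Random(seed)
    pages = sorted(links)
    # Every walker starts at an independently chosen uniform random page.
    positions = [rng.choice(pages) for _ in range(num_walkers)]
    for _ in range(steps):
        # Each walker independently follows a link (prob. beta) or teleports.
        positions = [
            rng.choice(links[p]) if rng.random() < beta else rng.choice(pages)
            for p in positions
        ]
    return {p: positions.count(p) / num_walkers for p in pages}


# Hypothetical graph (not from the lecture): 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
LINKS = {0: [1, 2], 1: [2], 2: [0]}
fraction = simulate_walkers(LINKS)
print({p: round(v, 3) for p, v in fraction.items()})
```

Here the average is taken over walkers at one instant rather than over time for one walker; both estimates target the same steady-state distribution.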

Problems with page rank
- Measures generic popularity of a page
  - Biased against topic-specific authorities
  - Ambiguous queries, e.g., jaguar
  - This lecture
- Link spam
  - Creating artificial link topographies in order to boost page rank
  - Next lecture

Topic-Specific Page Rank
- Instead of generic popularity, can we measure popularity within a topic?
  - E.g., computer science, health
- Bias the random walk
  - When the random walker teleports, he picks a page from a set S of web pages
  - S contains only pages that are relevant to the topic
  - E.g., Open Directory (DMOZ) pages for a given topic (www.dmoz.org)
- Corresponding to each teleport set S, we get a different rank vector r_S
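
To see how the teleport set changes the result, the sketch below restricts teleportation to S and iterates to a fixed point r_S. It is an illustration under stated assumptions, not the lecture's code: the graph LINKS, the function name topic_pagerank, and the choice S = {0} are hypothetical, every page is assumed to have at least one out-link, and all pages in S are weighted equally.

```python
def topic_pagerank(links, teleport_set, beta=0.8, iterations=100):
    """Approximate r_S: link-following as usual, but teleports land only in S."""
    pages = sorted(links)
    rank = {p: 1.0 / len(pages) for p in pages}  # sums to 1 throughout
    for _ in range(iterations):
        new_rank = {p: 0.0 for p in pages}
        # Link share: page j splits beta * rank[j] evenly over its out-links.
        for j in pages:
            share = beta * rank[j] / len(links[j])
            for i in links[j]:
                new_rank[i] += share
        # Teleport share: the whole (1 - beta) mass is split equally over S.
        for i in teleport_set:
            new_rank[i] += (1.0 - beta) / len(teleport_set)
        rank = new_rank
    return rank


# Hypothetical graph (not from the lecture): 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
LINKS = {0: [1, 2], 1: [2], 2: [0]}
r_S = topic_pagerank(LINKS, teleport_set={0})
print({p: round(v, 4) for p, v in r_S.items()})
```

Compared with uniform teleportation on the same toy graph, page 0 gains rank because all teleports now land on it, and pages close to it in link distance benefit indirectly.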

Matrix formulation
- A_ij = β M_ij + (1-β)/|S| if i ∈ S
- A_ij = β M_ij otherwise
- Show that A is stochastic
- We have weighted all pages in the teleport set S equally
  - Could also assign different weights to them
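
One way to approach the "show that A is stochastic" exercise is to sum a single column. The sketch assumes every page j has at least one out-link, so that column j of M sums to 1; the slides do not state this assumption explicitly here. All entries of A are nonnegative, and for a fixed column j:

```latex
\sum_{i=1}^{N} A_{ij}
  = \beta \sum_{i=1}^{N} M_{ij} + \sum_{i \in S} \frac{1-\beta}{|S|}
  = \beta \cdot 1 + |S| \cdot \frac{1-\beta}{|S|}
  = 1
```

The same argument goes through with unequal teleport weights, provided the weights assigned to the pages in S are nonnegative and sum to 1-β.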

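Example
[Figure: graph on nodes 1, 2, 3, 4; edges not recoverable from the extracted text]
Suppose S = {1}, β = 0.8
[Table: rank of each node by iteration; values not recoverable]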