PageRankVariants - Topics CS345 Data Mining Link Analysis 2...

CS345 Data Mining
Link Analysis 2
Page Rank Variants
Anand Rajaraman, Jeffrey D. Ullman

Topics
- This lecture
  - Many-walkers model
  - Tricks for speeding convergence
  - Topic-Specific Page Rank

Random walk interpretation
- At time 0, pick a page on the web uniformly at random to start the walk
- Suppose at time t, we are at page j
- At time t+1
  - With probability β, pick a page uniformly at random from O(j) and walk to it
  - With probability 1-β, pick a page on the web uniformly at random and teleport into it
- Page rank of page p = "steady state" probability that at any given time, the random walker is at page p

Many random walkers
- Alternative, equivalent model
- Imagine a large number M of independent, identical random walkers (M ≫ N)
- At any point in time, let M(p) be the number of random walkers at page p
- The page rank of p is the fraction of random walkers that are expected to be at page p, i.e., E[M(p)]/M

Speeding up convergence
- Exploit locality of links
  - Pages tend to link most often to other pages within the same host or domain
- Partition pages into clusters
  - host, domain, …
- Compute local page rank for each cluster
  - can be done in parallel
- Compute page rank on graph of clusters
- Initial rank of a page is the product of its local rank and the rank of its cluster
  - Use as starting vector for normal page rank computation
  - 2-3x speedup

In Pictures
[Figure: local ranks, intercluster weights, ranks of clusters, initial eigenvector]
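The random-walk and many-walkers models above can be compared on a toy graph. The following is a minimal sketch, not the lecture's own code: the function names, the three-page graph `GRAPH`, and the value β = 0.8 are all assumptions chosen for illustration, and every page is assumed to have at least one out-link so that O(j) is never empty. `pagerank_power` uses the standard power iteration for this walk (which this preview does not show), while `pagerank_walkers` simulates M independent walkers and reports the fraction that end on each page.

```python
import random


def pagerank_power(out_links, beta=0.8, iters=100, start=None):
    """Power iteration for the teleporting walk (assumes no dead ends)."""
    n = len(out_links)
    # Default start vector is uniform; a better guess can be passed in.
    r = list(start) if start is not None else [1.0 / n] * n
    for _ in range(iters):
        # With probability 1 - beta the walker teleports to a uniform page.
        nxt = [(1.0 - beta) / n] * n
        for j, targets in enumerate(out_links):
            # With probability beta the walker at j follows a uniform out-link.
            share = beta * r[j] / len(targets)
            for i in targets:
                nxt[i] += share
        r = nxt
    return r


def pagerank_walkers(out_links, beta=0.8, walkers=20000, steps=30, seed=0):
    """Many-walkers estimate: fraction of walkers on each page after `steps`."""
    rng = random.Random(seed)
    n = len(out_links)
    counts = [0] * n
    for _ in range(walkers):
        page = rng.randrange(n)              # time 0: uniform random page
        for _ in range(steps):
            if rng.random() < beta:
                page = rng.choice(out_links[page])   # follow a link from O(j)
            else:
                page = rng.randrange(n)              # teleport
        counts[page] += 1
    return [c / walkers for c in counts]


# Hypothetical 3-page graph: out_links[j] lists the pages in O(j).
GRAPH = [[1, 2], [2], [0]]

if __name__ == "__main__":
    print("power iteration:", pagerank_power(GRAPH))
    print("many walkers:   ", pagerank_walkers(GRAPH))
```

The two estimates should agree up to sampling noise, which shrinks as the number of walkers grows. The `start` argument is where an improved initial vector, such as the cluster-based product of local rank and cluster rank, would be supplied.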
Other tricks
- Adaptive methods
- Extrapolation
- Typically, small speedups
  - ~20-30%

Problems with page rank
- Measures generic popularity of a page
  - Biased against topic-specific authorities
  - Ambiguous queries e.g., jaguar
  - This lecture
- Uses a single measure of importance
  - Other models e.g., hubs-and-authorities
  - Next lecture
- Susceptible to link spam
  - Artificial link topographies created in order to boost page rank
  - Next lecture

Topic-Specific Page Rank
- Instead of generic popularity, can we measure popularity within a topic?
  - E.g., computer science, health
- Bias the random walk
  - When the random walker teleports, he picks a page from a set S of web pages
  - S contains only pages that are relevant to the topic
  - E.g., Open Directory (DMOZ) pages for a given topic
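The biased walk described above can be sketched by restricting the teleport step to the set S. This is an illustrative sketch under stated assumptions rather than the lecture's formulation, which is cut off in this preview: the function name `topic_pagerank`, the four-page graph, and β = 0.8 are hypothetical, teleports are assumed to pick uniformly among the pages of S, and every page is assumed to have at least one out-link.

```python
def topic_pagerank(out_links, topic_set, beta=0.8, iters=100):
    """Page rank where teleports land only on pages in topic_set (S)."""
    n = len(out_links)
    # Assumption: the teleport picks uniformly among the pages of S.
    teleport = [0.0] * n
    for p in topic_set:
        teleport[p] = 1.0 / len(topic_set)
    r = teleport[:]                      # start the walk inside S
    for _ in range(iters):
        # Teleport mass (1 - beta) goes only to pages in S.
        nxt = [(1.0 - beta) * t for t in teleport]
        for j, targets in enumerate(out_links):
            # Link-following mass is split evenly over the out-links of j.
            share = beta * r[j] / len(targets)
            for i in targets:
                nxt[i] += share
        r = nxt
    return r


# Hypothetical 4-page graph; page 3 has no in-links.
TOPIC_GRAPH = [[1, 2], [2], [0], [0]]

if __name__ == "__main__":
    print("S = {3}:", topic_pagerank(TOPIC_GRAPH, {3}))
    print("S = {0}:", topic_pagerank(TOPIC_GRAPH, {0}))
```

Because page 3 has no in-links in this toy graph, it can only receive rank through teleports, so its score depends entirely on whether it belongs to S.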

This document was uploaded on 03/04/2012.
