PageRankVariants

CS345 Data Mining
Link Analysis 2: Page Rank Variants
Anand Rajaraman, Jeffrey D. Ullman

Topics
- This lecture
  - Many-walkers model
  - Tricks for speeding convergence
  - Topic-Specific Page Rank

Random walk interpretation
- At time 0, pick a page on the web uniformly at random to start the walk
- Suppose at time t, we are at page j
- At time t+1
  - With probability β, pick a page uniformly at random from O(j), the set of pages j links to, and walk to it
  - With probability 1-β, pick a page on the web uniformly at random and teleport into it
- Page rank of page p = steady-state probability that at any given time, the random walker is at page p

Many random walkers
- Alternative, equivalent model
- Imagine a large number M of independent, identical random walkers (M ≫ N)
- At any point in time, let M(p) be the number of random walkers at page p
- The page rank of p is the fraction of random walkers that are expected to be at page p, i.e., E[M(p)]/M

Speeding up convergence
- Exploit locality of links
  - Pages tend to link most often to other pages within the same host or domain
- Partition pages into clusters
  - host, domain, ...
- Compute local page rank for each cluster
  - Can be done in parallel
- Compute page rank on graph of clusters
- Initial rank of a page is the product of its local rank and the rank of its cluster
  - Use as starting vector for normal page rank computation
  - 2-3x speedup

In Pictures
[Figure: local ranks, intercluster weights, ranks of clusters, and the resulting initial eigenvector]

Other tricks
- Adaptive methods
- Extrapolation
- Typically, small speedups
  - ~20-30%

Problems with page rank
- Measures generic popularity of a page
  - Biased against topic-specific authorities
  - Ambiguous queries, e.g., jaguar
  - Addressed in this lecture
- Uses a single measure of importance
  - Other models, e.g., hubs-and-authorities
  - Next lecture
- Susceptible to link spam
  - Artificial link topographies created in order to boost page rank
  - Next lecture

Topic-Specific Page Rank
- Instead of generic popularity, can we measure popularity within a topic?
  - E.g., computer science, health
- Bias the random walk
  - When the random walker teleports, he picks a page from a set S of web pages
  - S contains only pages that are relevant to the topic
  - E.g., Open Directory (DMOZ) pages for a given topic (www.dmoz.org)
- For each teleport set S, we get a different rank vector r_S

Matrix formulation...
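
The random walk with teleports described above can be sketched as a plain power iteration. This is a minimal illustration, not the lecture's own implementation. The function name pagerank, the value beta=0.8, the four-page toy graph, and the tolerance are all assumptions made for the example, and the sketch assumes every page has at least one out-link and that the teleport set is a subset of the pages. Passing teleport_set=None gives the generic walk (teleport to any page). Passing a set S restricts teleports to S, as in Topic-Specific Page Rank. The optional start argument is where a cluster-based initial vector could be supplied.

```python
def pagerank(out_links, beta=0.8, teleport_set=None, start=None,
             tol=1e-12, max_iter=1000):
    """Power iteration for the random walk with teleports.

    out_links maps each page j to the list O(j) of pages it links to.
    Assumes no dead ends: every page has at least one out-link.
    beta=0.8 is an illustrative value, not one taken from the lecture.
    """
    pages = sorted(out_links)

    # Teleport distribution: uniform over all pages (generic page rank)
    # or uniform over the teleport set S (topic-specific page rank).
    targets = set(pages) if teleport_set is None else set(teleport_set)
    teleport = {p: (1.0 / len(targets) if p in targets else 0.0)
                for p in pages}

    # Starting vector: uniform unless the caller supplies one.
    if start is None:
        rank = {p: 1.0 / len(pages) for p in pages}
    else:
        rank = {p: float(start[p]) for p in pages}

    for _ in range(max_iter):
        # With probability 1 - beta the walker teleports.
        new_rank = {p: (1.0 - beta) * teleport[p] for p in pages}
        # With probability beta the walker follows a random out-link of j.
        for j in pages:
            share = beta * rank[j] / len(out_links[j])
            for i in out_links[j]:
                new_rank[i] += share
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy graph: page "d" has no in-links.
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

generic = pagerank(graph)                    # teleport to any page
topic = pagerank(graph, teleport_set={"d"})  # teleport only into S = {d}

for page in sorted(graph):
    print(page, round(generic[page], 4), round(topic[page], 4))
```

In this toy graph page d is reachable only by teleporting, so restricting the teleport set to S = {d} raises its rank relative to the generic walk. Each choice of S yields its own rank vector r_S.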