**CS345 Data Mining**

Link Analysis 2: Page Rank Variants

Anand Rajaraman, Jeffrey D. Ullman

**Topics**

This lecture:
- Many-walkers model
- Tricks for speeding convergence
- Topic-Specific Page Rank

**Random walk interpretation**

- At time 0, pick a page on the web uniformly at random to start the walk.
- Suppose that at time t we are at page j. At time t+1:
  - with probability β, pick a page uniformly at random from O(j), the set of pages that j links to, and walk to it;
  - with probability 1 − β, pick a page on the web uniformly at random and teleport into it.
- The page rank of page p is the "steady state" probability that, at any given time, the random walker is at page p.

**Many random walkers**

- An alternative, equivalent model.
- Imagine a large number M of independent, identical random walkers (M ≫ N, where N is the number of pages).
- At any point in time, let M(p) be the number of random walkers at page p.
- The page rank of p is the fraction of random walkers that are expected to be at page p, i.e., E[M(p)]/M.

**Speeding up convergence**

- Exploit locality of links: pages tend to link most often to other pages within the same host or domain.
- Partition pages into clusters (by host, domain, …).
- Compute a local page rank for each cluster; this can be done in parallel.
- Compute page rank on the graph of clusters.
- The initial rank of a page is the product of its local rank and the rank of its cluster.
- Use these initial ranks as the starting vector for the normal page rank computation.
- 2-3x speedup.

**In pictures**

[Figure: local ranks, intercluster weights, ranks of clusters, and the resulting initial eigenvector. In the example, a cluster of rank 1.5 whose pages have local ranks 2.0 and 0.1 yields initial ranks 3.0 and 0.15.]

**Other tricks**

- Adaptive methods
- Extrapolation
- Typically, small speedups (~20-30%)

**Problems with page rank**

- Measures generic popularity of a page (addressed in this lecture)
  - Biased against topic-specific authorities
  - Ambiguous queries, e.g., jaguar
- Uses a single measure of importance (next lecture)
  - Other models, e.g., hubs-and-authorities
- Susceptible to link spam (next lecture)
  - Artificial link topologies created in order to boost page rank

**Topic-Specific Page Rank**

- Instead of generic popularity, can we measure popularity within a topic?
  - E.g., computer science, health
- Bias the random walk
  - When the random walker teleports, he picks a page from a set S of web pages
  - S contains only pages that are relevant to the topic
  - E.g., Open Directory (DMOZ) pages for a given topic (www.dmoz.org)
- For each teleport set S, we get a different rank vector r_S

Matrix formulation...
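The biased walk on the Topic-Specific Page Rank slide can be sketched as power iteration in which all teleport mass lands on S. This is a minimal sketch, not the slides' own matrix formulation: the function name `topic_specific_pagerank`, the adjacency-dict input, the default β = 0.8, the tolerance, and the rule that a page with no out-links sends all of its mass into S are assumptions made for illustration.

```python
def topic_specific_pagerank(out_links, teleport_set, beta=0.8, tol=1e-10, max_iter=1000):
    """Rank vector r_S for teleport set S (hypothetical helper).

    out_links maps every page to the list of pages it links to.
    Assumption: a page with no out-links (dead end) teleports into S.
    """
    pages = sorted(out_links)
    s = set(teleport_set)
    # Teleport distribution: uniform over S, zero for pages outside S.
    v = {p: (1.0 / len(s) if p in s else 0.0) for p in pages}
    r = dict(v)  # start the walk inside S
    for _ in range(max_iter):
        new_r = {p: 0.0 for p in pages}
        dead_mass = 0.0
        for j in pages:
            targets = out_links[j]
            if targets:
                # With probability beta, follow a uniformly chosen out-link of j.
                share = beta * r[j] / len(targets)
                for i in targets:
                    new_r[i] += share
            else:
                dead_mass += beta * r[j]
        # r sums to 1, so the teleport mass is (1 - beta) plus any dead-end mass.
        teleport = (1.0 - beta) + dead_mass
        for p in pages:
            new_r[p] += teleport * v[p]
        diff = sum(abs(new_r[p] - r[p]) for p in pages)
        r = new_r
        if diff < tol:
            break
    return r


# Hypothetical four-page web; the topic's teleport set is S = {A}.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
r_S = topic_specific_pagerank(graph, teleport_set={"A"})
print({p: round(x, 4) for p, x in r_S.items()})
# → {'A': 0.4717, 'B': 0.1887, 'C': 0.3396, 'D': 0.0}
```

Page D is outside S and has no in-links, so it gets rank 0 for this topic. Passing `teleport_set=set(graph)` turns the sketch back into ordinary page rank, and D then receives exactly its teleport share (1 − β)/4 = 0.05.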

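The many-walkers model from earlier in the lecture can be checked with a small Monte Carlo simulation: release M walkers at uniformly random pages, move each one according to the random-walk rules for a fixed number of steps, and report M(p)/M for every page. This is a sketch under stated assumptions: the function `simulate_walkers`, its defaults (10,000 walkers, 40 steps, β = 0.8, a fixed seed), and the choice that a walker at a page with no out-links always teleports are hypothetical.

```python
import random
from collections import Counter


def simulate_walkers(out_links, num_walkers=10_000, steps=40, beta=0.8, seed=0):
    """Estimate page rank as the fraction M(p)/M of walkers at each page."""
    rng = random.Random(seed)
    pages = sorted(out_links)

    def step(j):
        targets = out_links[j]
        # Assumption: a walker at a dead end always teleports.
        if targets and rng.random() < beta:
            return rng.choice(targets)  # walk to a uniformly chosen out-link of j
        return rng.choice(pages)        # teleport to a uniformly chosen page

    # Time 0: every walker starts at a page chosen uniformly at random.
    positions = [rng.choice(pages) for _ in range(num_walkers)]
    for _ in range(steps):
        positions = [step(j) for j in positions]
    counts = Counter(positions)
    return {p: counts[p] / num_walkers for p in pages}


# Hypothetical three-page web.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
estimate = simulate_walkers(graph)
print(estimate)
```

For this graph with β = 0.8, solving the steady-state equations by hand gives page ranks of about 0.384 (A), 0.220 (B) and 0.396 (C). The simulated fractions should agree to within sampling noise, which shrinks roughly as 1/√M; this is why the model asks for M ≫ N.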
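The cluster-based starting vector from the "Speeding up convergence" slide can be sketched as follows. The slides do not say how intercluster weights are computed, so this sketch assumes a BlockRank-style rule: the weight from cluster I to cluster J sums local_rank(p)/outdegree(p) over all links p → q with p in I and q in J. The helper names `pagerank` and `cluster_start_vector`, the example graph, β = 0.8, and the uniform spreading of mass from pages without out-links are also assumptions.

```python
def pagerank(nodes, weighted_edges, beta=0.8, start=None, tol=1e-10, max_iter=1000):
    """Power iteration; weighted_edges[u] maps v -> weight (normalized per node).

    Assumption: a node with no outgoing weight spreads its mass uniformly.
    Returns (rank, number_of_iterations).
    """
    n = len(nodes)
    r = dict(start) if start is not None else {u: 1.0 / n for u in nodes}
    for it in range(1, max_iter + 1):
        new_r = {u: 0.0 for u in nodes}
        dead_mass = 0.0
        for u in nodes:
            out = weighted_edges.get(u, {})
            total = sum(out.values())
            if total > 0:
                for v, w in out.items():
                    new_r[v] += beta * r[u] * w / total
            else:
                dead_mass += beta * r[u]
        # r sums to 1, so teleport plus dead-end mass is spread uniformly.
        for u in nodes:
            new_r[u] += (1.0 - beta + dead_mass) / n
        diff = sum(abs(new_r[u] - r[u]) for u in nodes)
        r = new_r
        if diff < tol:
            return r, it
    return r, max_iter


def cluster_start_vector(out_links, cluster_of, beta=0.8):
    """Initial rank of page p = local rank of p * rank of p's cluster."""
    clusters = sorted(set(cluster_of.values()))
    # 1. Local page rank inside each cluster, using intra-cluster links only
    #    (each cluster is independent, so this loop could run in parallel).
    local = {}
    for c in clusters:
        pages = [p for p in out_links if cluster_of[p] == c]
        edges = {p: {q: 1.0 for q in out_links[p] if cluster_of[q] == c} for p in pages}
        rank, _ = pagerank(pages, edges, beta)
        local.update(rank)
    # 2. Page rank on the graph of clusters (assumed BlockRank-style weights).
    cluster_edges = {c: {} for c in clusters}
    for p, targets in out_links.items():
        for q in targets:
            src, dst = cluster_of[p], cluster_of[q]
            cluster_edges[src][dst] = cluster_edges[src].get(dst, 0.0) + local[p] / len(targets)
    cluster_rank, _ = pagerank(clusters, cluster_edges, beta)
    # 3. Combine the two levels into the starting vector.
    return {p: local[p] * cluster_rank[cluster_of[p]] for p in out_links}


# Hypothetical example: two hosts, "a" and "b".
web = {"a1": ["a2", "a3"], "a2": ["a1"], "a3": ["a1", "b1"],
       "b1": ["b2"], "b2": ["b1", "a1"]}
cluster_of = {p: p[0] for p in web}
full_edges = {p: {q: 1.0 for q in targets} for p, targets in web.items()}

start = cluster_start_vector(web, cluster_of)
r_uniform, it_uniform = pagerank(list(web), full_edges)
r_cluster, it_cluster = pagerank(list(web), full_edges, start=start)
print(f"uniform start: {it_uniform} iterations; cluster start: {it_cluster} iterations")
```

On a five-page toy graph the two iteration counts say little about the 2-3x speedup the slides report for the real web. What the example does show is that the cluster-based start sums to 1 (up to rounding), so it is a valid starting vector, and that power iteration from it converges to the same page rank as a uniform start.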