CS345 Data Mining
Link Analysis 2: Page Rank Variants
Anand Rajaraman, Jeffrey D. Ullman

Topics (this lecture)
- Many-walkers model
- Tricks for speeding convergence
- Topic-Specific Page Rank

Random walk interpretation
- At time 0, pick a page on the web uniformly at random to start the walk.
- Suppose that at time t we are at page j. At time t+1:
  - With probability β, pick a page uniformly at random from O(j), the set of pages that j links to, and walk to it.
  - With probability 1 − β, pick a page on the web uniformly at random and teleport into it.
- The page rank of page p is the "steady state" probability that, at any given time, the random walker is at page p.

Many random walkers
- Alternative, equivalent model: imagine a large number M of independent, identical random walkers (M ≫ N, the number of pages).
- At any point in time, let M(p) be the number of random walkers at page p.
- The page rank of p is the fraction of random walkers that are expected to be at page p, i.e., E[M(p)]/M.

Speeding up convergence
- Exploit locality of links: pages tend to link most often to other pages within the same host or domain.
- Partition pages into clusters (host, domain, …).
- Compute a local page rank for each cluster (this can be done in parallel).
- Compute page rank on the graph of clusters.
- The initial rank of a page is the product of its local rank and the rank of its cluster.
- Use this as the starting vector for the normal page rank computation.
- Result: a 2-3x speedup.

In Pictures
[Figure: panels "Local ranks", "Inter-cluster weights", "Ranks of clusters", "Initial eigenvector". Example: local ranks 2.0 and 0.1 in a cluster of rank 1.5 give initial ranks 3.0 and 0.15.]

Other tricks
- Adaptive methods
- Extrapolation
- Typically only small speedups (~20-30%)

Problems with page rank
- Measures generic popularity of a page
  - Biased against topic-specific authorities
  - Ambiguous queries, e.g., jaguar
  - Addressed in this lecture
- Uses a single measure of importance
  - Other models exist, e.g., hubs-and-authorities
  - Next lecture
- Susceptible to link spam
  - Artificial link topographies created in order to boost page rank
  - Next lecture

Topic-Specific Page Rank
- Instead of generic popularity, can we measure popularity within a topic?
  - E.g., computer science, health
- Bias the random walk: when the random walker teleports, he picks a page from a set S of web pages.
  - S contains only pages that are relevant to the topic, e.g., Open Directory (DMOZ) pages for a given topic (www.dmoz.org).
- For each teleport set S, we get a different rank vector r_S.
- Matrix formulation...
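A minimal Python sketch of the random walk described above, computed two ways. The first is power iteration on the rank vector. The second simulates the many-walkers model. Passing a teleport set S gives the topic-specific rank vector r_S; leaving it out teleports to every page, which is ordinary page rank. These are assumptions, not taken from the slides: the graph is a dict from each page j to its out-link list O(j); β defaults to 0.8; a page with no out-links always teleports; and the function names `pagerank` and `many_walkers` are hypothetical. The optional `start` argument is where a better starting vector, such as the cluster-based one, could be supplied.

```python
import random


def pagerank(out_links, beta=0.8, teleport_set=None, start=None,
             tol=1e-10, max_iter=1000):
    """Power iteration for the random walk with teleportation (a sketch).

    out_links: dict mapping every page j to the list O(j) of pages it links to.
    teleport_set: the set S the walker teleports into (must be existing pages);
        None means every page, i.e. generic page rank.
    Assumption: a page with no out-links always teleports.
    start: optional starting vector (e.g. a cluster-based guess); it is
        normalized to sum to 1. Defaults to the uniform vector.
    """
    pages = list(out_links)
    s = list(teleport_set) if teleport_set is not None else pages
    if start is None:
        r = {p: 1.0 / len(pages) for p in pages}
    else:
        total = sum(start.values())
        r = {p: start.get(p, 0.0) / total for p in pages}
    for _ in range(max_iter):
        nxt = {p: 0.0 for p in pages}
        teleported = 0.0  # probability mass that teleports on this step
        for j, outs in out_links.items():
            if outs:
                # With probability beta, follow one of the out-links uniformly.
                share = beta * r[j] / len(outs)
                for p in outs:
                    nxt[p] += share
                teleported += (1 - beta) * r[j]
            else:
                teleported += r[j]  # dead end: teleport with probability 1
        # Teleported mass is spread uniformly over S.
        for p in s:
            nxt[p] += teleported / len(s)
        if sum(abs(nxt[p] - r[p]) for p in pages) < tol:
            return nxt
        r = nxt
    return r


def many_walkers(out_links, beta=0.8, teleport_set=None, walkers=20000,
                 steps=50, seed=0):
    """Simulate M independent random walkers and return M(p)/M per page.

    Same assumptions as pagerank(); the result approximates E[M(p)]/M.
    """
    rng = random.Random(seed)
    pages = list(out_links)
    # Sorted so that a fixed seed gives the same run every time.
    s = sorted(teleport_set) if teleport_set is not None else pages
    # At time 0, every walker starts at a page chosen uniformly at random.
    pos = [rng.choice(pages) for _ in range(walkers)]
    for _ in range(steps):
        for i, j in enumerate(pos):
            outs = out_links[j]
            if outs and rng.random() < beta:
                pos[i] = rng.choice(outs)  # walk to a random page in O(j)
            else:
                pos[i] = rng.choice(s)     # teleport into S
    counts = {p: 0 for p in pages}
    for j in pos:
        counts[j] += 1
    return {p: c / walkers for p, c in counts.items()}
```

On a hypothetical three-page web `{"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}` with β = 0.8, `pagerank(web)` gives roughly y ≈ 0.376, a ≈ 0.398, m ≈ 0.226. Calling `pagerank(web, teleport_set={"y"})` raises y to about 0.548, because every teleport now lands on y. `many_walkers(web)` should agree with the first vector to within a couple of hundredths, which illustrates why the two models are equivalent.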
 Fall '09