# PageRankVariants - Topics CS345 Data Mining Link Analysis 2...


CS345 Data Mining
Link Analysis 2: Page Rank Variants
Anand Rajaraman, Jeffrey D. Ullman

## Topics

- This lecture
  - Many-walkers model
  - Tricks for speeding convergence
  - Topic-Specific Page Rank

## Random walk interpretation

- At time 0, pick a page on the web uniformly at random to start the walk
- Suppose at time t, we are at page j
- At time t+1
  - With probability β, pick a page uniformly at random from O(j) and walk to it
  - With probability 1-β, pick a page on the web uniformly at random and teleport into it
- Page rank of page p = "steady state" probability that at any given time, the random walker is at page p

## Many random walkers

- Alternative, equivalent model
- Imagine a large number M of independent, identical random walkers (M ≫ N)
- At any point in time, let M(p) be the number of random walkers at page p
- The page rank of p is the fraction of random walkers that are expected to be at page p, i.e., E[M(p)]/M

## Speeding up convergence

- Exploit locality of links
  - Pages tend to link most often to other pages within the same host or domain
- Partition pages into clusters (host, domain, …)
- Compute local page rank for each cluster (can be done in parallel)
- Compute page rank on graph of clusters
- Initial rank of a page is the product of its local rank and the rank of its cluster
- Use as starting vector for normal page rank computation
- 2-3x speedup

## In Pictures

[Figure: local ranks, intercluster weights, ranks of clusters, and the resulting initial eigenvector]
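The random walk interpretation and the many-walkers model can be compared side by side in a short sketch. The code below is an illustration, not the lecture's own implementation: the function names, the 4-page graph, and the choice β = 0.8 are all assumptions, and every page is assumed to have at least one out-link. `pagerank` iterates the walk's probability distribution, while `simulate_walkers` moves M independent walkers and reports the fraction M(p)/M at each page.

```python
import random


def pagerank(out_links, beta=0.8, iters=100):
    """Iterate the random-walk distribution until it settles.

    out_links[j] is the list O(j) of pages that page j links to.
    Assumption: every page has at least one out-link (no dead ends).
    """
    n = len(out_links)
    rank = [1.0 / n] * n  # time 0: start page chosen uniformly at random
    for _ in range(iters):
        # With probability 1 - beta the walker teleports to a uniform page.
        nxt = [(1.0 - beta) / n] * n
        # With probability beta it follows a uniformly chosen out-link of j.
        for j, targets in enumerate(out_links):
            share = beta * rank[j] / len(targets)
            for i in targets:
                nxt[i] += share
        rank = nxt
    return rank


def simulate_walkers(out_links, beta=0.8, walkers=10000, steps=30, seed=0):
    """Many-walkers model: return M(p)/M for each page p after `steps` moves."""
    rng = random.Random(seed)
    n = len(out_links)
    pos = [rng.randrange(n) for _ in range(walkers)]
    for _ in range(steps):
        for w in range(walkers):
            if rng.random() < beta:
                pos[w] = rng.choice(out_links[pos[w]])  # follow a link
            else:
                pos[w] = rng.randrange(n)  # teleport
    counts = [0] * n
    for p in pos:
        counts[p] += 1
    return [c / walkers for c in counts]


# Hypothetical 4-page web: page 3 has no in-links.
graph = [[1, 2], [2], [0], [0, 2]]
exact = pagerank(graph)
approx = simulate_walkers(graph)
for page, (e, a) in enumerate(zip(exact, approx)):
    print(f"page {page}: iteration={e:.3f} walkers={a:.3f}")
```

With enough walkers the two columns should agree to within sampling noise, which is the sense in which the two models are equivalent.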

## Other tricks

- Adaptive methods
- Extrapolation
- Typically, small speedups (~20-30%)

## Problems with page rank

- Measures generic popularity of a page
  - Biased against topic-specific authorities
  - Ambiguous queries, e.g., jaguar
  - This lecture
- Uses a single measure of importance
  - Other models, e.g., hubs-and-authorities
  - Next lecture
- Susceptible to link spam
  - Artificial link topographies created in order to boost page rank
  - Next lecture

## Topic-Specific Page Rank

- Instead of generic popularity, can we measure popularity within a topic?
  - E.g., computer science, health
- Bias the random walk
  - When the random walker teleports, he picks a page from a set S of web pages
  - S contains only pages that are relevant to the topic
  - E.g., Open Directory (DMOZ) pages for a given topic (www.dmoz.org)
- For each teleport set S, we get a different rank
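The biased walk can be sketched by restricting the teleport step to the set S. This is a minimal illustration under stated assumptions: the function name `topic_pagerank`, the 4-page graph, and β = 0.8 are hypothetical; the walker is assumed to pick uniformly within S when teleporting and to start inside S; and every page is assumed to have at least one out-link.

```python
def topic_pagerank(out_links, teleport_set, beta=0.8, iters=100):
    """Page rank where teleports land only on pages in teleport_set (S).

    out_links[j] is the list O(j) of pages that page j links to.
    Assumptions: no dead ends, uniform choice within S, walk starts in S.
    """
    n = len(out_links)
    s = sorted(set(teleport_set))
    rank = [0.0] * n
    for p in s:
        rank[p] = 1.0 / len(s)
    for _ in range(iters):
        nxt = [0.0] * n
        # Teleport mass (1 - beta) is spread over S only, not the whole web.
        for p in s:
            nxt[p] += (1.0 - beta) / len(s)
        # Link-following step is unchanged from ordinary page rank.
        for j, targets in enumerate(out_links):
            share = beta * rank[j] / len(targets)
            for i in targets:
                nxt[i] += share
        rank = nxt
    return rank


# Hypothetical 4-page web; two different teleport sets give two rankings.
graph = [[1, 2], [2], [0], [0, 2]]
rank_topic_a = topic_pagerank(graph, {0})
rank_topic_b = topic_pagerank(graph, {3})
print(rank_topic_a)
print(rank_topic_b)
```

In this toy graph page 3 has no in-links, so it receives rank only when it belongs to S, which shows how the choice of teleport set changes the result.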