# TopicSpecificPageRank - CS345 Data Mining Page Rank...

CS345 Data Mining Page Rank Variants

## Review Page Rank

- Web graph encoded by matrix $M$
  - $N \times N$ matrix ($N$ = number of web pages)
  - $M_{ij} = 1/|O(j)|$ iff there is a link from $j$ to $i$
  - $M_{ij} = 0$ otherwise
  - $O(j)$ = set of pages that node $j$ links to
- Define matrix $A$ as follows
  - $A_{ij} = \beta M_{ij} + (1-\beta)/N$, where $0 < \beta < 1$
  - $1-\beta$ is the “tax” discussed in the prior lecture
- Page rank $r$ is the principal eigenvector of $A$
  - $Ar = r$

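
The fixed point $Ar = r$ can be approximated by repeatedly applying $A$ to a starting vector (power iteration). The following is a minimal sketch, not the course's reference code: the function name `pagerank`, the three-page graph, and the choice $\beta = 0.8$ are illustrative assumptions, and every page is assumed to have at least one out-link.

```python
def pagerank(out_links, beta=0.8, iterations=100):
    """Power iteration for r = A r, with A_ij = beta * M_ij + (1 - beta) / N.

    out_links maps each page j to the list O(j) of pages it links to.
    Assumes every page has at least one out-link (no dead ends).
    """
    pages = sorted(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(iterations):
        # Teleport term: each page receives (1 - beta) / N in total,
        # because the entries of rank sum to 1.
        new_rank = {p: (1.0 - beta) / n for p in pages}
        for j in pages:
            share = beta * rank[j] / len(out_links[j])  # beta * M_ij * r_j
            for i in out_links[j]:
                new_rank[i] += share
        rank = new_rank
    return rank


# Hypothetical three-page graph: a -> b, a -> c, b -> c, c -> a
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
print(ranks)
```

The sketch never builds $A$ explicitly. It adds the uniform teleport term once per iteration and spreads each page's rank along its out-links, which keeps the work proportional to the number of links.
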
## Random walk interpretation

- At time 0, pick a page on the web uniformly at random to start the walk
- Suppose at time $t$, we are at page $j$
- At time $t+1$
  - With probability $\beta$, pick a page uniformly at random from $O(j)$ and walk to it
  - With probability $1-\beta$, pick a page on the web uniformly at random and teleport into it
- Page rank of page $p$ = “steady state” probability that at any given time, the random walker is at page $p$

## Many random walkers

- Alternative, equivalent model
- Imagine a large number $M$ of independent, identical random walkers ($M \gg N$); here $M$ counts walkers and is not the link matrix
- At any point in time, let $M(p)$ be the number of random walkers at page $p$
- The page rank of $p$ is the fraction of random walkers that are expected to be at page $p$, i.e., $E[M(p)]/M$

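
The walker model can be checked empirically with a small Monte Carlo simulation. This is only a sketch under stated assumptions: the function name `simulate_walkers`, the three-page graph, the walker and step counts, and $\beta = 0.8$ are all hypothetical choices, and every page is assumed to have at least one out-link.

```python
import random


def simulate_walkers(out_links, num_walkers=10000, steps=30, beta=0.8, seed=0):
    """Estimate page rank as the fraction of walkers found at each page.

    Each walker starts at a uniformly random page, then at every step either
    follows a random out-link (probability beta) or teleports to a uniformly
    random page (probability 1 - beta).
    """
    rng = random.Random(seed)
    pages = sorted(out_links)
    counts = {p: 0 for p in pages}
    for _ in range(num_walkers):
        page = rng.choice(pages)  # time 0: uniform random start
        for _ in range(steps):
            if rng.random() < beta:
                page = rng.choice(out_links[page])  # walk along an out-link
            else:
                page = rng.choice(pages)  # teleport
        counts[page] += 1  # record where this walker is after the last step
    return {p: counts[p] / num_walkers for p in pages}


# Hypothetical three-page graph: a -> b, a -> c, b -> c, c -> a
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
estimate = simulate_walkers(graph)
print(estimate)
```

With enough walkers and steps, the printed fractions should be close to the page rank vector of the same graph, up to sampling noise.
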
## Problems with page rank

- Measures generic popularity of a page
  - Biased against topic-specific authorities
  - Ambiguous queries, e.g., jaguar
  - This lecture
- Link spam
  - Creating artificial link topologies in order to boost page rank
  - Next lecture

## Topic-Specific Page Rank

- Instead of generic popularity, can we measure popularity within a topic?
  - E.g., computer science, health
- Bias the random walk
  - When the random walker teleports, he picks a page from a set $S$ of web pages
  - $S$ contains only pages that are relevant to the topic
  - E.g., Open Directory (DMOZ) pages for a given topic (www.dmoz.org)
- Corresponding to each teleport set $S$, we get a different rank vector $r_S$

## Matrix formulation

- $A_{ij} = \beta M_{ij} + (1-\beta)/|S|$ if $i \in S$
- $A_{ij} = \beta M_{ij}$ otherwise
- Show that $A$ is stochastic
- We have weighted all pages in the teleport set $S$ equally
  - Could also assign different weights to them
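
The biased formulation can be sketched in code by restricting the teleport term to pages in $S$. The names `topic_specific_pagerank` and `column_sums`, the three-page graph, the teleport set `{"b"}`, and $\beta = 0.8$ are illustrative assumptions, not part of the lecture. Every page is assumed to have at least one out-link, so each column of $M$ sums to 1.

```python
def topic_specific_pagerank(out_links, teleport_set, beta=0.8, iterations=100):
    """Power iteration with teleports restricted to the pages in teleport_set.

    A_ij = beta * M_ij + (1 - beta) / |S| if i is in S, else beta * M_ij.
    Assumes every page has at least one out-link (no dead ends).
    """
    pages = sorted(out_links)
    s = set(teleport_set)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Only pages in S receive the teleport mass (1 - beta) / |S|.
        new_rank = {p: ((1.0 - beta) / len(s) if p in s else 0.0) for p in pages}
        for j in pages:
            share = beta * rank[j] / len(out_links[j])  # beta * M_ij * r_j
            for i in out_links[j]:
                new_rank[i] += share
        rank = new_rank
    return rank


def column_sums(out_links, teleport_set, beta=0.8):
    """Return the sum of column j of A for every page j (each should be 1)."""
    s = set(teleport_set)
    sums = {}
    for j in out_links:
        total = 0.0
        for i in out_links:
            m_ij = 1.0 / len(out_links[j]) if i in out_links[j] else 0.0
            total += beta * m_ij + ((1.0 - beta) / len(s) if i in s else 0.0)
        sums[j] = total
    return sums


# Hypothetical three-page graph: a -> b, a -> c, b -> c, c -> a
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
generic = topic_specific_pagerank(graph, graph.keys())  # S = all pages
biased = topic_specific_pagerank(graph, {"b"})          # S = {b}
print(generic)
print(biased)
print(column_sums(graph, {"b"}))
```

The column-sum check mirrors the exercise on the slide. Column $j$ of $A$ sums to $\beta \sum_i M_{ij} + |S| \cdot (1-\beta)/|S| = \beta + (1-\beta) = 1$, provided column $j$ of $M$ sums to 1. Taking $S$ to be all pages recovers the generic formulation with $(1-\beta)/N$.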
