# PageRank-4 - CS345 Data Mining Link Analysis Algorithms...

CS345 Data Mining

Link Analysis Algorithms: Page Rank

Anand Rajaraman, Jeffrey D. Ullman

## Link Analysis Algorithms

- Page Rank
- Hubs and Authorities
- Topic-Specific Page Rank
- Spam Detection Algorithms
- Other interesting topics we won’t cover
  - Detecting duplicates and mirrors
  - Mining for communities

## Ranking web pages

- Web pages are not equally “important”
  - www.joe-schmoe.com vs. www.stanford.edu
- Inlinks as votes
  - www.stanford.edu has 23,400 inlinks
  - www.joe-schmoe.com has 1 inlink
- Are all inlinks equal?
  - Recursive question!

## Simple recursive formulation

- Each link’s vote is proportional to the importance of its source page
- If page $P$ with importance $x$ has $n$ outlinks, each link gets $x/n$ votes
- Page $P$’s own importance is the sum of the votes on its inlinks
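
## Simple “flow” model

The web in 1839

[Figure: three-page web graph with nodes Yahoo ($y$), Amazon ($a$), and M’soft ($m$). Edge labels: $y/2$, $y/2$, $a/2$, $a/2$, $m$.]

$$
\begin{aligned}
y &= y/2 + a/2 \\
a &= y/2 + m \\
m &= a/2
\end{aligned}
$$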
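
As a minimal sketch of the flow model, one round of voting on the three-page graph could be computed as below. The edge list is read off the flow equations (for example, $y = y/2 + a/2$ means Yahoo receives half of its own vote and half of Amazon's). The names `OUTLINKS` and `flow_step` are illustrative and do not come from the slides.

```python
from fractions import Fraction

# Outlinks of the three-page example, inferred from the flow equations.
OUTLINKS = {
    "y": ["y", "a"],  # Yahoo links to itself and to Amazon
    "a": ["y", "m"],  # Amazon links to Yahoo and to M'soft
    "m": ["a"],       # M'soft links to Amazon
}


def flow_step(importance, outlinks):
    """Return each page's new importance: the sum of the votes on its inlinks."""
    new = {page: Fraction(0) for page in outlinks}
    for src, dests in outlinks.items():
        vote = importance[src] / len(dests)  # each outlink carries x/n
        for dst in dests:
            new[dst] += vote
    return new


# Start with equal importance and push the votes along the links once.
start = {"y": Fraction(1, 3), "a": Fraction(1, 3), "m": Fraction(1, 3)}
after_one = flow_step(start, OUTLINKS)
```

Starting from equal importance $1/3$ for every page, one step gives $y = 1/3$, $a = 1/2$, $m = 1/6$. The total importance stays 1 because every page passes on all of its importance.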

## Solving the flow equations

- 3 equations, 3 unknowns, no constants
  - No unique solution
  - All solutions equivalent modulo scale factor
- Additional constraint forces uniqueness
  - $y + a + m = 1$
  - $y = 2/5$, $a = 2/5$, $m = 1/5$
- Gaussian elimination method works for small examples, but we need a better method for large graphs
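
For a graph this small, the system can be solved exactly. The sketch below assumes we replace the third flow equation ($m = a/2$, which is redundant given the other two) with the constraint $y + a + m = 1$, then run plain Gauss-Jordan elimination over exact fractions. The helper `solve` is a hypothetical name written for this illustration, not a library function.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(A)
    # Build the augmented matrix [A | b].
    aug = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot equals 1.
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


half = Fraction(1, 2)
A = [
    [-half, half, 0],  # y = y/2 + a/2  rewritten as  -y/2 + a/2 = 0
    [half, -1, 1],     # a = y/2 + m    rewritten as  y/2 - a + m = 0
    [1, 1, 1],         # y + a + m = 1  (replaces the redundant third equation)
]
b = [0, 0, 1]
y, a, m = solve(A, b)
```

This reproduces $y = 2/5$, $a = 2/5$, $m = 1/5$. Elimination costs on the order of $N^3$ operations for $N$ pages, which is why it does not scale to large graphs.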

## Matrix formulation

- Matrix $M$ has one row and one column for each web page
- Suppose page $j$ has $n$ outlinks
  - If $j \to i$, then $M_{ij} = 1/n$
  - Else $M_{ij} = 0$
- $M$ is a column stochastic matrix
  - Columns sum to 1
- Suppose $r$ is a vector with one entry per web page
  - $r_i$ is the importance score of page $i$
  - Call it the rank vector
  - $|r| = 1$
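
A minimal sketch of this construction, assuming the graph is given as a dictionary from each page to its list of outlinks and that every page has at least one outlink. The function name `build_matrix` and the sample graph (the Yahoo / Amazon / M'soft example) are illustrative choices.

```python
from fractions import Fraction


def build_matrix(pages, outlinks):
    """Build the column stochastic matrix M as a list of rows.

    M[i][j] = 1/n if page j has n outlinks and one of them points to page i,
    and 0 otherwise.
    """
    index = {page: k for k, page in enumerate(pages)}
    size = len(pages)
    M = [[Fraction(0)] * size for _ in range(size)]
    for src, dests in outlinks.items():
        j = index[src]
        for dst in dests:
            M[index[dst]][j] = Fraction(1, len(dests))
    return M


pages = ["y", "a", "m"]
outlinks = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}
M = build_matrix(pages, outlinks)

# Column j holds the votes cast by page j, so each column sums to 1.
column_sums = [sum(M[i][j] for i in range(len(pages))) for j in range(len(pages))]
```

A page with no outlinks would leave an all-zero column, so the matrix would no longer be column stochastic. This sketch does not handle that case.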

## Example

Suppose page $j$ links to 3 pages, including $i$.

[Figure: the product $M r = r$, with the entry $M_{ij} = 1/3$ marked in row $i$, column $j$ of $M$.]

## Eigenvector formulation

- The flow equations can be written $r = Mr$
- So the rank vector is an eigenvector of the stochastic web matrix
  - In fact, its first or principal eigenvector, with corresponding eigenvalue 1

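## Example

[Figure: the Yahoo, Amazon, M’soft link graph.]

|   | y | a | m |
|---|---|---|---|
| y | 1/2 | 1/2 | 0 |
| a | 1/2 | 0 | 1 |
| m | 0 | 1/2 | 0 |

$$
\begin{aligned}
y &= y/2 + a/2 \\
a &= y/2 + m \\
m &= a/2
\end{aligned}
$$

$r = Mr$:

$$
\begin{bmatrix} y \\ a \\ m \end{bmatrix}
=
\begin{bmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 1 \\ 0 & 1/2 & 0 \end{bmatrix}
\begin{bmatrix} y \\ a \\ m \end{bmatrix}
$$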
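
As a quick check of the eigenvector claim, the sketch below multiplies the example matrix by the normalized solution of the flow equations, $r = (2/5, 2/5, 1/5)$, and confirms that $Mr = r$, so $r$ is an eigenvector with eigenvalue 1. The helper `mat_vec` is an illustrative name.

```python
from fractions import Fraction


def mat_vec(M, r):
    """Multiply matrix M (a list of rows) by the column vector r."""
    return [sum(m_ij * r_j for m_ij, r_j in zip(row, r)) for row in M]


half = Fraction(1, 2)
# Rows and columns are ordered y, a, m, as in the example matrix.
M = [
    [half, half, 0],
    [half, 0, 1],
    [0, half, 0],
]
r = [Fraction(2, 5), Fraction(2, 5), Fraction(1, 5)]

Mr = mat_vec(M, r)
is_eigenvector = Mr == r  # True means M r = 1 * r
```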

## Power Iteration method

- Simple iterative scheme (aka relaxation)
- Suppose there are $N$ web pages
- Initialize: $r^0 = [1/N, \ldots, 1/N]^T$
- Iterate: $r^{k+1} = M r^k$
- Stop when $|r^{k+1} - r^k|_1 < \varepsilon$
  - $|x|_1 = \sum_{1 \le i \le N} |x_i|$ is the $L_1$ norm
  - Can use any other vector norm, e.g., Euclidean
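
The scheme above can be sketched in a few lines of plain Python. The function name `power_iteration`, the default tolerance `eps`, and the `max_iter` safety cap are assumptions added for this illustration. The slides specify only the initialization, the update rule, and the stopping test.

```python
def power_iteration(M, eps=1e-10, max_iter=1000):
    """Iterate r <- M r from the uniform vector until the L1 change is below eps."""
    n = len(M)
    r = [1.0 / n] * n  # r^0 = [1/N, ..., 1/N]^T
    for _ in range(max_iter):
        # r^{k+1} = M r^k
        r_next = [sum(M[i][j] * r[j] for j in range(n)) for i in range(n)]
        # L1 norm of the difference between successive iterates
        change = sum(abs(new - old) for new, old in zip(r_next, r))
        r = r_next
        if change < eps:
            break
    return r


# Example matrix for the Yahoo / Amazon / M'soft graph (order: y, a, m).
M = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 1.0],
    [0.0, 0.5, 0.0],
]
rank = power_iteration(M)
```

On this matrix the iterates approach $(2/5, 2/5, 1/5)$, the same solution obtained from the flow equations. Each step costs one matrix-vector product, which is what makes the method usable on large, sparse web graphs.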
