PageRank-3

PageRank-3 - CS345 Data Mining Link Analysis Algorithms...


CS345 Data Mining
Link Analysis Algorithms: Page Rank
Anand Rajaraman, Jeffrey D. Ullman

Link Analysis Algorithms
- Page Rank
- Hubs and Authorities
- Topic-Specific Page Rank
- Spam Detection Algorithms
- Other interesting topics we won’t cover
  - Detecting duplicates and mirrors
  - Mining for communities
  - Classification
  - Spectral clustering

Ranking web pages
- Web pages are not equally “important”
  - www.joe-schmoe.com v www.stanford.edu
- Inlinks as votes
  - www.stanford.edu has 23,400 inlinks
  - www.joe-schmoe.com has 1 inlink
- Are all inlinks equal?
  - Recursive question!

Simple recursive formulation
- Each link’s vote is proportional to the importance of its source page
- If page P with importance x has n outlinks, each link gets x/n votes
- Page P’s own importance is the sum of the votes on its inlinks

Simple “flow” model
The web in 1839
[Figure: three pages, Yahoo (y), Amazon (a), and M’soft (m), with edges labeled y/2, y/2, a/2, a/2, and m.]

  y = y/2 + a/2
  a = y/2 + m
  m = a/2

Solving the flow equations
- 3 equations, 3 unknowns, no constants
  - No unique solution
  - All solutions equivalent modulo scale factor
- Additional constraint forces uniqueness
  - y + a + m = 1
  - y = 2/5, a = 2/5, m = 1/5
- Gaussian elimination method works for small examples, but we need a better method for large graphs

Matrix formulation
- Matrix $M$ has one row and one column for each web page
- Suppose page $j$ has $n$ outlinks
  - If $j \to i$, then $M_{ij} = 1/n$
  - Else $M_{ij} = 0$
- $M$ is a column stochastic matrix
  - Columns sum to 1
- Suppose $r$ is a vector with one entry per web page
  - $r_i$ is the importance score of page $i$
  - Call it the rank vector
  - $|r| = 1$

Example
Suppose page $j$ links to 3 pages, including $i$.
[Figure: the product $Mr = r$, with the entry $1/3$ in row $i$, column $j$ of $M$.]

Eigenvector formulation
- The flow equations can be written $r = Mr$
- So the rank vector is an eigenvector of the stochastic web matrix
  - In fact, its first or principal eigenvector, with corresponding eigenvalue 1

Example
[Figure: the Yahoo, Amazon, M’soft graph.]

        y     a     m
  y    1/2   1/2    0
  a    1/2    0     1
  m     0    1/2    0

  y = y/2 + a/2
  a = y/2 + m
  m = a/2

$r = Mr$:

$$
\begin{pmatrix} y \\ a \\ m \end{pmatrix}
=
\begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 1 \\ 0 & 1/2 & 0 \end{pmatrix}
\begin{pmatrix} y \\ a \\ m \end{pmatrix}
$$

Power Iteration method
- Simple iterative scheme (aka relaxation)
- Suppose there are $N$ web pages
- Initialize: $r^{0} = [1/N, \ldots, 1/N]^T$
- Iterate: $r^{k+1} = M r^{k}$
- Stop when $|r^{k+1} - r^{k}|_1 < \varepsilon$
  - $|x|_1 = \sum_{1 \le i \le N} |x_i|$ is the $L_1$ norm
  - Can use any other vector norm, e.g., Euclidean

Power Iteration Example...
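The power iteration scheme described in the slides can be sketched on the three-page Yahoo/Amazon/M’soft example. This is a minimal illustration in plain Python, not the course's own code: the helper names `build_matrix` and `power_iteration`, the `outlinks` dictionary, and the default tolerance and iteration cap are all assumptions made for this sketch. The link structure is read off the flow equations (Yahoo links to itself and Amazon, Amazon links to Yahoo and M’soft, M’soft links to Amazon).

```python
def build_matrix(pages, outlinks):
    """Build the column-stochastic matrix M as a list of rows.

    M[i][j] = 1/n if page j links to page i (j has n outlinks), else 0.
    Assumes every page has at least one outlink.
    """
    index = {page: k for k, page in enumerate(pages)}
    size = len(pages)
    M = [[0.0] * size for _ in range(size)]
    for src, targets in outlinks.items():
        j = index[src]
        for dst in targets:
            M[index[dst]][j] = 1.0 / len(targets)
    return M


def power_iteration(M, eps=1e-10, max_iter=1000):
    """Iterate r <- M r from the uniform vector until the L1 change is below eps."""
    n = len(M)
    r = [1.0 / n] * n  # initialize r = [1/N, ..., 1/N]
    for _ in range(max_iter):
        r_next = [sum(M[i][j] * r[j] for j in range(n)) for i in range(n)]
        # L1 norm of the difference between successive iterates
        if sum(abs(a - b) for a, b in zip(r_next, r)) < eps:
            return r_next
        r = r_next
    return r  # hypothetical safety cap reached; return the last iterate


# Hypothetical encoding of the example graph (names chosen for this sketch).
pages = ["y", "a", "m"]
outlinks = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}

M = build_matrix(pages, outlinks)
r = power_iteration(M)
for page, score in zip(pages, r):
    print(page, round(score, 6))
```

On this graph the iterates should settle close to y = 2/5, a = 2/5, m = 1/5, the same solution the flow equations give under the constraint y + a + m = 1. The sketch uses dense lists for readability; for large graphs a sparse representation would be the more realistic choice.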