10-pagerank

2/7/2011, Jure Leskovec, Stanford C246: Mining Massive Datasets


... vote is proportional to the importance of its source page
If page P with importance x has n out-links, each link gets x/n votes
Page P's own importance is the sum of the votes on its in-links

The web in 1839
[Figure: three-page graph. Yahoo (y) links to itself and to Amazon, each link carrying y/2; Amazon (a) links to Yahoo and to M'soft, each link carrying a/2; M'soft (m) links to Amazon, carrying m.]
  y = y/2 + a/2
  a = y/2 + m
  m = a/2

3 equations, 3 unknowns, no constants
No unique solution
All solutions equivalent modulo scale factor
Additional constraint forces uniqueness
  y + a + m = 1
  y = 2/5, a = 2/5, m = 1/5
Gaussian elimination works for small examples, but we need a better method for large, web-size graphs

Matrix M has one row and one column for each web page
Suppose page j has n out-links
If j → i, then M_ij = 1/n, else M_ij = 0
M is a column stochastic matrix
Columns sum to 1
Suppose r is a vector with one entry per web page
r_i is the importance score of page i
Call it the rank vector
|r| = 1

Suppose page j links to 3 pages, including i
  M_ij = 1/3
[Figure: column j of M has the entry 1/3 in row i; M∙r = r]

The flow equations can be written r = M∙r
So the rank vector is an eigenvector of the stochastic web matrix
In fact, it is the first or principal eigenvector, with corresponding eigenvalue 1

Example: r = M∙r for Yahoo (Y!), Amazon (A), M'soft (MS)

        Y!    A     MS
  Y!    ½     ½     0
  A     ½     0     1
  MS    0     ½     0

  y = y/2 + a/2
  a = y/2 + m
  m = a/2
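
The three-page example above is small enough to check by hand, and the following Python sketch does the same check mechanically. It builds the column stochastic matrix M from the out-links, then solves r = M∙r together with the constraint that the entries of r sum to 1, using plain Gaussian elimination. The names `links`, `build_matrix`, and `solve_flow` are illustrative and do not come from the slides, and exact fractions are used only so the result can be compared directly with y = 2/5, a = 2/5, m = 1/5. The sketch assumes the system stays solvable after one flow equation is swapped for the constraint, which holds for this example but is not guaranteed for every graph.

```python
from fractions import Fraction

# Out-links of the three-page example (y = Yahoo, a = Amazon, m = M'soft).
links = {
    "y": ["y", "a"],  # Yahoo links to itself and to Amazon
    "a": ["y", "m"],  # Amazon links to Yahoo and to M'soft
    "m": ["a"],       # M'soft links to Amazon
}
pages = ["y", "a", "m"]


def build_matrix(links, pages):
    """Return M with M[i][j] = 1/n if page j (n out-links) links to page i, else 0."""
    index = {p: k for k, p in enumerate(pages)}
    size = len(pages)
    M = [[Fraction(0)] * size for _ in range(size)]
    for src, targets in links.items():
        for dst in targets:
            M[index[dst]][index[src]] = Fraction(1, len(targets))
    return M


def solve_flow(M):
    """Solve r = M r with sum(r) = 1 by Gauss-Jordan elimination."""
    size = len(M)
    # Augmented rows of (M - I) r = 0.
    rows = [
        [M[i][j] - (1 if i == j else 0) for j in range(size)] + [Fraction(0)]
        for i in range(size)
    ]
    # The flow equations are linearly dependent, so replace the last one
    # with the normalization constraint r_1 + ... + r_n = 1.
    rows[-1] = [Fraction(1)] * size + [Fraction(1)]
    for col in range(size):
        # Assumption: a nonzero pivot exists in every column.
        pivot = next(k for k in range(col, size) if rows[k][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for k in range(size):
            if k != col and rows[k][col] != 0:
                factor = rows[k][col]
                rows[k] = [v - factor * p for v, p in zip(rows[k], rows[col])]
    return [rows[i][size] for i in range(size)]


M = build_matrix(links, pages)
r = solve_flow(M)
for page, score in zip(pages, r):
    print(page, score)
```

Elimination of this kind costs on the order of n^3 operations for n pages, which is why the slides note that it is only practical for small examples.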