10-pagerank

2/7/2011, Jure Leskovec, Stanford C246: Mining Massive Datasets


... the hub scores of the pages it is linked from: $a = \mu A^T h$
The constant $\mu$ is a scaling factor, $\mu = 1 / \sum_i a_i$

The HITS algorithm:
  Initialize h, a to all 1's
  Repeat:
    h = A a
    Scale h so that it sums to 1.0
    a = A^T h
    Scale a so that it sums to 1.0
  Until h, a converge (i.e., change very little)

Example (rows and columns ordered Yahoo, Amazon, M’soft):

$$A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \qquad A^T = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$$

Successive values of the scores. In this example each vector is shown scaled so that its largest entry is 1; the last column is the limit.

  a(yahoo)  =  1    1     1     ...  1
  a(amazon) =  1    4/5   0.75  ...  0.732
  a(m’soft) =  1    1     1     ...  1

  h(yahoo)  =  1    1     1     1     ...  1.000
  h(amazon) =  1    2/3   0.71  0.73  ...  0.732
  h(m’soft) =  1    1/3   0.29  0.27  ...  0.268

Algorithm in matrix form (M is the link matrix, written A on the other slides):
  Set: a = h = 1^n (the all-ones vector of length n)
  Repeat: $h = M a$, $a = M^T h$, then normalize
  Then: $a = M^T (M a)$, where $M a$ is the new h and the result is the new a
  Thus: $a = (M^T M)\, a$ and $h = (M M^T)\, h$

a is updated in 2 steps: $M^T (M a) = (M^T M)\, a$
h is updated in 2 steps: $M (M^T h) = (M M^T)\, h$
The iteration is therefore repeated matrix powering.

Under reasonable assumptions about A, the HITS iterative algorithm converges to vectors h* and a*:
  h* is the principal eigenvector of the matrix $A A^T$
  a* is the principal eigenvector of the matrix $A^T A$

PageRank and HITS are two solutions to the same problem: what is the value of an in-link from u to v?
  In the PageRank model, the value of the link depends on the links into u.
  In the HITS model, it depends on the value of the other links out of u.
The destinies of PageRank and HITS post-1998 were very different ...

(Slides 55-60; 2/7/2011, Jure Leskovec, Stanford C246: Mining Massive Datasets)
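The iteration described above can be sketched in plain Python on the Yahoo / Amazon / M’soft example. This is a minimal illustration, not the course's reference code: the names `hits`, `mat_vec`, `transpose`, `scale_max` and `scale_sum`, and the `tol` and `max_iter` parameters, are assumptions made for this sketch. It scales by the largest entry by default so that the numbers line up with the example values (1.000, 0.732, 0.268); the pseudocode's "sums to 1.0" scaling is available as `scale_sum` and gives vectors proportional to these.

```python
# Adjacency matrix from the example: A[i][j] = 1 if page i links to page j.
# Page order (Yahoo, Amazon, M'soft) follows the example.
A = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 0],
]


def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def transpose(m):
    """Return the transpose of matrix m."""
    return [list(col) for col in zip(*m)]


def scale_max(v):
    """Scale v so that its largest entry is 1 (matches the example values)."""
    top = max(v)
    return [x / top for x in v]


def scale_sum(v):
    """Scale v so that its entries sum to 1 (as in the pseudocode)."""
    total = sum(v)
    return [x / total for x in v]


def hits(adj, scale=scale_max, tol=1e-10, max_iter=1000):
    """Run the HITS iteration; tol and max_iter are illustrative choices."""
    n = len(adj)
    adj_t = transpose(adj)
    h = [1.0] * n  # hub scores, initialized to all 1's
    a = [1.0] * n  # authority scores, initialized to all 1's
    for _ in range(max_iter):
        new_h = scale(mat_vec(adj, a))        # h = A a, then scale
        new_a = scale(mat_vec(adj_t, new_h))  # a = A^T h, then scale
        change = max(abs(x - y) for x, y in zip(new_h + new_a, h + a))
        h, a = new_h, new_a
        if change < tol:  # "change very little"
            break
    return h, a


h, a = hits(A)
print([round(x, 3) for x in h])  # [1.0, 0.732, 0.268]
print([round(x, 3) for x in a])  # [1.0, 0.732, 1.0]

# Check the eigenvector claim: (A^T A) a should be a multiple of a.
ata_a = mat_vec(transpose(A), mat_vec(A, a))
lam = ata_a[0] / a[0]  # estimate of the principal eigenvalue of A^T A
print(all(abs(y - lam * x) < 1e-6 for x, y in zip(a, ata_a)))  # True
```

Calling `hits(A, scale=scale_sum)` runs the same loop with the sum-to-1 scaling. The two variants differ only by a constant factor per vector, since the limit is an eigenvector and is defined up to scale.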

This document was uploaded on 02/26/2014 for the course CS 246 at Stanford.
