10-pagerank

What if we could not even fit $r^{new}$ in memory? See …


…fic authorities
  Solution: Topic-Specific PageRank (next lecture)
- Uses a single measure of importance
  Other models, e.g., hubs-and-authorities
  Solution: Hubs-and-Authorities (next)
- Susceptible to link spam
  Artificial link topologies created in order to boost PageRank
  Solution: TrustRank (next lecture)

2/7/2011 Jure Leskovec, Stanford C246: Mining Massive Datasets

Interesting pages fall into two classes:
1. Authorities are pages containing useful information
   - Newspaper home pages
   - Course home pages
   - Home pages of auto manufacturers
2. Hubs are pages that link to authorities
   - List of newspapers
   - Course bulletin
   - List of US auto manufacturers

[Figure: example scores NYT: 10, Ebay: 3, Yahoo: 3, CNN: 8, WSJ: 9]

A good hub links to many good authorities.
A good authority is linked from many good hubs.
Model using two scores for each node:
- Hub score and Authority score
- Represented as vectors h and a

HITS uses the adjacency matrix A, where A[i, j] = 1 if page i links to page j, and 0 otherwise.
A^T, the transpose of A, is similar to the PageRank matrix M, but A^T has 1's where M has fractions.

Example with Yahoo (y), Amazon (a), and M'soft (m); rows are the linking page, columns the linked page:

          y  a  m
     y    1  1  1
A =  a    1  0  1
     m    0  1  0

Notation:
- Vectors $a = (a_1, \dots, a_n)$ and $h = (h_1, \dots, h_n)$
- Adjacency matrix ($n \times n$): $A_{ij} = 1$ if $i \to j$, else $A_{ij} = 0$

Then: $h_i = \sum_{i \to j} a_j \iff h_i = \sum_j A_{ij} a_j$
So: $h = A \cdot a$
Likewise: $a = A^T \cdot h$

The hub score of page i is proportional to the sum of the authority scores of the pages it links to: $h = \lambda A a$
The constant $\lambda$ is a scaling factor, $\lambda = 1 / \sum_i h_i$.
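To make the comparison between A^T and M concrete, here is a small sketch in plain Python using the three-page Yahoo/Amazon/M'soft example. It builds A^T and the PageRank matrix M (dividing each column of A^T by the out-degree of the linking page), checks that both have nonzero entries in the same positions, and applies one hub update h = λ·A·a with λ = 1/∑h_i. The helper names (`transpose`, `pagerank_matrix`, `mat_vec`, `hub_update`) and the all-ones starting authority vector are hypothetical choices for illustration; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Example from the slides: Yahoo (y), Amazon (a), M'soft (m).
# A[i][j] = 1 if page i links to page j, else 0.
A = [
    [1, 1, 1],  # Yahoo links to Yahoo, Amazon, M'soft
    [1, 0, 1],  # Amazon links to Yahoo, M'soft
    [0, 1, 0],  # M'soft links to Amazon
]


def transpose(mat):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*mat)]


def pagerank_matrix(adj):
    """Build the column-stochastic PageRank matrix M from an adjacency matrix.

    M[j][i] = 1 / out_degree(i) if i links to j, else 0; that is, each column
    of adj^T is divided by its sum. Assumes no dead ends (every out-degree > 0).
    """
    out_deg = [sum(row) for row in adj]
    at = transpose(adj)
    n = len(adj)
    return [[Fraction(at[j][i], out_deg[i]) for i in range(n)] for j in range(n)]


def mat_vec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def hub_update(adj, a):
    """One hub step: h = lambda * A a, with lambda = 1 / sum_i h_i."""
    raw = mat_vec(adj, a)
    total = sum(raw)
    return [Fraction(x) / total for x in raw]


AT = transpose(A)
M = pagerank_matrix(A)
# A^T and M have nonzeros in the same positions: 1's in A^T, fractions in M.
same_pattern = all(
    (AT[j][i] == 1) == (M[j][i] != 0) for i in range(3) for j in range(3)
)
h = hub_update(A, [1, 1, 1])  # hypothetical start: all-ones authority scores
print(AT)            # [[1, 1, 0], [1, 0, 1], [1, 1, 0]]
print(same_pattern)  # True
print(h)             # [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
```

With all authority scores equal, Yahoo (which links to all three pages) gets the largest hub score after a single step.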
The authority score of page i is proportional to the sum of t...
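Alternating the two updates from the notation above, h = A·a and a = A^T·h, gives the HITS iteration. The following is a minimal sketch in plain Python, with several assumptions not taken from the slides: both vectors start at all ones, each is rescaled to sum to 1 after every update (extending the λ = 1/∑h_i convention to a as well), and iteration stops once no entry changes by more than a tolerance. The function name `hits` and its `tol`/`max_iter` parameters are hypothetical.

```python
def hits(adj, tol=1e-10, max_iter=1000):
    """HITS power iteration on a 0/1 adjacency matrix (list of rows).

    adj[i][j] = 1 if page i links to page j. Repeats a = A^T h, h = A a,
    rescaling each vector so its entries sum to 1 (an assumed convention),
    until the largest change in either vector drops below tol or max_iter
    rounds have run. Assumes the graph has at least one link, so the sums
    are nonzero. Returns (h, a) as lists of floats.
    """
    n = len(adj)
    h = [1.0] * n  # assumed starting hub scores
    a = [1.0] * n  # assumed starting authority scores
    for _ in range(max_iter):
        # Authority step: a_j = sum of h_i over pages i that link to j (a = A^T h).
        new_a = [sum(adj[i][j] * h[i] for i in range(n)) for j in range(n)]
        total = sum(new_a)
        new_a = [x / total for x in new_a]
        # Hub step: h_i = sum of a_j over pages j that i links to (h = A a).
        new_h = [sum(adj[i][j] * new_a[j] for j in range(n)) for i in range(n)]
        total = sum(new_h)
        new_h = [x / total for x in new_h]
        change = max(abs(x - y) for x, y in zip(new_h + new_a, h + a))
        h, a = new_h, new_a
        if change < tol:
            break
    return h, a


# Yahoo (y), Amazon (a), M'soft (m), same adjacency matrix as in the slides.
A = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 0],
]
h, a = hits(A)
print([round(x, 3) for x in h])  # [0.5, 0.366, 0.134]
print([round(x, 3) for x in a])  # [0.366, 0.268, 0.366]
```

On this example Yahoo is the strongest hub, while Yahoo and M'soft tie as the strongest authorities. Rescaling only fixes the overall scale of each vector, so another normalization (for example, dividing by the largest entry) would give the same ranking.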