|r| = 1
r = β M∙r + [(1-β)/N]_N
where [x]_N is an N-vector with all entries x
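The role of |r| = 1 can be checked numerically. The sketch below assumes the usual dense formulation A = β M + (1-β)/N in every entry, which is not shown in this excerpt, and compares A∙r with the rearranged form β M∙r plus the constant (1-β)/N. The 3-node matrix M, the vector r, and β = 0.8 are illustrative assumptions, not values from the slides.

```python
# Illustrative 3-node graph (an assumption): node 0 links to 1 and 2,
# node 1 links to 0 and 2, node 2 links to 0.
beta = 0.8
N = 3

# Column-stochastic M: M[i][j] = 1/out_degree(j) if j links to i, else 0.
M = [
    [0.0, 0.5, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]

# Any vector whose entries sum to 1 (this is the |r| = 1 condition).
r = [0.2, 0.3, 0.5]

# Dense form: A = beta*M + (1-beta)/N in every entry, then A.r
A = [[beta * M[i][j] + (1 - beta) / N for j in range(N)] for i in range(N)]
dense = [sum(A[i][j] * r[j] for j in range(N)) for i in range(N)]

# Rearranged form: beta*M.r plus the constant (1-beta)/N in every entry.
rearranged = [
    beta * sum(M[i][j] * r[j] for j in range(N)) + (1 - beta) / N
    for i in range(N)
]

# The two forms agree only because the entries of r sum to 1.
max_gap = max(abs(a - b) for a, b in zip(dense, rearranged))
print(max_gap < 1e-12)
```

The dense form touches all N*N entries of A, while the rearranged form only needs the nonzero entries of M, which is what makes the sparse representation worthwhile.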
2/7/2011 Jure Leskovec, Stanford C246: Mining Massive Datasets

We can rearrange the PageRank equation:
  r = β M∙r + [(1-β)/N]_N
  [(1-β)/N]_N is an N-vector with all entries (1-β)/N
M is a sparse matrix! With about 10 links per node, it has approximately 10N nonzero entries.
So in each iteration, we need to:
  Compute r_new = β M∙r_old
  Add a constant value (1-β)/N to each entry in r_new

Encode the sparse matrix using only its nonzero entries.
  Space is roughly proportional to the number of links: say 10N entries, or 4 bytes * 10 * 1 billion = 40GB for N = 1 billion pages
  This still won't fit in memory, but it will fit on disk
node   degree   destination nodes
0      3        1, 5, 7
1      5        17, 64, 113, 117, 245
2      2        13, 23

Assume enough RAM to fit r_new into memory
Store r_old and matrix M on disk
Initialize all entries of r_new to (1-β)/N
For each page p (of out-degree n):
  Read into memory: p, n, dest_1,…,dest_n, r_old(p)
  for j = 1…n:
    r_new(dest_j) += β r_old(p) / n
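The update step above can be sketched in Python as follows. This is a minimal in-memory sketch, not the course's implementation: the function name pagerank_step, the 4-node graph, and β = 0.8 are assumptions for illustration, and the rows list stands in for the (source, degree, destinations) records that would be streamed from disk.

```python
def pagerank_step(rows, r_old, beta):
    """One pass of the update step over sparse (source, degree, destinations) rows."""
    n_nodes = len(r_old)
    # Initialize all entries of r_new to (1 - beta) / N.
    r_new = [(1.0 - beta) / n_nodes] * n_nodes
    # Process one row at a time, as if each were read from disk.
    for source, degree, destinations in rows:
        # Each out-link of `source` receives an equal share of beta * r_old(source).
        share = beta * r_old[source] / degree
        for dest in destinations:
            r_new[dest] += share
    return r_new


# Hypothetical 4-node graph with no dead ends (every node has out-degree >= 1).
rows = [
    (0, 2, [1, 2]),
    (1, 1, [2]),
    (2, 2, [0, 3]),
    (3, 1, [0]),
]

# Start from the uniform vector and iterate a fixed number of times.
r = [0.25] * 4
for _ in range(50):
    r = pagerank_step(rows, r, beta=0.8)

print(r)
```

Only r_new is updated at random positions, which is why it is the vector that has to fit in RAM; r_old and the rows are read sequentially. The sketch assumes there are no dead ends, so the entries of r keep summing to 1.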
src   degree   destination
0     3        1, 5, 6
1     4        17, 64, 113, 117
2     2        13, 23

In each iteration, we have to:
  Read r_old and M
  Write r_new back to disk
  IO cost = 2|r| + |M|
Questions:
  What if we had enough memory to fit both r_new and r_old?
  What if we could not even fit r_new in memory?
  See reading: http://i.stanford.edu/~ullman/mmds/ch5.pdf

Measures generic popularity of a page
Biased against topic-speci...
This document was uploaded on 02/26/2014 for the course CS 246 at Stanford.