10-pagerank

2/7/2011, Jure Leskovec, Stanford CS246: Mining Massive Datasets


|r| = 1
r = β M·r + [(1-β)/N]_N
where [x]_N is an N-vector with all entries x.

We can rearrange the PageRank equation:
    r = β M·r + [(1-β)/N]_N
[(1-β)/N]_N is an N-vector with all entries (1-β)/N.
M is a sparse matrix: with about 10 links per node, it has approximately 10N entries.
So in each iteration, we need to:
    Compute r_new = β M·r_old
    Add a constant value (1-β)/N to each entry in r_new

Encode the sparse matrix using only its nonzero entries.
Space is roughly proportional to the number of links: say 10N, or 4*10*1 billion = 40GB (presumably 4 bytes per entry, 10 links per node, 1 billion nodes).
This still won't fit in memory, but it will fit on disk.

    source node   degree   destination nodes
    0             3        1, 5, 7
    1             5        17, 64, 113, 117, 245
    2             2        13, 23

Assume enough RAM to fit r_new into memory.
Store r_old and matrix M on disk.
Initialize all entries of r_new to (1-β)/N.
For each page p (of out-degree n):
    Read into memory: p, n, dest_1, …, dest_n, r_old(p)
    for j = 1…n:
        r_new(dest_j) += β r_old(p) / n

(Figure: vectors r_new and r_old, each with entries 0 through 6, shown alongside the following adjacency table.)

    src   degree   destination
    0     3        1, 5, 6
    1     4        17, 64, 113, 117
    2     2        13, 23

In each iteration, we have to:
    Read r_old and M
    Write r_new back to disk
    IO Cost = 2|r| + |M|
Questions:
    What if we had enough memory to fit both r_new and r_old?
    What if we could not even fit r_new in memory?
See reading: http://i.stanford.edu/~ullman/mmds/ch5.pdf

Measures generic popularity of a page
Biased against topic-speci...
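The sparse update described in these slides (initialize every entry of the new rank vector to (1-β)/N, then add β·r_old(p)/n to each destination of page p) can be sketched in Python as follows. This is a minimal in-memory illustration, not the disk-based procedure from the slides: the names `pagerank_step`, `pagerank`, and `example_links`, the value β = 0.8, the tolerance, and the three-node graph are all assumptions made for the example, and the graph is assumed to have no dead ends so that |r| stays 1.

```python
def pagerank_step(links, r_old, beta=0.8):
    """One iteration: r_new = beta * M * r_old + (1 - beta) / N per entry.

    links maps each source node to its list of destination nodes, i.e. the
    (source, degree, destinations) encoding with the degree left implicit.
    """
    n_nodes = len(r_old)
    # Initialize all entries of r_new to (1 - beta) / N.
    r_new = [(1.0 - beta) / n_nodes] * n_nodes
    for src, dests in links.items():
        if not dests:
            continue  # assumption: dead ends are simply skipped here
        # Each destination receives an equal share of the source's rank.
        share = beta * r_old[src] / len(dests)
        for dest in dests:
            r_new[dest] += share
    return r_new


def pagerank(links, n_nodes, beta=0.8, tol=1e-8, max_iter=200):
    """Repeat pagerank_step until the L1 change drops below tol."""
    r = [1.0 / n_nodes] * n_nodes
    for _ in range(max_iter):
        r_next = pagerank_step(links, r, beta)
        if sum(abs(a - b) for a, b in zip(r, r_next)) < tol:
            return r_next
        r = r_next
    return r


# Hypothetical 3-node graph with no dead ends: 0 -> 1, 2; 1 -> 2; 2 -> 0.
example_links = {0: [1, 2], 1: [2], 2: [0]}
first_step = pagerank_step(example_links, [1.0 / 3] * 3)
ranks = pagerank(example_links, 3)
```

In the disk-based setting from the slides, the loop over `links.items()` would correspond to one sequential scan of M and r_old, with only `r_new` held in memory.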

This document was uploaded on 02/26/2014 for the course CS 246 at Stanford.
