Simple iterative scheme

Suppose there are N web pages.
Initialize: r0 = [1/N, ..., 1/N]^T
Iterate: r(k+1) = M·r(k)
Stop when |r(k+1) - r(k)|_1 < ε
  |x|_1 = Σ 1≤i≤N |x_i| is the L1 norm; any other vector norm (e.g., Euclidean) can be used.

2/7/2011  Jure Leskovec, Stanford C246: Mining Massive Datasets

Power iteration example (pages Y!, A, MS; column j holds page j's out-link weights):

         Y!   A   MS
  Y!  [  ½    ½    0 ]
  A   [  ½    0    1 ]
  MS  [  0    ½    0 ]

Set r_i = 1/N, then iterate r_i = Σ_j M_ij·r_j:

  y   1/3   1/3   5/12    9/24   ...   2/5
  a = 1/3   1/2   1/3    11/24   ...   2/5
  m   1/3   1/6   1/4     1/6    ...   1/5
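The iterative scheme above can be sketched in a few lines of Python. This is a minimal sketch: the matrix is the slides' Y!/A/MS example, while the tolerance and the function name are my own choices.

```python
# Power iteration on the 3-page example (0 = Y!, 1 = A, 2 = MS).
# Column j of M distributes page j's rank uniformly over its out-links.
M = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 1.0],
     [0.0, 0.5, 0.0]]

def power_iterate(M, eps=1e-9):
    n = len(M)
    r = [1.0 / n] * n  # r0 = [1/N, ..., 1/N]^T
    while True:
        r_next = [sum(M[i][j] * r[j] for j in range(n)) for i in range(n)]
        # stop when the L1 distance |r(k+1) - r(k)|_1 drops below eps
        if sum(abs(a - b) for a, b in zip(r_next, r)) < eps:
            return r_next
        r = r_next

r = power_iterate(M)
print([round(x, 4) for x in r])  # converges to [2/5, 2/5, 1/5]
```

The stopping rule is exactly the L1 criterion from the slide; swapping in the Euclidean norm only changes the distance computation.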
Imagine a random web surfer:
  At any time t, the surfer is on some page P.
  At time t+1, the surfer follows an out-link from P uniformly at random, ending up on some page Q linked from P.
  The process repeats indefinitely.

Let p(t) be a vector whose ith component is the probability that the surfer is at page i at time t. Then p(t) is a probability distribution over pages.

Where is the surfer at time t+1? The surfer follows a link uniformly at random, so p(t+1) = M·p(t).

Suppose the random walk reaches a state such that p(t+1) = M·p(t) = p(t). Then p(t) is called a stationary distribution for the random walk.

Our rank vector r satisfies r = M·r, so it is a stationary distribution for the random walk.
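As a sanity check on the stationary-distribution claim, one can simulate the random surfer on the same three-page example and compare visit frequencies with r = [2/5, 2/5, 1/5]. A sketch: the adjacency lists encode the example graph, while the step count and seed are arbitrary choices of mine.

```python
import random

# Out-links of the example graph: 0 = Y!, 1 = A, 2 = MS.
# Y! links to itself and A; A links to Y! and MS; MS links to A.
out_links = {0: [0, 1], 1: [0, 2], 2: [1]}

def surf(steps, seed=0):
    rng = random.Random(seed)
    visits = [0, 0, 0]
    page = 0
    for _ in range(steps):
        page = rng.choice(out_links[page])  # follow an out-link uniformly
        visits[page] += 1
    return [v / steps for v in visits]

print(surf(200_000))  # visit frequencies approach [0.4, 0.4, 0.2]
```

Long-run visit frequencies converging to r is exactly what "stationary distribution" means operationally here.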
A central result from the theory of random walks (a.k.a. Markov processes): for graphs that satisfy certain conditions, the stationary distribution is unique and will eventually be reached no matter what the initial probability distribution at time t = 0 is.

Some pages are "dead ends" (they have no out-links). Such pages cause importance to leak out.

Spider traps (all out-links a...