
10-pagerank - CS246 Mining Massive Datasets Jure Leskovec...

CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu
What is the structure of the Web? How is it organized? 2/7/2011 Jure Leskovec, Stanford C246: Mining Massive Datasets 2
Web as a directed graph: What is the structure of the Web? How is it organized?
Two types of directed graphs:
DAG – Directed Acyclic Graph: has no cycles; if u can reach v (u ≠ v), then v cannot reach u
Strongly connected: any node can reach any other node via a directed path
Any directed graph can be expressed in terms of these two types of graphs.
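A minimal Python sketch of testing a graph for each of the two types, assuming the graph is a dict that maps every node to a list of its out-neighbors (this representation and the function names are hypothetical, not from the slides). The DAG test uses Kahn's topological-sort algorithm; the strong-connectivity test uses the fact that G is strongly connected exactly when some node s reaches every node and every node reaches s.

```python
from collections import deque

# Hypothetical representation: dict mapping EVERY node to a list of out-neighbors.

def is_dag(adj):
    """True if the graph has no directed cycle (Kahn's algorithm)."""
    indeg = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indeg[v] += 1
    queue = deque(u for u in adj if indeg[u] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    # All nodes get removed iff no cycle blocks the topological order.
    return removed == len(adj)

def reachable(adj, start):
    """Set of nodes reachable from start via directed paths (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_strongly_connected(adj):
    """True if every node can reach every other node."""
    if not adj:
        return True
    rev = {u: [] for u in adj}
    for u, vs in adj.items():
        for v in vs:
            rev[v].append(u)
    s = next(iter(adj))
    # s reaches all nodes in G, and all nodes reach s (= s reaches all in reversed G).
    return len(reachable(adj, s)) == len(adj) and len(reachable(rev, s)) == len(adj)

dag = {"a": ["b", "c"], "b": ["c"], "c": []}
cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(is_dag(dag), is_strongly_connected(dag))      # True False
print(is_dag(cycle), is_strongly_connected(cycle))  # False True
```

Most real graphs, including the Web, are neither: they mix cycles with one-way links, which is why the next step decomposes them into SCCs.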
A strongly connected component (SCC) is a set of nodes S such that:
Every pair of nodes in S can reach each other
There is no larger set containing S with this property
Any directed graph is a DAG on its SCCs:
Each SCC is a super-node
Super-node A links to super-node B if a node in A links to a node in B
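To make the "DAG on its SCCs" construction concrete, here is a sketch that finds all SCCs with Kosaraju's algorithm (one standard choice; the slides do not specify an algorithm) and then builds the super-node graph. It assumes the same hypothetical dict-of-lists representation, with every node present as a key; the function names are illustrative only.

```python
def kosaraju_scc(adj):
    """Return (list of SCCs, dict node -> SCC id) using Kosaraju's algorithm."""
    # Pass 1: iterative DFS on G, recording nodes in order of finishing time.
    order, visited = [], set()
    for root in adj:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(adj[root]))]
        while stack:
            node, it = stack[-1]
            for v in it:
                if v not in visited:
                    visited.add(v)
                    stack.append((v, iter(adj[v])))
                    break
            else:  # all out-neighbors explored: node is finished
                stack.pop()
                order.append(node)
    # Pass 2: search the reversed graph in decreasing finishing time;
    # each search collects exactly one SCC.
    rev = {u: [] for u in adj}
    for u, vs in adj.items():
        for v in vs:
            rev[v].append(u)
    comp, sccs = {}, []
    for root in reversed(order):
        if root in comp:
            continue
        cid = len(sccs)
        comp[root] = cid
        members, stack = [], [root]
        while stack:
            u = stack.pop()
            members.append(u)
            for v in rev[u]:
                if v not in comp:
                    comp[v] = cid
                    stack.append(v)
        sccs.append(members)
    return sccs, comp

def condensation(adj, comp):
    """Super-node A -> B iff some node in A links to some node in B (A != B)."""
    dag = {c: set() for c in set(comp.values())}
    for u, vs in adj.items():
        for v in vs:
            if comp[u] != comp[v]:
                dag[comp[u]].add(comp[v])
    return dag

G = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4]}
sccs, comp = kosaraju_scc(G)
dag = condensation(G, comp)
print(sccs)  # [[1, 3, 2], [4, 5]]
print(dag)   # {0: {1}, 1: set()}
```

Here {1, 2, 3} and {4, 5} are the SCCs, and the only super-edge comes from the link 3 → 4. The resulting super-node graph is always acyclic: a cycle between super-nodes would merge them into one larger SCC, contradicting maximality.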
Take a large snapshot of the Web and try to understand how its SCCs “fit” together as a DAG.
Computational issue: say we want to find the SCC containing a specific node v.
Observation:
Out(v) … nodes reachable from v (via out-edges)
In(v) … nodes that can reach v (i.e., nodes reachable from v via in-edges)
SCC containing v: $\mathrm{SCC}(v) = \mathrm{Out}(v, G) \cap \mathrm{In}(v, G) = \mathrm{Out}(v, G) \cap \mathrm{Out}(v, \hat{G})$, where $\hat{G}$ is G with the directions of all edges flipped
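The observation translates directly into two breadth-first searches: one on G and one on the flipped graph $\hat{G}$. A minimal sketch, again assuming a hypothetical dict-of-lists graph with every node as a key:

```python
from collections import deque

def out_set(adj, v):
    """Out(v, G): every node reachable from v by following out-edges (BFS)."""
    seen = {v}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def flip(adj):
    """G-hat: the same graph with the direction of every edge reversed."""
    rev = {u: [] for u in adj}
    for u, vs in adj.items():
        for w in vs:
            rev[w].append(u)
    return rev

def scc_of(adj, v):
    """SCC(v) = Out(v, G) ∩ In(v, G), computed as Out(v, G) ∩ Out(v, G-hat)."""
    return out_set(adj, v) & out_set(flip(adj), v)

G = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4]}
print(scc_of(G, 1))  # {1, 2, 3}
print(scc_of(G, 4))  # {4, 5}
```

Each BFS touches every reachable edge at most once, so finding the SCC of a single node costs two graph traversals rather than a full SCC decomposition.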
250 million webpages, 1.5 billion links [Altavista] [Broder et al., ‘00]
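A quick sanity check on the scale quoted above, using only the slide's own numbers: every link is one out-link of its source page and one in-link of its target page, so the average out-degree and average in-degree are both the number of links divided by the number of pages.

```latex
\bar{k}_{\text{out}} = \bar{k}_{\text{in}} = \frac{|E|}{|V|} \approx \frac{1.5 \times 10^{9}}{2.5 \times 10^{8}} = 6
```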
Out-/in-degree distribution: $p_k$ = fraction of nodes with k out-/in-links
Histogram of $p_k$ vs. k
[Figure: degree histogram; y-axis: normalized count, $p_k$]
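Computing $p_k$ from an edge list is a counting exercise; the one subtlety is that nodes with degree 0 never appear in the edge list, so the full node set is needed for both the k = 0 bucket and the denominator. A minimal sketch; the function name and the (source, target) edge-list input format are hypothetical assumptions:

```python
from collections import Counter

def degree_distribution(nodes, edges, direction="out"):
    """Return {k: p_k}, p_k = fraction of nodes with exactly k out-links (or in-links).

    edges: iterable of (source, target) pairs (hypothetical input format).
    """
    idx = 0 if direction == "out" else 1   # count sources for out-, targets for in-degree
    deg = Counter(e[idx] for e in edges)
    # Nodes absent from the edge list have degree 0 and must still be counted.
    counts = Counter(deg.get(n, 0) for n in nodes)
    return {k: c / len(nodes) for k, c in sorted(counts.items())}

nodes = [1, 2, 3, 4]
edges = [(1, 2), (1, 3), (1, 4)]
print(degree_distribution(nodes, edges, "out"))  # {0: 0.75, 3: 0.25}
print(degree_distribution(nodes, edges, "in"))   # {0: 0.25, 1: 0.75}
```

Plotting the returned values against k gives the normalized histogram described on the slide; by construction the $p_k$ values sum to 1.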