# 10-pagerank - CS246 Mining Massive Datasets Jure Leskovec...

CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu

What is the structure of the Web? How is it organized? One way to study it is to view the Web as a directed graph.

2/7/2011 Jure Leskovec, Stanford CS246: Mining Massive Datasets

Two types of directed graphs:

- DAG (directed acyclic graph): has no cycles. If u can reach v, then v cannot reach u.
- Strongly connected: any node can reach any node via a directed path.

Any directed graph can be expressed in terms of these two types of graphs.
A strongly connected component (SCC) is a set of nodes S such that:

- Every pair of nodes in S can reach each other.
- There is no larger set containing S with this property.

Any directed graph is a DAG on its SCCs:

- Each SCC is a super-node.
- Super-node A links to super-node B if a node in A links to a node in B.
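The super-node construction can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `condense`, the toy edge list, and the labels `S1` and `S2` are hypothetical, and the node-to-SCC mapping `comp_of` is assumed to be given already.

```python
def condense(edges, comp_of):
    """Return the set of super-edges (A, B) between distinct SCC labels.

    edges   -- iterable of directed edges (u, v)
    comp_of -- dict mapping each node to the label of its SCC (assumed given)
    """
    super_edges = set()
    for u, v in edges:
        a, b = comp_of[u], comp_of[v]
        if a != b:  # edges inside a single SCC do not appear in the DAG
            super_edges.add((a, b))
    return super_edges


# Toy graph (hypothetical): a <-> b form one SCC, c <-> d form another.
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("d", "c")]
comp_of = {"a": "S1", "b": "S1", "c": "S2", "d": "S2"}
dag_edges = condense(edges, comp_of)
print(dag_edges)
```

Because every SCC is maximal, the resulting super-node graph cannot contain a cycle: a cycle between two super-nodes would merge them into one larger SCC.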

Take a large snapshot of the Web and try to understand how its SCCs "fit" together as a DAG.

Computational issue: how do we find the SCC containing a specific node v?

Observation:

- Out(v): nodes reachable from v (via out-edges)
- In(v): nodes that can reach v (via in-edges)

The SCC containing v is

$$\mathrm{Out}(v, G) \cap \mathrm{In}(v, G) = \mathrm{Out}(v, G) \cap \mathrm{Out}(v, \bar{G})$$

where $\bar{G}$ is G with the directions of all edges flipped.
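The observation above, that the SCC containing v is the set of nodes both reachable from v and able to reach v, can be sketched with two breadth-first searches: one on G and one on G with its edges flipped. This is a minimal in-memory sketch under stated assumptions. The helper names `reachable` and `scc_containing` and the toy edge list are hypothetical, and a Web-scale graph would not fit in Python dictionaries like this.

```python
from collections import deque


def reachable(start, adj):
    """Return all nodes reachable from start (including start) by BFS."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def scc_containing(v, edges):
    """Return Out(v, G) intersected with Out(v, G-reversed)."""
    fwd, rev = {}, {}
    for a, b in edges:
        fwd.setdefault(a, []).append(b)  # adjacency of G
        rev.setdefault(b, []).append(a)  # adjacency of G with edges flipped
    return reachable(v, fwd) & reachable(v, rev)


# Toy graph (hypothetical): a <-> b, b -> c, c <-> d.
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("d", "c")]
scc_of_a = scc_containing("a", edges)
scc_of_c = scc_containing("c", edges)
print(scc_of_a, scc_of_c)
```

Each call costs two graph traversals, so it suits the question "which SCC contains this particular node?" rather than listing all SCCs of the graph at once.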
250 million webpages, 1.5 billion links [Altavista] [Broder et al., '00]

Out-/in-degree distribution:

- $p_k$: fraction of nodes with k out-/in-links
- Histogram of $p_k$ vs. k

[Figure: histogram; y-axis: normalized count, $p_k$]
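As a small illustration of the definition, the fraction of nodes with k out-links or in-links can be computed directly from an edge list. The function name `degree_distribution`, its `direction` parameter, and the toy graph are assumptions made for this sketch. They are not part of the lecture and bear no relation to the Altavista data.

```python
from collections import Counter


def degree_distribution(edges, nodes, direction="out"):
    """Return {k: p_k}, the fraction of nodes with exactly k out- or in-links."""
    idx = 0 if direction == "out" else 1  # edge source for out, target for in
    degree = Counter(edge[idx] for edge in edges)
    # Nodes with no links in the chosen direction count toward k = 0.
    counts = Counter(degree.get(node, 0) for node in nodes)
    total = len(nodes)
    return {k: c / total for k, c in sorted(counts.items())}


# Toy graph (hypothetical): a <-> b, b -> c, c <-> d.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("d", "c")]
p_out = degree_distribution(edges, nodes, "out")
p_in = degree_distribution(edges, nodes, "in")
print(p_out, p_in)
```

Plotting the returned values against k gives the histogram of $p_k$ vs. k described above.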