CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu
What is the structure of the Web? How is it organized? 2/7/2011 Jure Leskovec, Stanford C246: Mining Massive Datasets 2
Web as a directed graph
Two types of directed graphs:
- DAG (Directed Acyclic Graph): has no cycles; if u can reach v, then v cannot reach u
- Strongly connected: any node can reach any node via a directed path
Any directed graph can be expressed in terms of these two types of graphs
A strongly connected component (SCC) is a set of nodes S such that:
- Every pair of nodes in S can reach each other
- There is no larger set containing S with this property
Any directed graph is a DAG on its SCCs:
- Each SCC is a super-node
- Super-node A links to super-node B if a node in A links to a node in B
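The "DAG on its SCCs" construction can be sketched in a few lines. The block below is a minimal illustration, not the method used in the lecture: it assumes the graph fits in memory as a dict of adjacency lists, uses Kosaraju's two-pass DFS (a swapped-in standard algorithm), and the names `strongly_connected_components`, `condensation`, and the graph `toy` are hypothetical.

```python
from collections import defaultdict

def strongly_connected_components(graph):
    """Kosaraju's algorithm. graph: dict node -> list of successors.
    Returns a dict node -> integer SCC label. Recursive, so only
    suitable for small illustrative graphs."""
    nodes = set(graph) | {w for vs in graph.values() for w in vs}
    order, seen = [], set()

    def dfs_finish(u):
        # First pass: record nodes in order of DFS completion.
        seen.add(u)
        for w in graph.get(u, ()):
            if w not in seen:
                dfs_finish(w)
        order.append(u)

    for u in sorted(nodes):  # assumes node names are comparable
        if u not in seen:
            dfs_finish(u)

    # Build the graph with every edge direction flipped.
    reverse = defaultdict(list)
    for u, vs in graph.items():
        for w in vs:
            reverse[w].append(u)

    component = {}

    def dfs_assign(u, label):
        # Second pass: everything reached on the flipped graph shares an SCC.
        component[u] = label
        for w in reverse[u]:
            if w not in component:
                dfs_assign(w, label)

    label = 0
    for u in reversed(order):
        if u not in component:
            dfs_assign(u, label)
            label += 1
    return component

def condensation(graph, component):
    """Edges between super-nodes: A -> B if some node in A links to a node in B."""
    edges = set()
    for u, vs in graph.items():
        for w in vs:
            if component[u] != component[w]:
                edges.add((component[u], component[w]))
    return edges

# Hypothetical toy graph: a cycle a->b->c->a, a cycle d<->e, and one link c->d.
toy = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"]}
comp = strongly_connected_components(toy)
dag_edges = condensation(toy, comp)
```

Here `{a, b, c}` and `{d, e}` collapse into two super-nodes joined by a single edge, so the result has no cycles.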
Take a large snapshot of the Web and try to understand how its SCCs “fit” as a DAG
Computational issues: say we want to find the SCC containing a specific node v
Observation:
- Out(v): nodes reachable from v (via out-edges)
- In(v): nodes that can reach v (found by following in-edges backward from v)
SCC containing v: $\mathrm{Out}(v, G) \cap \mathrm{In}(v, G) = \mathrm{Out}(v, G) \cap \mathrm{Out}(v, \bar{G})$, where $\bar{G}$ is G with the directions of its edges flipped
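The observation above translates almost directly into code: one breadth-first search on G, one on the flipped graph, then a set intersection. This is a minimal in-memory sketch; the function names `reachable`, `flip`, and `scc_containing` and the graph `toy` are hypothetical, and a Web-scale graph would need a disk-based or distributed traversal instead.

```python
from collections import deque

def reachable(adj, start):
    """Out(start, adj): all nodes reachable from start, including start itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def flip(adj):
    """Return the graph with the direction of every edge reversed."""
    flipped = {}
    for u, vs in adj.items():
        for w in vs:
            flipped.setdefault(w, []).append(u)
    return flipped

def scc_containing(adj, v):
    """SCC of v = Out(v, G) intersected with Out(v, flipped G)."""
    return reachable(adj, v) & reachable(flip(adj), v)

# Hypothetical toy graph: a cycle a->b->c->a, a cycle d<->e, and one link c->d.
toy = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"]}
scc_a = scc_containing(toy, "a")
scc_d = scc_containing(toy, "d")
```

Node a can reach every node, but only a, b, and c can reach a, so the intersection is exactly the first cycle.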
250 million webpages, 1.5 billion links [Altavista] [Broder et al., '00]
Out-/in-degree distribution:
- $p_k$: fraction of nodes with k out-/in-links
- Histogram of $p_k$ vs. k
[Figure: histogram; y-axis: normalized count, $p_k$]
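As a small illustration of the definition of $p_k$, the sketch below counts degrees from an edge list and normalizes by the number of nodes. It assumes nodes are numbered 0 to `num_nodes - 1`; the function name `degree_distribution` and the four-node edge list are hypothetical.

```python
from collections import Counter

def degree_distribution(edges, num_nodes, direction="in"):
    """Return {k: p_k}, where p_k is the fraction of nodes with k in- or out-links.
    edges: iterable of (source, target) pairs over nodes 0..num_nodes-1."""
    degree = [0] * num_nodes
    for u, v in edges:
        degree[v if direction == "in" else u] += 1
    counts = Counter(degree)  # k -> number of nodes with degree k
    return {k: c / num_nodes for k, c in sorted(counts.items())}

# Hypothetical toy graph with 4 nodes.
edges = [(0, 1), (0, 2), (1, 2), (3, 2)]
p_in = degree_distribution(edges, 4, "in")
p_out = degree_distribution(edges, 4, "out")
```

Nodes with no links are counted under k = 0, so the values of $p_k$ always sum to 1.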
Plot the same data on log-log axes: if $p_k = \beta k^{-\alpha}$, then $\log p_k = \log \beta - \alpha \log k$, a straight line with slope $-\alpha$
[Figure: log-log plot; y-axis: normalized count, $p_k$]
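Because a power law is a straight line on log-log axes, one simple way to estimate $\alpha$ and $\beta$ is an ordinary least-squares fit of $\log p_k$ against $\log k$. The sketch below does this in plain Python. It is an illustration only: the function name `fit_power_law` is hypothetical, the synthetic values are arbitrary, and a least-squares fit on log-log data is generally a rough estimator on noisy real data rather than a recommended one.

```python
import math

def fit_power_law(p):
    """Fit log p_k = log(beta) - alpha * log(k) by least squares.
    p: dict k -> p_k; entries with k <= 0 or p_k <= 0 are skipped.
    Returns (alpha, beta)."""
    points = [(math.log(k), math.log(pk)) for k, pk in p.items() if k > 0 and pk > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)

# Synthetic, noise-free data with arbitrary parameters chosen for illustration.
synthetic = {k: 0.5 * k ** -2.1 for k in range(1, 101)}
alpha, beta = fit_power_law(synthetic)
```

On exact power-law data the fit recovers the parameters used to generate it, up to floating-point error.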
[Figure: Broder et al., '00]
Random network: degree distribution is Binomial, i.e., all nodes have similar degree
Power-law network: degrees are Power-law, i.e., heavily skewed
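The contrast between the two network types can be seen in a small simulation. The sketch below rests on several assumptions that are not from the slides: it uses undirected graphs for simplicity, a G(n, p) model for the random network, and a preferential-attachment process as one common way to produce heavily skewed degrees. All function names and parameter values are hypothetical.

```python
import random

def random_graph_degrees(n, p, rng):
    """Degrees in a G(n, p) random graph: each pair is linked independently
    with probability p, which gives a Binomial degree distribution."""
    degree = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                degree[u] += 1
                degree[v] += 1
    return degree

def preferential_attachment_degrees(n, m, rng):
    """Degrees when each new node links to m distinct existing nodes chosen
    with probability proportional to their current degree."""
    degree = [0] * n
    endpoints = []  # every edge endpoint; uniform sampling here is degree-proportional
    for u in range(m + 1):  # seed: a small clique on m + 1 nodes
        for v in range(u + 1, m + 1):
            degree[u] += 1
            degree[v] += 1
            endpoints += [u, v]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            degree[new] += 1
            degree[t] += 1
            endpoints += [new, t]
    return degree

rng = random.Random(0)
n, m = 1000, 2
er = random_graph_degrees(n, 2 * m / (n - 1), rng)  # mean degree about 2m
pa = preferential_attachment_degrees(n, m, rng)     # mean degree about 2m
print("random network max degree:", max(er))
print("preferential attachment max degree:", max(pa))
```

Both graphs have roughly the same average degree, but the preferential-attachment graph typically contains a few hubs with far more links than any node in the random graph.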
Web pages are not equally “important”: www.joe-schmoe.com vs. www.stanford.edu
Since there is a large diversity in the connectivity of the web graph, we can rank the pages by the link structure
We will cover the following link analysis approaches to computing the importance of nodes in a graph:
- PageRank
- Hubs and Authorities (HITS)
- Topic-Specific (Personalized) PageRank
- Spam detection algorithms
First try: a page is more important if it has more links
- In-coming links?
- Out-going links?
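The "first try" with in-coming links amounts to sorting pages by in-degree. A minimal sketch follows, assuming links are given as (source, target) pairs; the function name `rank_by_in_links` and the page names are hypothetical, and ties are broken alphabetically only to make the output deterministic.

```python
from collections import Counter

def rank_by_in_links(edges):
    """Order pages by number of in-coming links, most-linked first."""
    in_links = Counter(dst for _, dst in edges)
    pages = {page for edge in edges for page in edge}
    # Counter returns 0 for pages that nobody links to.
    return sorted(pages, key=lambda page: (-in_links[page], page))

# Hypothetical toy link graph.
links = [("a", "c"), ("b", "c"), ("c", "d"), ("a", "d"), ("d", "c")]
ranking = rank_by_in_links(links)
```

Every link counts equally here, regardless of which page it comes from.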