Finding communities in large graphs (e.g., the web)
Baskets = nodes; items = outgoing neighbors.
Searching for complete bipartite subgraphs K_{s,t} of a big graph: s nodes that all point to the same t nodes.
(Figure: a dense 2-layer graph, with s nodes on the left and t nodes on the right.)
Use this to define topics:
what the same people on the left
talk about on the right.
How? View each node i as a bucket B_i of the nodes j it points to.
A K_{s,t} is then a set Y of size t that occurs in s buckets B_i.
To look for K_{s,t}, set the support threshold to s and look at layer t, i.e., all frequent itemsets of size t.
1/5/2011 Jure Leskovec, Stanford C246: Mining Massive Datasets

- Define: frequent itemsets
- Association rules: confidence, support, interestingness
- 2 algorithms for finding frequent itemsets:
  - Apriori algorithm
  - PCY algorithm
  - + 2 refinements: Multistage algorithm, Multihash algorithm
- Random sampling and SON algorithms
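The node-as-bucket reduction from the "How?" slide can be sketched in Python as a brute-force illustration. It is not the course's implementation. The input format (a dict mapping each node to a list of its out-neighbors) and the helper name `find_bipartite_cores` are assumptions. The sketch counts every size-t subset of every bucket B_i and keeps the subsets with support at least s. Each surviving set Y, together with the nodes whose buckets contain it, forms a K_{s,t}. When support exceeds s, the left side is larger than s.

```python
from collections import Counter
from itertools import combinations

def find_bipartite_cores(graph, s, t):
    """Find every set Y of t nodes that at least s nodes all point to.

    graph: dict mapping each node to an iterable of its out-neighbors
    (an assumed input format). Returns {Y: sorted left-side nodes}.
    """
    # Count each size-t subset of every bucket B_i (brute force, toy scale only).
    counts = Counter()
    for node, neighbors in graph.items():
        bucket = sorted(set(neighbors))  # sorting gives one canonical tuple per subset
        for itemset in combinations(bucket, t):
            counts[itemset] += 1

    # Keep the frequent itemsets (support >= s) and recover the left side:
    # the nodes whose bucket contains all of Y.
    cores = {}
    for itemset, count in counts.items():
        if count >= s:
            y = set(itemset)
            cores[itemset] = sorted(n for n, nbrs in graph.items() if y <= set(nbrs))
    return cores

# Hypothetical toy graph: nodes 1, 2, 3 all point to both 10 and 11.
graph = {1: [10, 11, 12], 2: [10, 11], 3: [10, 11, 13], 4: [12]}
print(find_bipartite_cores(graph, s=3, t=2))  # {(10, 11): [1, 2, 3]}
```

The number of t-subsets per bucket grows combinatorially with the bucket size. This is why the frequent-itemset algorithms listed in the outline (Apriori, PCY, and their refinements) matter at web scale.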
Association Rules:
If-then rules about the contents of baskets.
{i1, i2, …, ik} → j means: “if a basket contains all of i1, …, ik, then it is likely to contain j.”
The confidence of this association rule is the probability of j given I = {i1, …, ik}, i.e., conf(I → j) = support(I ∪ {j}) / support(I).

Not all high-confidence rules are interesting.
The rule X → milk may have high confidence for many itemsets X simply because milk is purchased very often (independently of X).
Interest of an association rule I → j: the difference between its confidence and the fraction of baskets that contain j, i.e., Interest(I → j) = conf(I → j) − Pr[j].
Interesting rules are t...
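Here is a minimal sketch of the confidence and interest definitions above. It assumes baskets are Python sets of item names. The helpers `support`, `confidence`, and `interest` and the five toy baskets are invented for illustration. `Fraction` keeps the arithmetic exact so the values can be checked by hand.

```python
from fractions import Fraction

def support(baskets, itemset):
    """Count the baskets that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for basket in baskets if items <= set(basket))

def confidence(baskets, antecedent, consequent):
    """conf(I -> j) = support(I ∪ {j}) / support(I), i.e. Pr[j | I]."""
    base = support(baskets, antecedent)
    if base == 0:
        raise ValueError("antecedent itemset occurs in no basket")
    return Fraction(support(baskets, set(antecedent) | {consequent}), base)

def interest(baskets, antecedent, consequent):
    """Interest(I -> j) = conf(I -> j) minus the fraction of baskets containing j."""
    fraction_with_j = Fraction(support(baskets, {consequent}), len(baskets))
    return confidence(baskets, antecedent, consequent) - fraction_with_j

# Hypothetical toy data: milk appears in 4 of 5 baskets.
baskets = [
    {"milk", "bread"},
    {"milk", "beer"},
    {"milk", "bread", "diapers"},
    {"milk"},
    {"beer", "diapers"},
]

print(confidence(baskets, {"bread"}, "milk"))  # 1
print(interest(baskets, {"bread"}, "milk"))    # 1/5
print(interest(baskets, {"beer"}, "milk"))     # -3/10
```

On this toy data, {bread} → milk has confidence 1 but interest of only 1/5, because milk is already in 4 of the 5 baskets. {beer} → milk has negative interest: a beer basket is less likely than average to contain milk.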
This document was uploaded on 02/26/2014 for the course CS 246 at Stanford.
 Winter '09
