1/5/2011, Jure Leskovec, Stanford C246: Mining Massive Datasets



Finding communities in large graphs (e.g., web)
  Baskets = nodes; items = outgoing neighbors
  Searching for complete bipartite subgraphs K_{s,t} of a big graph
  [Figure: a dense 2-layer graph, with s nodes on the left and t nodes on the right]
  Use this to define topics: what the same people on the left talk about on the right
  How?
    View each node i as a bucket B_i of the nodes i points to
    K_{s,t} = a set Y of size t that occurs in s buckets B_i
    Looking for K_{s,t}: set the support to s and look at layer t, i.e., all frequent sets of size t

1/5/2011 Jure Leskovec, Stanford C246: Mining Massive Datasets 11

Define:
  Frequent itemsets
  Association rules: confidence, support, interestingness
2 algorithms for finding frequent itemsets:
  A-Priori algorithm
  PCY algorithm
  + 2 refinements: Multistage algorithm, Multihash algorithm
Random sampling and SON algorithms

Association rules: if-then rules about the contents of baskets
  {i_1, i_2, …, i_k} → j means: "if a basket contains all of i_1, …, i_k, then it is likely to contain j"
  Confidence of this association rule is the probability of j given I = {i_1, …, i_k}

Not all high-confidence rules are interesting
  The rule X → milk may have high confidence for many itemsets X, because milk is just purchased very often (independent of X)
  Interest of an association rule I → j: the difference between its confidence and the fraction of baskets that contain j
  Interesting rules are t...
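The definitions of confidence and interest given above can be checked on a small example. The following is a minimal Python sketch, not code from the lecture: the function names (`support`, `confidence`, `interest`) and the toy baskets are assumptions made up for illustration. Confidence is estimated as support(I ∪ {j}) / support(I), and interest as confidence minus the fraction of baskets containing j.

```python
from typing import FrozenSet, Iterable, List

Basket = FrozenSet[str]


def support(baskets: List[Basket], itemset: Iterable[str]) -> int:
    """Number of baskets that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for basket in baskets if items <= basket)


def confidence(baskets: List[Basket], lhs: Iterable[str], j: str) -> float:
    """Estimate of P(j | I): support(I + {j}) / support(I)."""
    lhs_items = frozenset(lhs)
    lhs_support = support(baskets, lhs_items)
    if lhs_support == 0:
        raise ValueError("left-hand side never occurs; confidence is undefined")
    return support(baskets, lhs_items | {j}) / lhs_support


def interest(baskets: List[Basket], lhs: Iterable[str], j: str) -> float:
    """Confidence of lhs -> j minus the fraction of baskets containing j."""
    return confidence(baskets, lhs, j) - support(baskets, [j]) / len(baskets)


# Hypothetical toy data (not from the slides): milk is in 4 of 5 baskets.
baskets: List[Basket] = [
    frozenset({"milk", "bread"}),
    frozenset({"milk", "diapers", "beer"}),
    frozenset({"milk", "bread", "butter"}),
    frozenset({"milk", "beer"}),
    frozenset({"diapers", "beer"}),
]

# {bread} -> milk has confidence 1.0, but milk is common, so interest is low.
print(confidence(baskets, ["bread"], "milk"), interest(baskets, ["bread"], "milk"))

# {diapers} -> beer also has confidence 1.0, but beer is rarer, so interest is higher.
print(confidence(baskets, ["diapers"], "beer"), interest(baskets, ["diapers"], "beer"))
```

In this toy data both rules have the same confidence, yet the rule pointing at the less common item gets the larger interest, which is the effect the milk example describes.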

This document was uploaded on 02/26/2014 for the course CS 246 at Stanford.
