02-assoc

G s125 helps catch more truly frequent itemsets but


... candidate if it is found to be frequent in any one or more subsets of the baskets.

(1/5/2011, Jure Leskovec, Stanford CS246: Mining Massive Datasets, slides 50-55)

On a second pass, count all the candidate itemsets and determine which are frequent in the entire set.

Key "monotonicity" idea: an itemset cannot be frequent in the entire set of baskets unless it is frequent in at least one subset.

SON lends itself to distributed data mining:
- Baskets distributed among many nodes
- Compute frequent itemsets at each node
- Distribute candidates to all nodes
- Accumulate the counts of all candidates

Phase 1: Find candidate itemsets
    Map? Reduce?
Phase 2: Find true frequent itemsets
    Map? Reduce?

1. Maximal frequent itemsets: no immediate superset is frequent.
2. Closed itemsets: no immediate superset has the same count (> 0).
   Stores not only frequent information, but exact counts.

Example (s = 3):

Itemset   Count   Maximal   Closed
A         4       No        No
B         5       No        Yes
C         3       No        No
AB        4       Yes       Yes
AC        2       No        No
BC        3       Yes       Yes
ABC       2       No        Yes

Annotations on the Maximal column:
- "No": Frequent, but superset BC also frequent.
- "Yes": Frequent, and its only superset, ABC, not frequent.

Annotations on the Closed column:
- "No": Superset BC has same count.
- "Yes": Its only superset, ABC, has smaller count.
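The maximal and closed definitions can be checked mechanically against the example counts (A 4, B 5, C 3, AB 4, AC 2, BC 3, ABC 2, with s = 3). The following is a minimal sketch, not code from the slides: the names `counts`, `immediate_supersets`, `is_maximal` and `is_closed` are hypothetical, and "immediate superset" is taken to mean an itemset in the table with exactly one additional item.

```python
# Support counts copied from the slide example (support threshold s = 3).
counts = {
    frozenset("A"): 4,
    frozenset("B"): 5,
    frozenset("C"): 3,
    frozenset("AB"): 4,
    frozenset("AC"): 2,
    frozenset("BC"): 3,
    frozenset("ABC"): 2,
}


def immediate_supersets(itemset, counts):
    """Itemsets in `counts` that contain `itemset` plus exactly one more item."""
    return [t for t in counts if len(t) == len(itemset) + 1 and itemset < t]


def is_maximal(itemset, counts, s):
    """Frequent, and no immediate superset is frequent."""
    if counts[itemset] < s:
        return False
    return all(counts[t] < s for t in immediate_supersets(itemset, counts))


def is_closed(itemset, counts):
    """Count > 0, and no immediate superset has the same count."""
    c = counts[itemset]
    return c > 0 and all(counts[t] != c for t in immediate_supersets(itemset, counts))


if __name__ == "__main__":
    # Print one row per itemset: name, count, maximal?, closed?
    for itemset in sorted(counts, key=lambda x: (len(x), sorted(x))):
        name = "".join(sorted(itemset))
        print(name, counts[itemset], is_maximal(itemset, counts, 3), is_closed(itemset, counts))
```

Note that closedness does not depend on the support threshold in this sketch: ABC is closed even though it is not frequent at s = 3, which matches the last row of the example.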
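The two-pass SON idea (find candidates per subset of baskets, then count all candidates over the entire set) can be sketched on a single machine as follows. This is an illustrative sketch under stated assumptions, not the course's implementation: the function names, the `num_chunks` and `max_size` parameters, and the brute-force per-chunk counting are all hypothetical choices, and the per-chunk threshold is scaled in proportion to the chunk's share of the baskets, which is the usual way to guarantee that no truly frequent itemset is missed.

```python
from collections import Counter
from itertools import combinations


def chunk_counts(chunk, max_size):
    """Brute-force count of every itemset of size <= max_size in one chunk.
    (Assumption: a real system would run an in-memory algorithm such as A-Priori here.)"""
    counts = Counter()
    for basket in chunk:
        items = sorted(set(basket))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return counts


def son(baskets, s, num_chunks=2, max_size=3):
    """Return {itemset: count} for itemsets with count >= s over all baskets."""
    n = len(baskets)
    if n == 0:
        return {}
    chunk_size = -(-n // num_chunks)  # ceiling division
    chunks = [baskets[i:i + chunk_size] for i in range(0, n, chunk_size)]

    # Pass 1: an itemset becomes a candidate if it is frequent in at least one
    # chunk, using the proportionally scaled threshold s * len(chunk) / n.
    candidates = set()
    for chunk in chunks:
        for itemset, c in chunk_counts(chunk, max_size).items():
            if c * n >= s * len(chunk):  # integer form of c >= s * len(chunk) / n
                candidates.add(itemset)

    # Pass 2: count every candidate over the entire set of baskets.
    totals = Counter()
    for basket in baskets:
        basket_set = set(basket)
        for cand in candidates:
            if cand <= basket_set:
                totals[cand] += 1
    return {cand: c for cand, c in totals.items() if c >= s}
```

In a distributed setting, each chunk in pass 1 would live on a different node and the pass-2 counts would be accumulated across nodes; here both passes simply loop over in-memory lists. The second pass is what removes false positives, since a candidate may be frequent in one chunk without being frequent overall.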

This document was uploaded on 02/26/2014 for the course CS 246 at Stanford.
