02-assoc

For reasonable support thresholds (e.g., 1%), k = 2, that is, counting pairs, requires the most memory.
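As a rough illustration of why pair counting is so expensive, the sketch below computes the memory needed to keep one count for every possible pair of items. It assumes 4-byte integer counts and a hypothetical catalog of 100,000 items. The helper name `pair_count_bytes` is illustrative only.

```python
def pair_count_bytes(n_items, bytes_per_count=4):
    """Return (number of pairs, bytes) for one count per unordered pair.

    Assumes every possible pair gets a count (e.g. a triangular array)
    and that each count is a fixed-size integer of bytes_per_count bytes.
    """
    n_pairs = n_items * (n_items - 1) // 2  # n choose 2
    return n_pairs, n_pairs * bytes_per_count


if __name__ == "__main__":
    pairs, size = pair_count_bytes(100_000)
    print(pairs, size)  # 4999950000 19999800000  (about 20 GB)
```

By comparison, single-item counts for the same catalog need only 100,000 × 4 = 400,000 bytes, which is why pass 1 leaves most of main memory idle.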

… item counts. Can we use the idle memory to reduce the memory required in pass 2?

Pass 1 of PCY: in addition to item counts, maintain a hash table with as many buckets as fit in memory. Keep a count for each bucket into which pairs of items are hashed. Just the count, not the pairs that hash to the bucket!

1/5/2011 Jure Leskovec, Stanford C246: Mining Massive Datasets

FOR (each basket) {
    FOR (each item in the basket)
        add 1 to item's count;
    FOR (each pair of items) {
        hash the pair to a bucket;
        add 1 to the count for that bucket;
    }
}

If a bucket contains a frequent pair, then the bucket is surely frequent, so we cannot use the hash table to eliminate any member of this bucket. Even without any frequent pair, a bucket can still be frequent. But for a bucket with a total count less than s, none of its pairs can be frequent: every occurrence of a pair adds 1 to its bucket, so a pair's count can never exceed its bucket's count. All pairs that hash to such a bucket can therefore be eliminated as candidates, even if the pair consists of two frequent items.

Replace the buckets by a bit-vector: 1 means the bucket count is at least the support threshold s (a frequent bucket); 0 means it is not. Each 4-byte integer count is replaced by a single bit, so the bit-vector needs only 1/32 of the memory used by the bucket counts. Also, decide which items are frequent and list them for the second pass.

Count all pairs {i, j} that meet the conditions for being a candidate...
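The pass-1 loop and the bit-vector step described above can be sketched in Python as follows. This is a minimal, in-memory illustration, not a definitive implementation. Assumptions: `bucket_of` is a hypothetical hash function built on Python's `hash()`, which is stable within one process (all PCY needs is that both passes agree); items in a basket are sortable so each pair has one canonical order; and the pass-2 filter uses the usual PCY candidate conditions (both items frequent, and the pair hashes to a frequent bucket).

```python
from collections import Counter
from itertools import combinations


def bucket_of(pair, num_buckets):
    # Hypothetical hash: Python's hash() is consistent within one run,
    # so pass 1 and pass 2 map each pair to the same bucket.
    return hash(pair) % num_buckets


def bit_is_set(bits, b):
    # Read bit b of the bit-vector stored in a bytearray.
    return bool((bits[b >> 3] >> (b & 7)) & 1)


def pcy_pass1(baskets, num_buckets, s):
    """Pass 1: count single items and, per bucket, the pairs hashed into it."""
    item_counts = Counter()
    bucket_counts = [0] * num_buckets  # one count per bucket, never the pairs
    for basket in baskets:
        items = sorted(set(basket))  # canonical order for every pair
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket_of(pair, num_buckets)] += 1

    # Between passes: list frequent items and replace the integer bucket
    # counts by a bit-vector (1 bit per bucket instead of a 4-byte count).
    frequent_items = {i for i, c in item_counts.items() if c >= s}
    bits = bytearray((num_buckets + 7) // 8)
    for b, count in enumerate(bucket_counts):
        if count >= s:  # frequent bucket
            bits[b >> 3] |= 1 << (b & 7)
    return frequent_items, bits


def pcy_pass2(baskets, frequent_items, bits, num_buckets, s):
    """Pass 2: count only candidate pairs, then keep those with count >= s."""
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(i for i in set(basket) if i in frequent_items)
        for pair in combinations(items, 2):
            # Both items are frequent; also require a frequent bucket.
            if bit_is_set(bits, bucket_of(pair, num_buckets)):
                pair_counts[pair] += 1
    return {p: c for p, c in pair_counts.items() if c >= s}


if __name__ == "__main__":
    baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "d"]]
    freq_items, bits = pcy_pass1(baskets, num_buckets=11, s=3)
    print(pcy_pass2(baskets, freq_items, bits, 11, 3))  # {('a', 'b'): 3}
```

Because a bucket bit is cleared only when the whole bucket falls below s, the hash table never discards a truly frequent pair; it only shrinks the set of pairs that pass 2 has to count. In a real implementation the integer bucket array would be freed before pass 2 so the memory can hold the candidate-pair counts.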