Hash-Based Improvements to A-Priori

Park-Chen-Yu Algorithm
Multistage Algorithm
Approximate Algorithms

PCY Algorithm

◆ Hash-based improvement to A-Priori.
◆ During Pass 1 of A-Priori, most memory is idle.
◆ Use that memory to keep counts of buckets into which pairs of items are hashed.
  ◗ Just the count, not the pairs themselves.
◆ Gives an extra condition that candidate pairs must satisfy on Pass 2.

Picture of PCY

[Figure: main-memory layout. Pass 1: item counts and hash table. Pass 2: frequent items, bitmap, and counts of candidate pairs.]

PCY Algorithm - Before Pass 1

◆ Organize main memory:
  ◗ Space to count each item.
    • One (typically) 4-byte integer per item.
  ◗ Use the rest of the space for as many integers, representing buckets, as we can.

PCY Algorithm - Pass 1

FOR (each basket) {
    FOR (each item)
        add 1 to item's count;
    FOR (each pair of items) {
        hash the pair to a bucket;
        add 1 to the count for that bucket
    }
}

PCY Algorithm - Between Passes

◆ Replace the buckets by a bit vector:
  ◗ 1 means the bucket count is ≥ the support s (a frequent bucket); 0 means it is not.
◆ Integers are replaced by bits, so the bit vector requires little second-pass space.
◆ Also, decide which items are frequent and list them for the second pass.

PCY Algorithm - Pass 2

◆ Count all pairs {i, j} that meet the conditions:
  1. Both i and j are frequent items.
  2. The pair {i, j} hashes to a bucket number whose bit in the bit vector is 1....
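The two passes and the between-pass step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the slides' own code: the names `bucket_of` and `pcy_frequent_pairs`, the `num_buckets` parameter, and the CRC32-based hash are all hypothetical choices. A list of booleans stands in for the packed bit vector, and all items are assumed to be of one sortable type. The final filter on pair counts goes beyond the conditions visible in the text; it is the usual last step of keeping only the pairs whose count reaches the support s.

```python
from collections import Counter
from itertools import combinations
import zlib


def bucket_of(pair, num_buckets):
    """Hash a sorted pair to a bucket index (CRC32 is an illustrative choice)."""
    return zlib.crc32(repr(pair).encode("utf-8")) % num_buckets


def pcy_frequent_pairs(baskets, support, num_buckets):
    """Return {pair: count} for pairs appearing in at least `support` baskets."""
    # Pass 1: count each item, and hash each pair to a bucket counter.
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket_of(pair, num_buckets)] += 1

    # Between passes: list the frequent items and replace the bucket
    # counts by a bit vector (True marks a frequent bucket).
    frequent_items = {item for item, n in item_counts.items() if n >= support}
    bit_vector = [n >= support for n in bucket_counts]

    # Pass 2: count only pairs of frequent items that hash to a frequent bucket.
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket) & frequent_items)
        for pair in combinations(items, 2):
            if bit_vector[bucket_of(pair, num_buckets)]:
                pair_counts[pair] += 1

    # Assumed final step: keep the candidate pairs that are truly frequent.
    return {pair: n for pair, n in pair_counts.items() if n >= support}


if __name__ == "__main__":
    baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"], ["a", "b", "d"]]
    print(pcy_frequent_pairs(baskets, support=3, num_buckets=8))
```

A bucket's count is at least the count of any pair that hashes to it, so a frequent pair always lands in a frequent bucket. The result therefore does not depend on the hash function or on `num_buckets`; those choices only affect how many infrequent candidate pairs get counted on Pass 2.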
 Spring '09
