# assoc-rules2-3 - Improvements to APriori ParkChenYu...


## Improvements to A-Priori

- Park-Chen-Yu Algorithm
- Multistage Algorithm
- Approximate Algorithms
- Compacting Results

## PCY Algorithm

- Hash-based improvement to A-Priori.
- During Pass 1 of A-Priori, most memory is idle.
- Use that memory to keep counts of buckets into which pairs of items are hashed.
  - Just the count, not the pairs themselves.
- Gives an extra condition that candidate pairs must satisfy on Pass 2.

## Picture of PCY

[Figure: main-memory layout in each pass. Pass 1: item counts and hash table. Pass 2: frequent items, bitmap, and counts of candidate pairs.]

## PCY Algorithm – Before Pass 1: Organize Main Memory

- Space to count each item.
  - One (typically) 4-byte integer per item.
- Use the rest of the space for as many integers, representing buckets, as we can.

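
The memory split above is simple arithmetic, and a small sketch may make it concrete. The function name `bucket_budget` and the figures used (1 GiB of main memory, one million distinct items) are assumptions chosen for illustration, not values from the slides.

```python
INT_BYTES = 4  # the "typically 4-byte integer" per counter


def bucket_budget(memory_bytes: int, n_items: int) -> int:
    """Number of bucket counters that fit after reserving one counter per item."""
    item_bytes = n_items * INT_BYTES
    if item_bytes > memory_bytes:
        raise ValueError("not enough memory to count every item")
    # Whatever is left over is carved into 4-byte bucket counters.
    return (memory_bytes - item_bytes) // INT_BYTES


# Hypothetical figures: 1 GiB of main memory, 1,000,000 distinct items.
n_buckets = bucket_budget(2**30, 1_000_000)
print(n_buckets)  # 267435456
```

Under these assumed figures the item counts take about 4 MB, so nearly all of memory is available for buckets.
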
## PCY Algorithm – Pass 1

    FOR (each basket) {
        FOR (each item)
            add 1 to item's count;
        FOR (each pair of items) {
            hash the pair to a bucket;
            add 1 to the count for that bucket
        }
    }
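
The Pass 1 loop can be sketched in runnable form as follows. This is a minimal illustration that assumes items are integer ids and that baskets fit in a Python list; `hash_pair` is a hypothetical hash chosen only so the result is reproducible, and any function of the pair would serve.

```python
from collections import Counter
from itertools import combinations


def hash_pair(i, j, n_buckets):
    # Hypothetical hash for integer item ids; any function of the pair works.
    return (31 * i + j) % n_buckets


def pcy_pass1(baskets, n_buckets):
    """One pass over the baskets: count items and count pairs per bucket."""
    item_counts = Counter()
    bucket_counts = [0] * n_buckets
    for basket in baskets:
        items = sorted(set(basket))  # canonical order so {i, j} hashes one way
        for item in items:
            item_counts[item] += 1
        for i, j in combinations(items, 2):
            # Only the bucket's count is kept, never the pair itself.
            bucket_counts[hash_pair(i, j, n_buckets)] += 1
    return item_counts, bucket_counts


baskets = [[1, 2, 3], [1, 2], [2, 3], [1, 2, 4]]
item_counts, bucket_counts = pcy_pass1(baskets, n_buckets=5)
print(dict(item_counts))  # {1: 3, 2: 4, 3: 2, 4: 1}
print(bucket_counts)      # [3, 1, 0, 3, 1]
```
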

## Observations About Buckets

1. If a bucket contains a frequent pair, then the bucket is surely frequent.
   - We cannot use the hash table to eliminate any member of this bucket.
2. Even without any frequent pair, a bucket can be frequent.
   - Again, nothing in the bucket can be eliminated.

## Observations – (2)

3. But in the best case, the count for a bucket is less than the support s.
   - Now, all pairs that hash to this bucket can be eliminated as candidates, even if the pair consists of two frequent items.
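
A toy run may help show all three observations at once. The sketch below assumes integer item ids, a hypothetical hash `toy_hash`, five buckets, and support s = 3. It also records the true pair counts per bucket purely for illustration; the actual algorithm stores only the bucket counts.

```python
from collections import Counter, defaultdict
from itertools import combinations


def toy_hash(i, j, n_buckets):
    # Hypothetical hash for integer item ids; any pair hash would do.
    return (31 * i + j) % n_buckets


def bucket_report(baskets, n_buckets):
    """Return bucket counts plus, for illustration only, the pairs in each bucket."""
    bucket_counts = [0] * n_buckets
    pairs_in_bucket = defaultdict(Counter)
    for basket in baskets:
        for i, j in combinations(sorted(set(basket)), 2):
            b = toy_hash(i, j, n_buckets)
            bucket_counts[b] += 1
            pairs_in_bucket[b][(i, j)] += 1
    return bucket_counts, pairs_in_bucket


baskets = [[1, 2, 3], [1, 2], [2, 3], [1, 2, 4], [3]]
s = 3  # assumed support threshold
bucket_counts, pairs_in_bucket = bucket_report(baskets, n_buckets=5)
for b, count in enumerate(bucket_counts):
    status = "frequent" if count >= s else "infrequent"
    print(b, count, status, dict(pairs_in_bucket[b]))
```

With these assumptions, bucket 3 holds the frequent pair (1, 2) and is frequent (observation 1). Bucket 0 reaches count 3 only because (2, 3) and (1, 4) collide, although neither pair is frequent (observation 2). Bucket 4 has count 1, so the pair (1, 3) is eliminated even though items 1 and 3 each appear three times (observation 3).
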

## PCY Algorithm – Between Passes

- Replace the buckets by a bit-vector:
  - 1 means the bucket count is at least the support s (a frequent bucket); 0 means it is not.
- 4-byte integers are replaced by bits, so the bit-vector requires 1/32 of the memory that the bucket counts used.
- Also, decide which items are frequent and list them for the second pass.

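
The step between passes could look roughly like this. The sketch packs one bit per bucket into a `bytearray` and assumes a bucket is frequent when its count is at least s, which is the reading consistent with eliminating buckets whose count is less than s. The function names are hypothetical.

```python
def buckets_to_bitmap(bucket_counts, s):
    """Pack one bit per bucket: 1 if the bucket is frequent, else 0."""
    bitmap = bytearray((len(bucket_counts) + 7) // 8)
    for b, count in enumerate(bucket_counts):
        if count >= s:  # assumed definition of a frequent bucket
            bitmap[b // 8] |= 1 << (b % 8)
    return bitmap


def is_frequent_bucket(bitmap, b):
    """Read bucket b's bit back out of the packed bitmap."""
    return bool((bitmap[b // 8] >> (b % 8)) & 1)


def frequent_items(item_counts, s):
    """Items whose Pass 1 count reaches the support threshold."""
    return {item for item, count in item_counts.items() if count >= s}


bucket_counts = [3, 1, 0, 3, 1]  # example Pass 1 output
bitmap = buckets_to_bitmap(bucket_counts, s=3)
print([is_frequent_bucket(bitmap, b) for b in range(5)])
# [True, False, False, True, False]
print(frequent_items({1: 3, 2: 4, 3: 2, 4: 1}, s=3))  # {1, 2}
```

For 1,024 buckets the counters occupy 4,096 bytes and the bitmap 128 bytes, which is the 1/32 ratio stated above.
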
## PCY Algorithm – Pass 2

Count all pairs {i, j} that meet the conditions:

1. Both i and j are frequent items.
2. The pair {i, j} hashes to a bucket number whose bit in the bit-vector is 1.
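
The two conditions translate into a short filter inside the counting loop. The following is a sketch under assumptions: integer item ids, a hypothetical `hash_pair` that must be the same hash used on Pass 1, and the bit-vector represented as a plain list of 0/1 values for readability rather than packed bits.

```python
from collections import Counter
from itertools import combinations


def hash_pair(i, j, n_buckets):
    # Hypothetical hash; it must be the same function that was used on Pass 1.
    return (31 * i + j) % n_buckets


def pcy_pass2(baskets, frequent_items, bit_vector):
    """Count only the pairs that satisfy both PCY conditions."""
    n_buckets = len(bit_vector)
    pair_counts = Counter()
    for basket in baskets:
        # Condition 1: keep only frequent items before forming pairs.
        items = sorted(i for i in set(basket) if i in frequent_items)
        for i, j in combinations(items, 2):
            # Condition 2: the pair's bucket must be marked frequent.
            if bit_vector[hash_pair(i, j, n_buckets)]:
                pair_counts[(i, j)] += 1
    return pair_counts


baskets = [[1, 2, 3], [1, 2], [2, 3], [1, 2, 4], [3]]
counts = pcy_pass2(baskets, frequent_items={1, 2, 3}, bit_vector=[1, 0, 0, 1, 0])
print(dict(counts))  # {(1, 2): 3, (2, 3): 2}

s = 3  # assumed support threshold
print({pair for pair, c in counts.items() if c >= s})  # {(1, 2)}
```

In this toy run the pair (1, 3) is never counted because its bucket's bit is 0, while (2, 3) is counted as a candidate but falls short of the assumed threshold.
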
