1 Improvements to A-Priori

Park-Chen-Yu Algorithm
Multistage Algorithm
Approximate Algorithms
Compacting Results

2 PCY Algorithm

◆ Hash-based improvement to A-Priori.
◆ During Pass 1 of A-Priori, most memory is idle.
◆ Use that memory to keep counts of buckets into which pairs of items are hashed.
  ◗ Just the count, not the pairs themselves.
◆ Gives extra condition that candidate pairs must satisfy on Pass 2.

3 Picture of PCY

[Figure: main-memory layout. Pass 1: item counts, hash table. Pass 2: frequent items, bitmap, counts of candidate pairs.]

4 PCY Algorithm --- Before Pass 1 Organize Main Memory

◆ Space to count each item.
  ◗ One (typically) 4-byte integer per item.
◆ Use the rest of the space for as many integers, representing buckets, as we can.

5 PCY Algorithm --- Pass 1

    FOR (each basket) {
        FOR (each item)
            add 1 to item’s count;
        FOR (each pair of items) {
            hash the pair to a bucket;
            add 1 to the count for that bucket
        }
    }

6 Observations About Buckets

1. If a bucket contains a frequent pair, then the bucket is surely frequent.
  ◗ We cannot use the hash table to eliminate any member of this bucket.
2. Even without any frequent pair, a bucket can be frequent.
  ◗ Again, nothing in the bucket can be eliminated.

7 Observations --- (2)

3. But in the best case, the count for a bucket is less than the support s.
  ◗ Now, all pairs that hash to this bucket can be eliminated as candidates, even if the pair consists of two frequent items.

8 PCY Algorithm --- Between Passes

◆ Replace the buckets by a bit-vector:
  ◗ 1 means the bucket count exceeds the support s (frequent bucket); 0 means it did not.
◆ Integers are replaced by bits, so the bit-vector requires little second-pass space.
◆ Also, decide which items are frequent and list them for the second pass.

9 PCY Algorithm --- Pass 2

◆ Count all pairs {i, j} that meet the conditions:
1. Both i and j are frequent items....
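The Pass 1 loop, the bit-vector step, and the candidate test described in the slides can be sketched in Python as follows. This is a minimal illustration, not the slides' own code. The function names, the hash function `bucket_of`, and the sample baskets are all hypothetical. Two points are assumptions: a bucket or item is treated as frequent when its count is at least s, and, because the preview cuts off the Pass 2 condition list, the second condition (the pair hashes to a frequent bucket) is inferred from the earlier observation that pairs in an infrequent bucket can be eliminated.

```python
from collections import Counter
from itertools import combinations


def bucket_of(pair, num_buckets):
    # Hypothetical hash function; any deterministic function of the pair
    # works as long as both passes use the same one.
    i, j = pair
    return (i * 31 + j) % num_buckets


def pcy_pass1(baskets, num_buckets):
    # Pass 1: count items and, in the otherwise idle memory, count buckets.
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for basket in baskets:
        items = sorted(set(basket))
        for item in items:
            item_counts[item] += 1
        for pair in combinations(items, 2):
            # Only the bucket count is kept, not the pair itself.
            bucket_counts[bucket_of(pair, num_buckets)] += 1
    return item_counts, bucket_counts


def between_passes(item_counts, bucket_counts, s):
    # Assumption: "frequent" means count >= s.
    frequent_items = {i for i, c in item_counts.items() if c >= s}
    bitmap = [1 if c >= s else 0 for c in bucket_counts]
    return frequent_items, bitmap


def pcy_pass2(baskets, frequent_items, bitmap):
    # Pass 2: count only candidate pairs. The bucket condition is inferred
    # from the bucket observations, since the source list is truncated.
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket) & frequent_items)
        for pair in combinations(items, 2):
            if bitmap[bucket_of(pair, len(bitmap))]:
                pair_counts[pair] += 1
    return pair_counts


# Made-up example data: five baskets of integer item IDs, support s = 3.
baskets = [[1, 2, 3], [1, 2], [1, 3], [2, 3], [1, 2, 4]]
item_counts, bucket_counts = pcy_pass1(baskets, num_buckets=5)
frequent_items, bitmap = between_passes(item_counts, bucket_counts, s=3)
pair_counts = pcy_pass2(baskets, frequent_items, bitmap)
frequent_pairs = [p for p, c in pair_counts.items() if c >= 3]
```

With this data the bucket counts are `[3, 1, 0, 3, 2]`, so the bit-vector is `[1, 0, 0, 1, 0]`. Bucket 3 is frequent because it holds the frequent pair (1, 2). Bucket 0 is frequent without holding any frequent pair, so (2, 3) must still be counted on Pass 2. Bucket 4 falls below s, so (1, 3) is never counted even though items 1 and 3 are both frequent. Only (1, 2) ends up as a frequent pair.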
