assoc-rules2-3

assoc-rules2-3 - Improvements to APriori ParkChenYu...

1 Improvements to A-Priori
  Park-Chen-Yu Algorithm
  Multistage Algorithm
  Approximate Algorithms
  Compacting Results
2 PCY Algorithm
  Hash-based improvement to A-Priori.
  During Pass 1 of A-Priori, most memory is idle.
  Use that memory to keep counts of buckets into which pairs of items are hashed.
    Just the count, not the pairs themselves.
  This gives an extra condition that candidate pairs must satisfy on Pass 2.
3 Picture of PCY
  [Figure: main-memory layout in each pass. Pass 1: item counts and the hash table. Pass 2: frequent items, the bitmap, and counts of candidate pairs.]
4 PCY Algorithm – Before Pass 1: Organize Main Memory
  Space to count each item.
    One (typically) 4-byte integer per item.
  Use the rest of the space for as many integers, representing buckets, as we can.
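The memory split described above is simple arithmetic, sketched here in Python. The function name `plan_memory` and the figures in the usage line (1,000,000 bytes, 50,000 items) are hypothetical and chosen only for illustration; the 4-byte counter size is the slide's "typical" value.

```python
def plan_memory(memory_bytes: int, n_items: int, int_bytes: int = 4) -> int:
    """Return how many bucket counters fit after reserving one counter per item."""
    item_bytes = n_items * int_bytes          # one integer per item
    if item_bytes > memory_bytes:
        raise ValueError("not enough memory to count every item")
    # Everything left over is divided into integer-sized bucket counters.
    return (memory_bytes - item_bytes) // int_bytes


# Hypothetical budget: 1,000,000 bytes of memory and 50,000 distinct items.
n_buckets = plan_memory(memory_bytes=1_000_000, n_items=50_000)
print(n_buckets)  # 200000
```

The more buckets that fit, the fewer pairs share a bucket, so the bucket counts become a tighter filter for Pass 2.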
5 PCY Algorithm – Pass 1
  FOR (each basket) {
    FOR (each item)
      add 1 to item’s count;
    FOR (each pair of items) {
      hash the pair to a bucket;
      add 1 to the count for that bucket
    }
  }
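The Pass 1 loop above might be written in Python roughly as follows. This is a sketch under assumptions: items are small integers, baskets are lists, and `bucket_of` is a hypothetical hash function invented for the example (the slides do not specify one). The sample baskets are made up.

```python
from collections import Counter
from itertools import combinations


def bucket_of(i: int, j: int, n_buckets: int) -> int:
    # Hypothetical hash of the pair (i, j) with i < j; any pair hash would do.
    return (i * 31 + j) % n_buckets


def pcy_pass1(baskets, n_buckets):
    item_counts = Counter()            # one counter per item
    bucket_counts = [0] * n_buckets    # one counter per bucket, not per pair
    for basket in baskets:
        items = sorted(set(basket))    # canonical order so (i, j) has i < j
        for item in items:
            item_counts[item] += 1
        for i, j in combinations(items, 2):
            bucket_counts[bucket_of(i, j, n_buckets)] += 1
    return item_counts, bucket_counts


# Made-up data for illustration.
baskets = [[1, 2, 3], [1, 2], [2, 3], [1, 2, 4]]
item_counts, bucket_counts = pcy_pass1(baskets, n_buckets=5)
print(dict(item_counts))
print(bucket_counts)
```

Note that only the bucket totals are stored; the pairs themselves are never kept in memory during this pass.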
6 Observations About Buckets
  1. If a bucket contains a frequent pair, then the bucket is surely frequent.
       We cannot use the hash table to eliminate any member of this bucket.
  2. Even without any frequent pair, a bucket can be frequent.
       Again, nothing in the bucket can be eliminated.
7 Observations – (2)
  3. But in the best case, the count for a bucket is less than the support s.
       Now, all pairs that hash to this bucket can be eliminated as candidates, even if the pair consists of two frequent items.
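The three observations can be checked on a tiny made-up data set. The sketch below counts pairs exactly, which PCY itself avoids doing on Pass 1, purely so each bucket's contents can be inspected. The hash function `bucket_of`, the baskets, the threshold `s = 3`, and the helper `describe` are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def bucket_of(i: int, j: int, n_buckets: int) -> int:
    # Hypothetical hash of the pair (i, j) with i < j.
    return (i * 31 + j) % n_buckets


baskets = [[1, 2, 3], [1, 2], [2, 3], [1, 2, 4], [3]]
s, n_buckets = 3, 5

# Exact pair counts, computed only to illustrate what each bucket hides.
pair_counts = Counter()
for basket in baskets:
    pair_counts.update(combinations(sorted(set(basket)), 2))

# Group the pairs by the bucket they hash to.
buckets = defaultdict(dict)
for (i, j), count in pair_counts.items():
    buckets[bucket_of(i, j, n_buckets)][(i, j)] = count


def describe(b: int) -> str:
    total = sum(buckets[b].values())   # what the Pass 1 bucket counter would hold
    if total < s:
        return "infrequent bucket: every pair in it is eliminated"
    if any(c >= s for c in buckets[b].values()):
        return "frequent bucket containing a frequent pair"
    return "frequent bucket with no frequent pair"


for b in range(n_buckets):
    print(b, buckets[b], describe(b))
```

In this data, bucket 3 holds the frequent pair (1, 2), bucket 0 is frequent only because two infrequent pairs add up, and bucket 4 eliminates the pair (1, 3) even though items 1 and 3 are each frequent.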
8 PCY Algorithm – Between Passes
  Replace the buckets by a bit-vector:
    1 means the bucket count is at least the support s (a frequent bucket); 0 means it is not.
  4-byte integers are replaced by bits, so the bit-vector requires 1/32 of the memory.
  Also, decide which items are frequent and list them for the second pass.
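One possible way to do the between-passes step in Python is sketched below, packing eight buckets per byte in a `bytearray`. The function names are hypothetical, and the sketch assumes a bucket is frequent when its count is at least s, which matches eliminating buckets whose count is less than s. The sample counts are made up.

```python
def to_bitmap(bucket_counts, s):
    """Pack one bit per bucket: 1 if the bucket is frequent, else 0."""
    bitmap = bytearray((len(bucket_counts) + 7) // 8)
    for b, count in enumerate(bucket_counts):
        if count >= s:                         # assumed threshold test
            bitmap[b // 8] |= 1 << (b % 8)
    return bitmap


def is_frequent_bucket(bitmap, b):
    """Read back the bit for bucket number b."""
    return bool((bitmap[b // 8] >> (b % 8)) & 1)


def frequent_items(item_counts, s):
    """List the items that survive into Pass 2."""
    return {item for item, count in item_counts.items() if count >= s}


# Made-up Pass 1 results for illustration.
bucket_counts = [3, 1, 0, 3, 1]
item_counts = {1: 3, 2: 4, 3: 3, 4: 1}
bitmap = to_bitmap(bucket_counts, s=3)
print([is_frequent_bucket(bitmap, b) for b in range(len(bucket_counts))])
print(frequent_items(item_counts, s=3))
```

Once the bitmap is built, the integer bucket counters can be discarded, which frees most of the memory for counting candidate pairs.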
9 PCY Algorithm – Pass 2
  Count all pairs {i, j} that meet the conditions:
  1. Both i and j are frequent items.
  2. The pair {i, j} hashes to a bucket number whose bit in the bit-vector is 1.
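The two Pass 2 conditions might be applied as in the following sketch. It assumes the same hypothetical hash function `bucket_of` is used on both passes, represents the bit-vector as a plain list of 0/1 values for readability, and uses made-up baskets with a support threshold of 3.

```python
from collections import Counter
from itertools import combinations


def bucket_of(i: int, j: int, n_buckets: int) -> int:
    # Hypothetical hash; it must be the same function used on Pass 1.
    return (i * 31 + j) % n_buckets


def pcy_pass2(baskets, frequent_items, bit_vector):
    n_buckets = len(bit_vector)
    candidate_counts = Counter()
    for basket in baskets:
        # Condition 1: keep only the frequent items of this basket.
        items = sorted(set(basket) & frequent_items)
        for i, j in combinations(items, 2):
            # Condition 2: the pair must hash to a frequent bucket.
            if bit_vector[bucket_of(i, j, n_buckets)]:
                candidate_counts[(i, j)] += 1
    return candidate_counts


# Made-up inputs: results a first pass could have produced with s = 3.
baskets = [[1, 2, 3], [1, 2], [2, 3], [1, 2, 4], [3]]
counts = pcy_pass2(baskets, frequent_items={1, 2, 3}, bit_vector=[1, 0, 0, 1, 0])
frequent_pairs = {pair: c for pair, c in counts.items() if c >= 3}
print(dict(counts))
print(frequent_pairs)
```

Here the pair (1, 3) is never counted, although both items are frequent, because its bucket's bit is 0. The pair (2, 3) is counted as a candidate but turns out not to be frequent, which shows that passing both conditions is necessary, not sufficient.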