
# assoc-rules2-2 - Improvements to APriori ParkChenYu...


## Improvements to A-Priori

• Park-Chen-Yu Algorithm
• Multistage Algorithm
• Approximate Algorithms
• Compacting Results

## PCY Algorithm

• Hash-based improvement to A-Priori.
• During Pass 1 of A-Priori, most memory is idle.
• Use that memory to keep counts of buckets into which pairs of items are hashed.
  • Just the count, not the pairs themselves.
• Gives an extra condition that candidate pairs must satisfy on Pass 2.

## Picture of PCY

[Figure: main-memory layout in the two passes. Pass 1: item counts and hash table. Pass 2: frequent items, bitmap, and counts of candidate pairs.]

## PCY Algorithm: Before Pass 1 Organize Main Memory

• Space to count each item.
  • One (typically) 4-byte integer per item.
• Use the rest of the space for as many integers, representing buckets, as we can.
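
The memory split can be sketched as simple arithmetic. This is only an illustration: `bucket_budget`, the 1 GB memory size, and the one million items are assumed values, not figures from the slides.

```python
def bucket_budget(memory_bytes: int, n_items: int, bytes_per_count: int = 4) -> int:
    """Return how many bucket counters fit after reserving one counter per item."""
    item_bytes = n_items * bytes_per_count
    if item_bytes > memory_bytes:
        raise ValueError("item counts alone do not fit in memory")
    # Everything left over is filled with fixed-size bucket counters.
    return (memory_bytes - item_bytes) // bytes_per_count


# Assumed numbers: 1 GB of main memory, one million distinct items.
n_buckets = bucket_budget(memory_bytes=1_000_000_000, n_items=1_000_000)
print(n_buckets)  # 249000000
```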

## PCY Algorithm: Pass 1

    FOR (each basket) {
        FOR (each item)
            add 1 to item's count;
        FOR (each pair of items) {
            hash the pair to a bucket;
            add 1 to the count for that bucket
        }
    }
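
A minimal Python sketch of Pass 1, assuming the baskets are in-memory lists of hashable items. The names `pcy_pass1`, `hash_pair`, and `n_buckets` are illustrative, and Python's built-in `hash` stands in for whatever hash function an implementation would actually choose.

```python
from collections import Counter
from itertools import combinations


def hash_pair(i, j, n_buckets):
    """Illustrative bucket function: hash the pair independently of item order."""
    a, b = sorted((i, j))
    return hash((a, b)) % n_buckets


def pcy_pass1(baskets, n_buckets):
    """Count single items and, for every pair in a basket, its hash bucket."""
    item_counts = Counter()
    bucket_counts = [0] * n_buckets  # only counts are kept, never the pairs
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicate items within a basket
        for item in items:
            item_counts[item] += 1
        for i, j in combinations(items, 2):
            bucket_counts[hash_pair(i, j, n_buckets)] += 1
    return item_counts, bucket_counts


item_counts, bucket_counts = pcy_pass1([[1, 2, 3], [1, 2], [2, 3]], n_buckets=7)
```

The bucket array has a fixed size regardless of how many distinct pairs occur, which is what lets it live in the memory that A-Priori leaves idle during Pass 1.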

## Observations About Buckets

1. If a bucket contains a frequent pair, then the bucket is surely frequent.
   • We cannot use the hash table to eliminate any member of this bucket.
2. Even without any frequent pair, a bucket can be frequent.
   • Again, nothing in the bucket can be eliminated.

## Observations About Buckets (2)

3. But in the best case, the count for a bucket is less than the support s.
   • Now, all pairs that hash to this bucket can be eliminated as candidates, even if the pair consists of two frequent items.
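
The three cases can be seen in a toy example. The pair counts, the bucket assignment, the threshold `s = 3`, and the helper `classify_buckets` below are all made up for illustration.

```python
def classify_buckets(pair_counts, bucket_of, s):
    """Map each bucket to (bucket is frequent, bucket holds a frequent pair)."""
    totals = {}
    holds_frequent_pair = {}
    for pair, count in pair_counts.items():
        b = bucket_of[pair]
        totals[b] = totals.get(b, 0) + count
        holds_frequent_pair[b] = holds_frequent_pair.get(b, False) or count >= s
    return {b: (totals[b] >= s, holds_frequent_pair[b]) for b in totals}


s = 3  # assumed support threshold
pair_counts = {("a", "b"): 4, ("a", "c"): 2, ("b", "c"): 2, ("c", "d"): 1, ("a", "d"): 1}
bucket_of = {("a", "b"): 0, ("a", "c"): 1, ("b", "c"): 1, ("c", "d"): 2, ("a", "d"): 2}

result = classify_buckets(pair_counts, bucket_of, s)
# Bucket 0: frequent because it holds the frequent pair (a, b)   -> observation 1
# Bucket 1: frequent (2 + 2 = 4) although no pair reaches s      -> observation 2
# Bucket 2: count 2 < s, so (c, d) and (a, d) can be eliminated  -> observation 3
```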

## PCY Algorithm: Between Passes

• Replace the buckets by a bit-vector:
  • 1 means the bucket count is at least the support s (frequent bucket); 0 means it is not.
• Integers are replaced by bits, so the bit-vector requires little second-pass space.
• Also, decide which items are frequent and list them for the second pass.
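
One way to sketch this step in Python is to pack the bits into a `bytearray`. The function names are hypothetical, and the threshold test `count >= s` is chosen to match the earlier observation that buckets with a count below s can be eliminated.

```python
def buckets_to_bitmap(bucket_counts, s):
    """Pack one bit per bucket: 1 if the bucket count is at least s, else 0."""
    bitmap = bytearray((len(bucket_counts) + 7) // 8)
    for b, count in enumerate(bucket_counts):
        if count >= s:
            bitmap[b // 8] |= 1 << (b % 8)
    return bitmap


def bit_is_set(bitmap, b):
    """Return True if bucket b was marked frequent."""
    return bool((bitmap[b // 8] >> (b % 8)) & 1)


def frequent_items(item_counts, s):
    """List the items whose own count is at least s, for use in Pass 2."""
    return {item for item, count in item_counts.items() if count >= s}


bitmap = buckets_to_bitmap([5, 1, 3, 0, 0, 0, 0, 0, 7], s=3)
```

With 4-byte counters, each bucket shrinks from 32 bits to 1 bit, which frees most of the memory for counting candidate pairs on the second pass.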

## PCY Algorithm: Pass 2

Count all pairs {i, j} that meet the conditions:

1. Both i and j are frequent items.
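
A hedged sketch of Pass 2 follows. The list of conditions above is cut off after the first one, so the bucket check in the code is an assumption inferred from the earlier slides (the "extra condition" that candidate pairs must satisfy and the bit-vector built between passes). The names `pcy_pass2`, `frequent_buckets`, and `bucket_of` are illustrative, and a plain set stands in for the bit-vector.

```python
from collections import Counter
from itertools import combinations


def pcy_pass2(baskets, frequent, frequent_buckets, bucket_of):
    """Count only the pairs that survive both filters."""
    pair_counts = Counter()
    for basket in baskets:
        # Condition 1: both items of the pair must be frequent.
        items = sorted(item for item in set(basket) if item in frequent)
        for i, j in combinations(items, 2):
            # Assumed extra condition: the pair hashes to a frequent bucket.
            if bucket_of(i, j) in frequent_buckets:
                pair_counts[(i, j)] += 1
    return pair_counts


baskets = [[1, 2, 3], [1, 2], [2, 3], [1, 2, 4]]
bucket_of = lambda i, j: (i + 2 * j) % 7  # toy hash, for illustration only
# With s = 2, Pass 1 on these baskets gives frequent items {1, 2, 3} and
# frequent buckets {1, 5} under this toy hash.
counts = pcy_pass2(baskets, frequent={1, 2, 3}, frequent_buckets={1, 5}, bucket_of=bucket_of)
print(counts)
```

In this toy run the pair (1, 3) is never counted even though both of its items are frequent, because it hashes to a bucket whose Pass 1 count was below s.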
