assoc-rules2-2

assoc-rules2-2 - Improvements to APriori ParkChenYu...

Improvements to A-Priori
  - Park-Chen-Yu Algorithm
  - Multistage Algorithm
  - Approximate Algorithms
  - Compacting Results

PCY Algorithm
  - Hash-based improvement to A-Priori.
  - During Pass 1 of A-Priori, most memory is idle.
  - Use that memory to keep counts of buckets into which pairs of items are hashed.
    - Just the count, not the pairs themselves.
  - Gives an extra condition that candidate pairs must satisfy on Pass 2.
Picture of PCY
  [Figure: main-memory layout in each pass. Pass 1: item counts, hash table. Pass 2: frequent items, bitmap, counts of candidate pairs.]

PCY Algorithm: Before Pass 1, Organize Main Memory
  - Space to count each item.
    - One (typically) 4-byte integer per item.
  - Use the rest of the space for as many integers, representing buckets, as we can.
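The memory split described here is simple arithmetic, sketched below. The function name `plan_pcy_memory` and the example figures (1 MB of memory, 50,000 items) are assumptions chosen for illustration, not values from the slides.

```python
def plan_pcy_memory(memory_bytes: int, n_items: int, int_bytes: int = 4) -> int:
    """Return how many bucket counters fit after reserving one counter per item.

    Hypothetical helper: assumes every counter (item or bucket) is a
    fixed-size integer of int_bytes bytes.
    """
    item_bytes = n_items * int_bytes
    if item_bytes > memory_bytes:
        raise ValueError("not enough memory for the item counts alone")
    # Whatever is left after the item counts is divided into bucket counters.
    return (memory_bytes - item_bytes) // int_bytes


# Assumed example: 1,000,000 bytes of memory and 50,000 distinct items.
n_buckets = plan_pcy_memory(memory_bytes=1_000_000, n_items=50_000)
print(n_buckets)  # 200000
```

More buckets mean fewer pairs share a bucket, so it pays to give the hash table all the memory the item counts do not need.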
PCY Algorithm: Pass 1
  FOR (each basket) {
      FOR (each item)
          add 1 to item's count;
      FOR (each pair of items) {
          hash the pair to a bucket;
          add 1 to the count for that bucket
      }
  }
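The Pass 1 loop can be sketched in Python as follows. The hash function `bucket_of`, the integer item IDs, and the four sample baskets are assumptions made for this example; any hash that spreads pairs evenly over the buckets would serve.

```python
from collections import Counter
from itertools import combinations


def bucket_of(i: int, j: int, n_buckets: int) -> int:
    """Hypothetical pair hash; orders the pair first so {i, j} == {j, i}."""
    a, b = (i, j) if i < j else (j, i)
    return (a * 31 + b) % n_buckets


def pcy_pass1(baskets, n_buckets):
    """Count single items and hash every pair in every basket to a bucket."""
    item_counts = Counter()
    bucket_counts = [0] * n_buckets
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within a basket
        for item in items:
            item_counts[item] += 1
        for i, j in combinations(items, 2):
            # Only the bucket's count is kept, never the pair itself.
            bucket_counts[bucket_of(i, j, n_buckets)] += 1
    return item_counts, bucket_counts


# Assumed toy data: items are small integers.
baskets = [[1, 2, 3], [1, 2], [2, 3], [1, 2, 4]]
item_counts, bucket_counts = pcy_pass1(baskets, n_buckets=5)
print(dict(item_counts))  # {1: 3, 2: 4, 3: 2, 4: 1}
print(bucket_counts)      # [3, 1, 0, 3, 1]
```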

Observations About Buckets
  1. If a bucket contains a frequent pair, then the bucket is surely frequent.
     - We cannot use the hash table to eliminate any member of this bucket.
  2. Even without any frequent pair, a bucket can be frequent.
     - Again, nothing in the bucket can be eliminated.
Observations (2)
  3. But in the best case, the count for a bucket is less than the support s.
     - Now, all pairs that hash to this bucket can be eliminated as candidates, even if the pair consists of two frequent items.
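The three observations rest on one fact: a bucket's count is the sum of the counts of all pairs that hash to it, so it is an upper bound on each of those counts. The sketch below checks that bound on toy data by also counting the pairs exactly, which the real algorithm avoids. The hash `bucket_of`, the baskets, and the threshold `s = 2` are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def bucket_of(i: int, j: int, n_buckets: int) -> int:
    """Hypothetical pair hash; orders the pair first so {i, j} == {j, i}."""
    a, b = (i, j) if i < j else (j, i)
    return (a * 31 + b) % n_buckets


baskets = [[1, 2, 3], [1, 2], [2, 3], [1, 2, 4]]  # assumed toy data
n_buckets = 5
s = 2  # assumed support threshold

pair_counts = Counter()          # exact counts, kept here only to check the bound
bucket_counts = [0] * n_buckets
for basket in baskets:
    for pair in combinations(sorted(set(basket)), 2):
        pair_counts[pair] += 1
        bucket_counts[bucket_of(*pair, n_buckets)] += 1

# No pair can be counted more often than the bucket it hashes to.
bound_holds = all(
    count <= bucket_counts[bucket_of(i, j, n_buckets)]
    for (i, j), count in pair_counts.items()
)

# Pairs in a bucket whose count is below s cannot be frequent.
prunable = sorted(
    pair for pair in pair_counts
    if bucket_counts[bucket_of(*pair, n_buckets)] < s
)
print(bound_holds)  # True
print(prunable)     # [(1, 3), (2, 4)]
```

Here items 1 and 3 each appear in at least two baskets, yet the pair (1, 3) is eliminated because its bucket's count is only 1. The pair (1, 4) occurs once but shares a bucket with the frequent pair (2, 3), so the hash table alone cannot eliminate it.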

PCY Algorithm: Between Passes
  - Replace the buckets by a bit-vector:
    - 1 means the bucket count is at least the support s (frequent bucket); 0 means it is not.
  - Integers are replaced by bits, so the bit-vector requires little second-pass space (1/32 of the hash table's space with 4-byte integers).
  - Also, decide which items are frequent and list them for the second pass.
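A minimal sketch of the between-passes step, packing one bit per bucket into a `bytearray`. The helper names `buckets_to_bitmap` and `bit_is_set` and the sample counts are assumptions. The sketch treats a bucket as frequent when its count is at least s, which is the complement of the "less than the support s" case in which pairs are eliminated.

```python
def buckets_to_bitmap(bucket_counts, s):
    """Pack one bit per bucket: 1 if the bucket's count is at least s."""
    bitmap = bytearray((len(bucket_counts) + 7) // 8)
    for b, count in enumerate(bucket_counts):
        if count >= s:
            bitmap[b // 8] |= 1 << (b % 8)
    return bitmap


def bit_is_set(bitmap, b):
    """Return True if bucket b was marked frequent."""
    return bool(bitmap[b // 8] & (1 << (b % 8)))


# Assumed bucket counts from a first pass, with support threshold s = 2.
bitmap = buckets_to_bitmap([3, 1, 0, 3, 1], s=2)
print([bit_is_set(bitmap, b) for b in range(5)])
# [True, False, False, True, False]
print(len(bitmap))  # 1 byte instead of five 4-byte counters
```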
PCY Algorithm: Pass 2
  - Count all pairs {i, j} that meet the conditions:
    1. Both i and j are frequent items.
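The preview ends after the first condition. The sketch below assumes that the remaining condition is the bit-vector test implied by the earlier slides: the pair must hash to a bucket whose bit is 1. The names `pcy_pass2` and `bucket_of`, the list-of-booleans stand-in for the bit-vector, and the toy data are all assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def bucket_of(i: int, j: int, n_buckets: int) -> int:
    """Hypothetical pair hash; must be the same hash used on Pass 1."""
    a, b = (i, j) if i < j else (j, i)
    return (a * 31 + b) % n_buckets


def pcy_pass2(baskets, frequent_items, frequent_bucket, s):
    """Count only candidate pairs, then keep those with count at least s.

    frequent_bucket is a list of booleans standing in for the bit-vector.
    """
    n_buckets = len(frequent_bucket)
    pair_counts = Counter()
    for basket in baskets:
        # Condition 1: both items of the pair are frequent.
        items = sorted(i for i in set(basket) if i in frequent_items)
        for i, j in combinations(items, 2):
            # Assumed condition 2: the pair hashes to a frequent bucket.
            if frequent_bucket[bucket_of(i, j, n_buckets)]:
                pair_counts[(i, j)] += 1
    return {pair: c for pair, c in pair_counts.items() if c >= s}


# Assumed results of Pass 1 on the toy baskets, with s = 2.
baskets = [[1, 2, 3], [1, 2], [2, 3], [1, 2, 4]]
frequent_items = {1, 2, 3}
frequent_bucket = [True, False, False, True, False]
result = pcy_pass2(baskets, frequent_items, frequent_bucket, s=2)
print(result)  # {(1, 2): 3, (2, 3): 2}
```

The pair (1, 3) is never counted even though both of its items are frequent, because its bucket's bit is 0.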