assoc-rules2-2

Assoc-rules2-2 - 1 Improvements to A-Priori Park-Chen-Yu Algorithm Multistage Algorithm Approximate Algorithms Compacting Results 2 PCY Algorithm


1 Improvements to A-Priori

  Park-Chen-Yu Algorithm
  Multistage Algorithm
  Approximate Algorithms
  Compacting Results

2 PCY Algorithm

◆ Hash-based improvement to A-Priori.
◆ During Pass 1 of A-Priori, most memory is idle.
◆ Use that memory to keep counts of buckets into which pairs of items are hashed.
  ◗ Just the count, not the pairs themselves.
◆ Gives extra condition that candidate pairs must satisfy on Pass 2.

3 Picture of PCY

[Figure: main-memory layout. Pass 1: item counts, hash table. Pass 2: frequent items, bitmap, counts of candidate pairs.]

4 PCY Algorithm --- Before Pass 1 Organize Main Memory

◆ Space to count each item.
  ◗ One (typically) 4-byte integer per item.
◆ Use the rest of the space for as many integers, representing buckets, as we can.

5 PCY Algorithm --- Pass 1

  FOR (each basket) {
    FOR (each item)
      add 1 to item’s count;
    FOR (each pair of items) {
      hash the pair to a bucket;
      add 1 to the count for that bucket
    }
  }

6 Observations About Buckets

1. If a bucket contains a frequent pair, then the bucket is surely frequent.
  ◗ We cannot use the hash table to eliminate any member of this bucket.
2. Even without any frequent pair, a bucket can be frequent.
  ◗ Again, nothing in the bucket can be eliminated.

7 Observations --- (2)

3. But in the best case, the count for a bucket is less than the support s.
  ◗ Now, all pairs that hash to this bucket can be eliminated as candidates, even if the pair consists of two frequent items.

8 PCY Algorithm --- Between Passes

◆ Replace the buckets by a bit-vector:
  ◗ 1 means the bucket count exceeds the support s (frequent bucket); 0 means it did not.
◆ Integers are replaced by bits, so the bit-vector requires little second-pass space.
◆ Also, decide which items are frequent and list them for the second pass.

9 PCY Algorithm --- Pass 2

◆ Count all pairs {i, j} that meet the conditions:
  1. Both i and j are frequent items....
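
The three phases described in the slides above (Pass 1, between passes, Pass 2) can be sketched in Python roughly as follows. This is an illustrative sketch, not the course's own code. The function name `pcy_frequent_pairs`, the helper `bucket_of`, and the parameter `num_buckets` are hypothetical. The second Pass 2 condition (the pair hashes to a frequent bucket) is inferred from the observations about buckets, since the preview text is cut off before stating it. A count of at least s is treated as frequent, which is also an assumption.

```python
from collections import Counter
from itertools import combinations


def pcy_frequent_pairs(baskets, s, num_buckets):
    """Sketch of PCY for frequent pairs; s is the support threshold."""

    def bucket_of(pair):
        # Hypothetical hash function; any hash of the pair would do.
        return hash(pair) % num_buckets

    # Pass 1: count each item, and hash each pair to a bucket count.
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket_of(pair)] += 1

    # Between passes: replace bucket counts by a bit-vector and
    # list the frequent items (assumption: frequent means count >= s).
    bitmap = [count >= s for count in bucket_counts]
    frequent_items = {item for item, count in item_counts.items() if count >= s}

    # Pass 2: count only pairs of frequent items that hash to a
    # frequent bucket (second condition inferred, see lead-in).
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket) & frequent_items)
        for pair in combinations(items, 2):
            if bitmap[bucket_of(pair)]:
                pair_counts[pair] += 1

    return {pair: count for pair, count in pair_counts.items() if count >= s}
```

For example, with baskets `[["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"], ["a", "b", "d"]]` and `s = 3`, only the pair `("a", "b")` appears in at least three baskets. The result does not depend on the hash function chosen, because a bucket that contains a frequent pair is always frequent; the hash only affects how many infrequent pairs are filtered out before counting.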

This document was uploaded on 03/04/2012.

