dm4part3 - University of Florida CISE department...

University of Florida CISE department, Gator Engineering
Association Analysis, Part 3
Sanjay Ranka, Professor
Computer and Information Science and Engineering, University of Florida
University of Florida CISE department, Gator Engineering, Data Mining, Sanjay Ranka, Spring 2011

Multi-level Association Rules
[Figure: item taxonomy with two roots. Node labels under Food: Bread, Milk, Wheat, White, Skim, 2%, Foremost, Kemps. Node labels under Electronics: Computers, Home, Desktop, Laptop, Accessory, Printer, Scanner, DVD, TV.]
Negative Association
- When do infrequent patterns become interesting?
- Negative correlation: $P(A,B) < P(A)\,P(B)$, e.g. Windows vs. Linux
- Negative association rules: $(\neg A \rightarrow B)$: $P(\neg A, B) = P(B) - P(A,B)$
  e.g. $\neg\text{Regular} \rightarrow \text{Diet}$ (s = 0.17, c = 0.25)

Coke        Diet   ¬Diet   Total
Regular        1      32      33
¬Regular      17      50      67
Total         18      82     100
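The support and confidence quoted for the Coke example can be checked directly from the contingency table. The sketch below is only an illustration: the counts are copied from the slide, while the dictionary layout and variable names are assumptions made for this example.

```python
# Counts copied from the slide's 2x2 table (100 transactions in total).
# The key names are illustrative labels, not part of the original slides.
counts = {
    ("regular", "diet"): 1,
    ("regular", "not_diet"): 32,
    ("not_regular", "diet"): 17,
    ("not_regular", "not_diet"): 50,
}
total = sum(counts.values())

p_regular = (counts[("regular", "diet")] + counts[("regular", "not_diet")]) / total
p_diet = (counts[("regular", "diet")] + counts[("not_regular", "diet")]) / total
p_regular_and_diet = counts[("regular", "diet")] / total

# Negative correlation test from the slide: P(A,B) < P(A) P(B).
negatively_correlated = p_regular_and_diet < p_regular * p_diet

# Rule (not Regular -> Diet): P(not Regular, Diet) = P(Diet) - P(Regular, Diet).
rule_support = p_diet - p_regular_and_diet
rule_confidence = rule_support / (1 - p_regular)

print(negatively_correlated, round(rule_support, 2), round(rule_confidence, 2))
```

Here P(Regular, Diet) = 0.01 is well below P(Regular) P(Diet) = 0.33 × 0.18 ≈ 0.059, and the rule's support and confidence round to the 0.17 and 0.25 shown on the slide.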
Approach 1: Using Negative Items

Tid   A  ¬A   B  ¬B   C  ¬C   D  ¬D
1     1   0   0   1   1   0   0   1
2     1   0   0   1   0   1   0   1
3     1   0   0   1   1   0   0   1
4     1   0   1   0   0   1   0   1
5     1   0   0   1   0   1   1   0

- Computationally expensive
- Tends to produce many uninteresting negative associations
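The augmentation behind this table can be sketched as follows. The transactions are read off the slide's binary table; the function name `add_negative_items` and the `~` prefix for negated items are assumptions made for this illustration.

```python
# Positive items per transaction, read from the slide's binary table.
transactions = {
    1: {"A", "C"},
    2: {"A"},
    3: {"A", "C"},
    4: {"A", "B"},
    5: {"A", "D"},
}
items = ["A", "B", "C", "D"]


def add_negative_items(transaction, items):
    """Return the transaction plus a '~X' item for every item X it lacks."""
    return set(transaction) | {f"~{x}" for x in items if x not in transaction}


augmented = {tid: add_negative_items(t, items) for tid, t in transactions.items()}

for tid, t in sorted(augmented.items()):
    print(tid, sorted(t))
```

After augmentation every transaction contains exactly as many items as there are distinct items in the data set, which is one way to see why this approach becomes computationally expensive.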
Approach 1: Using Negative Items

        B    ¬B
A      10   320
¬A    170   500

[Figure: size-2 and size-3 itemsets over the items A, ¬A, B, ¬B, C, ¬C]
- Support of {A,B}, {A,¬B} and {¬A,B} can be large
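To get a feel for how many itemsets with negated items pass a support threshold, the four cells of the contingency table can be converted to supports. The counts are from the slide; the threshold `min_support = 0.1` and the `~` prefix for negated items are assumptions chosen only for this sketch.

```python
# Cell counts from the slide's 2x2 table; '~' marks a negated item.
counts = {
    ("A", "B"): 10,
    ("A", "~B"): 320,
    ("~A", "B"): 170,
    ("~A", "~B"): 500,
}
total = sum(counts.values())

# Hypothetical threshold, not taken from the slides.
min_support = 0.1

support = {itemset: n / total for itemset, n in counts.items()}
frequent = [itemset for itemset, s in support.items() if s >= min_support]

for itemset, s in support.items():
    print(itemset, s, "frequent" if itemset in frequent else "infrequent")
```

With this table and threshold, every itemset that contains at least one negated item is frequent, so a frequent-itemset miner run on the augmented data would have many more candidates to extend.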
Approach 2: Using Positive Itemsets
- Boulicaut et al. [2000]: compute the support of negative itemsets from the support of positive itemsets
  e.g. $X = Y \cup \neg Z$:
  $$P(X) = \sum_{Y \subseteq I \subseteq Y \cup Z} (-1)^{|I| - |Y|}\, P(I)$$
  e.g. $P(AB\neg C\neg D) = P(AB) - P(ABC) - P(ABD) + P(ABCD)$
- To use this formula:
  - Need to use a very low support threshold, or
  - Use an approximation
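The inclusion-exclusion idea in this approach can be sketched in a few lines. This is a minimal illustration, not the authors' algorithm: the toy transactions and the function names `support`, `negative_support` and `direct_count` are assumptions introduced here, and the direct count is included only to check the result.

```python
from itertools import combinations

# Hypothetical toy data, not taken from the slides.
transactions = [
    {"A", "B"}, {"A", "B", "C"}, {"A", "B", "D"}, {"A", "B", "C", "D"},
    {"A", "C"}, {"B", "D"}, {"A", "B"}, {"C", "D"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def negative_support(present, absent, transactions):
    """Support of 'all of present, none of absent', using only positive supports."""
    total = 0.0
    for k in range(len(absent) + 1):
        for subset in combinations(sorted(absent), k):
            # The sign alternates with the number of items added from 'absent'.
            total += (-1) ** k * support(set(present) | set(subset), transactions)
    return total


def direct_count(present, absent, transactions):
    """Same quantity, counted directly from the data for comparison."""
    hits = sum(1 for t in transactions
               if set(present) <= t and not (set(absent) & t))
    return hits / len(transactions)


print(negative_support({"A", "B"}, {"C", "D"}, transactions))
print(direct_count({"A", "B"}, {"C", "D"}, transactions))
```

For Y = {A, B} and Z = {C, D} the loop evaluates exactly the four terms P(AB) − P(ABC) − P(ABD) + P(ABCD) from the slide. The number of terms doubles with each negated item, and every term needs the support of a positive itemset that may itself be infrequent, which is why a very low support threshold or an approximation is required.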
Approach 3: Using Domain Knowledge
- Compute expected support using an item taxonomy
- There could be multiple taxonomies defined (based on type, brand, size, etc.)
- Limited to the nodes that are directly connected to the frequent itemsets

[Figure: item taxonomy over the nodes A, B, C, D, E, F, G, H, J, K]
Suppose C and G are frequent:
$$\mathrm{Exp}(\sup(EJ)) = \sup(CG) \times \frac{\sup(E)}{\sup(C)} \times \frac{\sup(J)}{\sup(G)}$$
$$\mathrm{Exp}(\sup(CJ)) = \sup(CG) \times \frac{\sup(J)}{\sup(G)}$$
$$\mathrm{Exp}(\sup(CH)) = \sup(CG) \times \frac{\sup(H)}{\sup(G)}$$
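The expected-support formulas all scale the support of the frequent itemset {C, G} by one ratio per substituted item. The sketch below shows that pattern under stated assumptions: the support values are invented for illustration, and the helper `expected_support` is a hypothetical name, not something defined in the slides.

```python
# Hypothetical supports, chosen only to make the arithmetic easy to follow.
sup = {"C": 0.4, "G": 0.5, "E": 0.1, "J": 0.25, "H": 0.2, "CG": 0.2}


def expected_support(sup_frequent, substitutions):
    """Scale the frequent itemset's support by sup(new) / sup(old) per substitution.

    substitutions is a list of (sup_new_item, sup_replaced_item) pairs.
    """
    expected = sup_frequent
    for sup_new, sup_old in substitutions:
        expected *= sup_new / sup_old
    return expected


# Exp(sup(EJ)): replace C by E and G by J.
exp_ej = expected_support(sup["CG"], [(sup["E"], sup["C"]), (sup["J"], sup["G"])])
# Exp(sup(CJ)): replace G by J only.
exp_cj = expected_support(sup["CG"], [(sup["J"], sup["G"])])
# Exp(sup(CH)): replace G by H only.
exp_ch = expected_support(sup["CG"], [(sup["H"], sup["G"])])

print(exp_ej, exp_cj, exp_ch)
```

With these made-up numbers the expected supports come out to roughly 0.025, 0.1 and 0.08. The calculation assumes the substituted items inherit the co-occurrence behaviour of the items they replace, which is the baseline against which an unusually low actual support is judged.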
Approach 3: Using Domain Knowledge
- A negative itemset is a set of items whose actual support is significantly lower than its expected support
- Negative association rule: X Y [connective symbol lost in the preview]
- Rule interest measure: [formula not recoverable from the preview]
- Approach:
  - Find frequent itemsets at each level of the taxonomy
  - Identify candidate negative itemsets based on the frequent itemsets found and their item taxonomy
  - Count actual support of candidate itemsets and retain only the [truncated in the preview]
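The definition of a negative itemset given on this slide suggests a simple filtering step once expected and actual supports are known. The sketch below is only one possible reading: the slide's rule interest measure is not visible in the source, so an absolute gap `min_gap` is used as a stand-in, and all numbers and names are hypothetical.

```python
# Hypothetical candidates: itemset -> (expected support, actual support).
candidates = {
    ("E", "J"): (0.025, 0.002),
    ("C", "J"): (0.100, 0.090),
    ("C", "H"): (0.080, 0.020),
}

# Assumed threshold for "significantly lower"; not taken from the slides.
min_gap = 0.02


def select_negative_itemsets(candidates, min_gap):
    """Keep candidates whose actual support falls short of the expected by min_gap."""
    return {
        itemset: (expected, actual)
        for itemset, (expected, actual) in candidates.items()
        if expected - actual >= min_gap
    }


negative_itemsets = select_negative_itemsets(candidates, min_gap)
print(sorted(negative_itemsets))
```

In this example {C, J} is dropped because its actual support is close to what the taxonomy predicts, while {E, J} and {C, H} are kept as negative itemsets.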

This note was uploaded on 11/13/2011 for the course CIS 4930 taught by Professor Staff during the Spring '08 term at University of Florida.
