# dm4part2 - University of Florida CISE department Gator...


Association Analysis Part 2

Sanjay Ranka, Professor, Computer and Information Science and Engineering, University of Florida

University of Florida CISE department, Gator Engineering. Data Mining, Sanjay Ranka, Spring 2011.

### FP Growth (Pei et al 2000)

- Uses a compressed representation of the database, the FP-tree.
- Once the FP-tree has been constructed, a recursive divide-and-conquer approach is used to mine the frequent itemsets.

### FP-Tree Construction

[Figure: FP-tree after reading TID=1 (null → A:1 → B:1) and after reading TID=2 (null → A:1 → B:1, plus a second branch null → B:1 → C:1 → D:1).]

[Figure: transaction database, header table, and the complete FP-tree. Node labels as extracted: null, A:7, B:5, B:3, C:3, D:1, C:1, D:1, C:3, D:1, D:1, E:1, E:1, D:1, E:1.]

Pointers are used to assist frequent itemset generation.

### FP-Tree Growth

[Figure: the FP-tree paths that end in D. Node labels as extracted: null, A:7, B:5, B:1, C:1, D:1, C:1, D:1, C:3, D:1, D:1, D:1.]

Conditional pattern base for D:

P = {(A:1,B:1,C:1), (A:1,B:1), (A:1,C:1), (A:1), (B:1,C:1)}

Recursively apply FP-growth on P.

Frequent itemsets found (with sup > 1): AD, BD, CD, ACD, BCD

### Tree Projection (Agrawal et al 1999)

Set enumeration tree [figure not recoverable from the preview]:

- Possible extension: E(A) = {B,C,D,E}
- Possible extension: E(ABC) = {D,E}

### Tree Projection

- Items are listed in lexicographic order.
- Each node P stores the following information:
  - Itemset for node P
  - List of possible lexicographic extensions of P: E(P)
  - Pointer to the projected database of its ancestor node
  - Bit vector recording which transactions in the projected database contain the itemset

### Projected Database

[Tables: original database and projected database for node A; contents not recoverable from the preview.]

For each transaction T, the projected transaction at node A is T ∩ E(A).

### ECLAT (Zaki 2000)

- For each item, store a list of transaction ids (tids), called its TID-list.

### ECLAT

- Determine the support of any k-itemset by intersecting the tid-lists of two of its (k-1)-subsets.
- 3 traversal approaches: top-down, bottom-up and hybrid
- Advantage: very fast support counting
- Disadvantage: intermediate tid-lists may become too large for memory

### Interestingness Measures

- Association rule algorithms tend to produce too many rules
  - many of them are uninteresting or redundant
  - Redundant if {A,B,C}...
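
To make the FP-tree construction and the conditional pattern base from the FP Growth slides more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the slides' implementation: the names `FPNode`, `build_fp_tree` and `conditional_pattern_base` are hypothetical, items are inserted in a fixed lexicographic order, and only the two transactions that can be read off the construction figure (TID=1 = {A,B}, TID=2 = {B,C,D}) are used. Each pattern-base entry is stored as a prefix path plus the count of the node it leads to.

```python
class FPNode:
    """One FP-tree node: an item, its count, a parent link and child links."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, order):
    """Insert each transaction as a path from the root, sharing common prefixes."""
    root = FPNode(None)   # the "null" root
    header = {}           # header table: item -> list of nodes carrying that item
    for items in transactions:
        node = root
        for item in sorted(items, key=order.index):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, header


def conditional_pattern_base(item, header):
    """Collect the prefix path (root excluded) above every node labelled `item`."""
    base = []
    for node in header.get(item, []):
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((tuple(reversed(path)), node.count))
    return base


# The first two transactions as read from the construction figure.
transactions = [{"A", "B"}, {"B", "C", "D"}]
root, header = build_fp_tree(transactions, order=["A", "B", "C", "D", "E"])
base_for_d = conditional_pattern_base("D", header)
print(base_for_d)
```

With only these two transactions, the single prefix path for D is (B, C). FP-growth would then build a smaller conditional tree from such a pattern base and recurse on it.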
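
The ECLAT support-counting step described above (intersecting the tid-lists of two (k-1)-subsets) can be sketched in a few lines of Python. This is a hedged illustration only: the helper `build_tid_lists` and the five-transaction toy database are hypothetical and are not taken from the slides.

```python
def build_tid_lists(transactions):
    """Vertical layout: map each item to the set of tids that contain it."""
    tid_lists = {}
    for tid, items in transactions.items():
        for item in items:
            tid_lists.setdefault(item, set()).add(tid)
    return tid_lists


# Hypothetical toy database (tid -> items), for illustration only.
transactions = {
    1: {"A", "B"},
    2: {"B", "C", "D"},
    3: {"A", "C", "D"},
    4: {"A", "B", "C"},
    5: {"A", "B", "C", "D"},
}

tid_lists = build_tid_lists(transactions)

# 2-itemsets: intersect the tid-lists of the single items.
tids_ab = tid_lists["A"] & tid_lists["B"]
tids_ac = tid_lists["A"] & tid_lists["C"]

# 3-itemset ABC: intersect the tid-lists of two of its 2-subsets (AB and AC).
tids_abc = tids_ab & tids_ac
support_abc = len(tids_abc)
print(sorted(tids_abc), support_abc)
```

The support of an itemset is simply the size of its tid-list, which is why counting is fast. The stated disadvantage also shows up here: every intermediate itemset keeps its own tid set in memory.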