# Lecture05-1 - Data Mining Principles and Algorithms...


Data Mining: Principles and Algorithms

Jianyong Wang, Database Lab, Institute of Software, Department of Computer Science and Technology, Tsinghua University

October 29, 2009

## Chapter 3: Mining Frequent Patterns, Association and Correlations

- Basic concepts and a road map
- Efficient and scalable frequent itemset mining methods
- Mining various kinds of association rules
- From association mining to correlation analysis
- Constraint-based association mining
- Sequential pattern mining
- Summary

## Efficient Mining of Frequent Closed Itemsets

- Closed pattern mining strategies
  - The current status
- The CHARM algorithm
  - A vertical data format based closed itemset mining algorithm
- The CLOSET+ algorithm
  - The hybrid tree-projection method
  - The item-skipping technique
  - Efficient subset checking
- Experimental results
  - Comparison with OP, CHARM and CLOSET
  - Scalability test
- Conclusion

## Various Closed Pattern Mining Strategies
## Problem statement (revisited)

Mining frequent closed itemsets

- Itemset
  - A non-empty set of distinct items
- Frequent itemset X
  - Given a support threshold, min_sup, an itemset X is frequent if $\mathrm{Sup}(X) \ge \mathit{min\_sup}$
- Closed itemset Y
  - There exists no itemset Y' such that $Y \subset Y'$ and $\mathrm{Sup}(Y') = \mathrm{Sup}(Y)$ hold
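
The three definitions above can be written down directly as predicates. The following is a minimal sketch, not an efficient miner; the function names and the four-transaction `toy_db` are hypothetical and are not taken from the slides.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def is_frequent(itemset, transactions, min_sup):
    """Sup(X) >= min_sup."""
    return support(itemset, transactions) >= min_sup


def is_closed(itemset, transactions):
    """True if no proper superset Y' of the itemset has the same support."""
    items = set(itemset)
    sup = support(items, transactions)
    all_items = set().union(*transactions)
    # Testing one-item extensions is enough: if a larger superset keeps the
    # support, every one-item extension inside it keeps the support too.
    return all(support(items | {x}, transactions) < sup
               for x in all_items - items)


# Hypothetical toy data (an assumption for illustration only).
toy_db = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]

print(support({"a"}, toy_db))          # support of {a}
print(is_frequent({"c"}, toy_db, 2))   # is {c} frequent at min_sup = 2?
print(is_closed({"c"}, toy_db))        # {c} vs. its superset {a, c}
print(is_closed({"a", "c"}, toy_db))
```

In this toy data, {c} is frequent but not closed, because {a, c} occurs in exactly the same transactions.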

## Typical Closed Itemset Mining Algorithms

- A-close
  - Breadth-first search based
- CLOSET / CLOSET+
  - FP-tree and depth-first search based
- MAFIA
  - Vertical bitmap representation
- CHARM
  - Vertical data representation and diffset technique

## Running example

min_sup = 2, f_list = <f:4, c:4, a:3, b:3, m:3, p:3>

Table 1: A transaction database TDB

| tid | itemset | ordered frequent item list |
| --- | --- | --- |
| 10 | a, c, f, m, p | f, c, a, m, p |
| 20 | a, c, d, f, m, p | f, c, a, m, p |
| 30 | a, b, c, f, g, m | f, c, a, b, m |
| 40 | b, f, i | f, b |
| 50 | b, c, n, p | c, b, p |
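
As a rough check of Table 1, the sketch below counts item supports, keeps the items that reach min_sup, and reorders each transaction by f_list. The order of f_list is taken from the slide as given, since items with equal support (for example f and c) can be ordered either way; the variable names are assumptions made for this illustration.

```python
from collections import Counter

# Table 1 of the running example: tid -> items (one character per item).
TDB = {
    10: "acfmp",
    20: "acdfmp",
    30: "abcfgm",
    40: "bfi",
    50: "bcnp",
}
MIN_SUP = 2
# Order copied from the slide; ties in support are broken arbitrarily there.
F_LIST = ["f", "c", "a", "b", "m", "p"]

# Support of every single item, then the frequent ones.
counts = Counter(item for items in TDB.values() for item in items)
frequent = {item: n for item, n in counts.items() if n >= MIN_SUP}

# Drop infrequent items and sort the rest by their position in f_list.
rank = {item: i for i, item in enumerate(F_LIST)}
ordered = {
    tid: sorted((i for i in items if i in rank), key=rank.__getitem__)
    for tid, items in TDB.items()
}

print(frequent)
for tid, items in ordered.items():
    print(tid, ", ".join(items))
```

The infrequent items d, g, i and n each occur once, so they disappear from the ordered lists.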

## Running example

(a) The set of frequent closed itemsets, rooted at Ø: f:4, c:4, b:3, fb:2, cb:2, cp:3, fcam:3, fcamp:2

(b) FP-tree (children are indented under their parent):

- root
  - f:4
    - c:3
      - a:3
        - m:2
          - p:2
        - b:1
          - m:1
    - b:1
  - c:1
    - b:1
      - p:1

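
Both parts of this slide can be reproduced from the ordered frequent item lists of Table 1. The sketch below is a brute-force illustration under stated assumptions, not CLOSET+ or CHARM: it enumerates all frequent itemsets and filters the closed ones, then builds the FP-tree as nested dictionaries. The header table and node links of a full FP-tree are omitted, and all names are hypothetical.

```python
from itertools import combinations

# Ordered frequent item lists from Table 1 (tid -> items in f_list order).
ORDERED = {10: "fcamp", 20: "fcamp", 30: "fcabm", 40: "fb", 50: "cbp"}
MIN_SUP = 2


def support(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in ORDERED.values() if set(itemset) <= set(t))


# (a) Brute force: every frequent itemset, then keep the closed ones.
items = sorted(set("".join(ORDERED.values())))
frequent = {
    frozenset(c): support(c)
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
    if support(c) >= MIN_SUP
}
# A superset with equal support is itself frequent, so it suffices to
# look for such supersets among the frequent itemsets.
closed = {
    x: s for x, s in frequent.items()
    if not any(x < y and s == t for y, t in frequent.items())
}


# (b) FP-tree as nested dicts: item -> [count, children].
def build_fp_tree(ordered_lists):
    root = {}
    for path in ordered_lists:
        node = root
        for item in path:
            entry = node.setdefault(item, [0, {}])
            entry[0] += 1          # one more transaction passes through here
            node = entry[1]
    return root


def dump(node, depth=0):
    """Indented 'item:count' lines, parents before children."""
    lines = []
    for item, (count, children) in node.items():
        lines.append("  " * depth + f"{item}:{count}")
        lines.extend(dump(children, depth + 1))
    return lines


tree = build_fp_tree(ORDERED.values())
print(sorted("".join(sorted(x)) + f":{s}" for x, s in closed.items()))
print("\n".join(dump(tree)))
```

Brute-force enumeration is only workable here because the example has six frequent items; it is meant as a cross-check of the slide, not as a mining strategy.
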
## Strategies for Frequent Closed Itemset Mining

- Search order
  - Breadth-first search vs. depth-first search
    - Depth-first search is more efficient than breadth-first search for mining long patterns
- Data representation
  - Horizontal vs. vertical data formats
    - Need further performance study to compare these two schemes in terms of scalability, runtime and space usage efficiency
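
To make the two data formats concrete, the sketch below stores the running example both ways and computes the same support from each. It assumes the usual reading of the terms: the horizontal format maps each tid to its items, and the vertical format maps each item to the set of tids that contain it. Function and variable names are hypothetical.

```python
from functools import reduce

# Horizontal format: tid -> items (Table 1 of the running example).
horizontal = {10: "acfmp", 20: "acdfmp", 30: "abcfgm", 40: "bfi", 50: "bcnp"}

# Vertical format: item -> set of tids whose transaction contains the item.
vertical = {}
for tid, items in horizontal.items():
    for item in items:
        vertical.setdefault(item, set()).add(tid)


def support_horizontal(itemset):
    """Scan every transaction and test containment."""
    return sum(1 for items in horizontal.values() if set(itemset) <= set(items))


def support_vertical(itemset):
    """Intersect the tidsets of the items (itemset must be non-empty)."""
    return len(reduce(set.intersection, (vertical[i] for i in itemset)))


print(sorted(vertical["f"]))
print(support_horizontal("fcam"), support_vertical("fcam"))
```

The horizontal version scans the whole database for each query, while the vertical version only touches the tidsets of the items involved; which one wins in practice is the open question the slide raises.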

## Strategies for Frequent Closed Itemset Mining

- Data compression techniques
  - FP-tree (CLOSET+) vs. diffset (CHARM)
- Existing search space pruning
  - Item merging
    - If every transaction containing itemset X also contains itemset Y but not any proper superset of Y, then X ∪ Y …
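
Two of the ideas named on this slide can be illustrated on the running example. The sketch below assumes the common definition of a diffset, namely the tids that contain a prefix P but not its extension PX, so that Sup(PX) = Sup(P) − |diffset|. For item merging it only evaluates the stated condition: it finds the largest Y, disjoint from X, that occurs in every transaction containing X. All names are hypothetical.

```python
# Table 1 of the running example: tid -> set of items.
TDB = {10: set("acfmp"), 20: set("acdfmp"), 30: set("abcfgm"),
       40: set("bfi"), 50: set("bcnp")}


def tidset(itemset):
    """Tids of the transactions that contain every item of `itemset`."""
    return {tid for tid, items in TDB.items() if set(itemset) <= items}


def diffset(prefix, item):
    """Tids that contain `prefix` but not `prefix` extended by `item`."""
    return tidset(prefix) - tidset(set(prefix) | {item})


def merge_candidate(x):
    """Largest Y (disjoint from X) present in every transaction containing X.

    Assumes X occurs in at least one transaction.
    """
    common = set.intersection(*(TDB[t] for t in tidset(x)))
    return common - set(x)


# Diffset of 'fc' relative to prefix 'f', and the support derived from it.
d = diffset("f", "c")
print(sorted(d), len(tidset("f")) - len(d))

# Item-merging condition for X = {a} and X = {p}.
print(sorted(merge_candidate("a")), sorted(merge_candidate("p")))
```

For X = {a} the merged itemset is fcam with support 3, and for X = {p} it is cp with support 3; both appear in the set of frequent closed itemsets listed for the running example.
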