Lecture05-1 - Data Mining: Principles and Algorithms...

October 29, 2009

Data Mining: Principles and Algorithms

Jianyong Wang
Database Lab, Institute of Software
Department of Computer Science and Technology
Tsinghua University
jianyong@tsinghua.edu.cn
Chapter 3: Mining Frequent Patterns, Association and Correlations

Basic concepts and a road map
Efficient and scalable frequent itemset mining methods
Mining various kinds of association rules
From association mining to correlation analysis
Constraint-based association mining
Sequential pattern mining
Summary
Efficient Mining of Frequent Closed Itemsets

Closed pattern mining strategies
  - The current status
The CHARM algorithm
  - A vertical data format based closed itemset mining algorithm
The CLOSET+ algorithm
  - The hybrid tree-projection method
  - The item-skipping technique
  - Efficient subset checking
Experimental results
  - Comparison with OP, CHARM and CLOSET
  - Scalability test
Conclusion
Various Closed Pattern Mining Strategies
Problem statement (revisited)

Mining frequent closed itemsets
  - Itemset
    » A non-empty set of distinct items
  - Frequent itemset X
    » Given a support threshold min_sup, an itemset X is frequent if $\mathit{Sup}(X) \ge \mathit{min\_sup}$
  - Closed itemset Y
    » There exists no itemset $Y'$ such that $Y \subset Y'$ and $\mathit{Sup}(Y) = \mathit{Sup}(Y')$ both hold
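The three definitions can be checked mechanically. The following is a minimal sketch, not taken from the slides: the database `TOY_DB` and the helper names `support`, `is_frequent` and `is_closed` are hypothetical, and support is assumed to be an absolute transaction count.

```python
# Hypothetical toy database, used only to illustrate the definitions.
TOY_DB = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b", "d"},
    {"c", "d"},
]


def support(itemset, db):
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for transaction in db if items <= transaction)


def is_frequent(itemset, db, min_sup):
    """An itemset is frequent if its support reaches the threshold."""
    return support(itemset, db) >= min_sup


def is_closed(itemset, db):
    """An itemset is closed if no proper superset has the same support.

    Checking one-item extensions is enough: support never grows when items
    are added, so if some larger superset kept the same support, every set
    in between (including a one-item extension) would keep it too.
    """
    items = set(itemset)
    sup = support(items, db)
    all_items = set().union(*db)
    return all(support(items | {x}, db) < sup for x in all_items - items)


print(support({"a"}, TOY_DB), is_closed({"a"}, TOY_DB))            # {a} vs. {a, b}
print(support({"a", "b"}, TOY_DB), is_closed({"a", "b"}, TOY_DB))
```

In this toy database {a} is frequent for min_sup = 2 but not closed, because its superset {a, b} occurs in exactly the same transactions.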
Typical Closed Itemset Mining Algorithms

Typical algorithms
  - A-close
    » Breadth-first search based
  - CLOSET / CLOSET+
    » FP-tree and depth-first search based
  - MAFIA
    » Vertical bitmap representation
  - CHARM
    » Vertical data representation and diffset technique
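As an illustration of what a vertical bitmap representation looks like, the sketch below stores one integer bitmap per item and counts support with a bitwise AND. It shows the data layout only and is not MAFIA itself; the function names are hypothetical, and the transactions are those of the deck's running example (Table 1).

```python
# Transactions of the running example (Table 1), keyed by tid.
TDB = {
    10: set("acfmp"),
    20: set("acdfmp"),
    30: set("abcfgm"),
    40: set("bfi"),
    50: set("bcnp"),
}


def build_bitmaps(tdb):
    """Map each item to an int whose bit i is set if the i-th tid contains it."""
    bitmaps = {}
    for position, tid in enumerate(sorted(tdb)):
        for item in tdb[tid]:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << position)
    return bitmaps


def bitmap_support(itemset, bitmaps, n_transactions):
    """AND the item bitmaps together and count the surviving bits."""
    mask = (1 << n_transactions) - 1  # start with every transaction selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")


BITMAPS = build_bitmaps(TDB)
print(format(BITMAPS["f"], "05b"))                # bit 0 is tid 10, bit 4 is tid 50
print(bitmap_support("fcam", BITMAPS, len(TDB)))
```

The appeal of this layout is that support counting reduces to word-level AND and popcount operations, at the cost of one bit per (item, transaction) pair.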
Running example

min_sup = 2, f_list = <f:4, c:4, a:3, b:3, m:3, p:3>

Table 1: a transaction database TDB

tid | itemset           | ordered frequent item list
10  | a, c, f, m, p     | f, c, a, m, p
20  | a, c, d, f, m, p  | f, c, a, m, p
30  | a, b, c, f, g, m  | f, c, a, b, m
40  | b, f, i           | f, b
50  | b, c, n, p        | c, b, p
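The f_list and the third column of Table 1 can be reproduced with a short sketch. The helper names are hypothetical. Item counts alone do not fix the order among ties (f/c and a/b/m/p), so the order printed on the slide is taken as given rather than derived.

```python
from collections import Counter

MIN_SUP = 2
TDB = {
    10: set("acfmp"),
    20: set("acdfmp"),
    30: set("abcfgm"),
    40: set("bfi"),
    50: set("bcnp"),
}
# Order copied from the slide; tie-breaking among equal counts is assumed.
F_LIST = ["f", "c", "a", "b", "m", "p"]


def frequent_item_counts(tdb, min_sup):
    """Count each item once per transaction and keep those meeting min_sup."""
    counts = Counter(item for items in tdb.values() for item in items)
    return {item: n for item, n in counts.items() if n >= min_sup}


def ordered_frequent_items(items, f_list):
    """Drop infrequent items and sort the rest in f_list order."""
    return [item for item in f_list if item in items]


FREQUENT = frequent_item_counts(TDB, MIN_SUP)
ORDERED = {tid: ordered_frequent_items(items, F_LIST) for tid, items in TDB.items()}

for tid in sorted(ORDERED):
    print(tid, ", ".join(ORDERED[tid]))
```

The infrequent items d, g, i and n each occur once and therefore disappear from the ordered lists.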
Running example

(a) The set of frequent closed itemsets
f:4, c:4, b:3, fb:2, cb:2, cp:3, fcam:3, fcamp:2

(b) FP-tree (node labels from the figure; nesting follows the ordered frequent item lists of Table 1)
root
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1
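Both parts of the running example can be recomputed from Table 1. The sketch below is a brute-force check under stated assumptions, not the CLOSET+ or CHARM procedure: it enumerates closed itemsets as intersections of transaction groups (feasible only for tiny databases) and builds the prefix tree by inserting the ordered frequent item lists one by one. All names are hypothetical, and header tables and node links are omitted.

```python
from itertools import combinations

MIN_SUP = 2
TDB = [frozenset(t) for t in ("acfmp", "acdfmp", "abcfgm", "bfi", "bcnp")]
ORDERED = [list(t) for t in ("fcamp", "fcamp", "fcabm", "fb", "cbp")]


def closed_frequent_itemsets(db, min_sup):
    """Every closed itemset is the intersection of the transactions containing it."""
    closed = {}
    for k in range(min_sup, len(db) + 1):
        for group in combinations(db, k):
            common = frozenset.intersection(*group)
            if common:
                closed[common] = sum(1 for t in db if common <= t)
    return closed


class FPNode:
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}


def build_fp_tree(ordered_lists):
    """Insert each ordered list as a path, sharing common prefixes."""
    root = FPNode(None)
    for items in ordered_lists:
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = FPNode(item)
            node = node.children[item]
            node.count += 1
    return root


def tree_paths(node, prefix=()):
    """Yield (path from root, count) for every node below `node`."""
    for child in node.children.values():
        path = prefix + (child.item,)
        yield path, child.count
        yield from tree_paths(child, path)


CLOSED = closed_frequent_itemsets(TDB, MIN_SUP)
ROOT = build_fp_tree(ORDERED)
for itemset, sup in sorted(CLOSED.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print("".join(sorted(itemset)), sup)
for path, count in tree_paths(ROOT):
    print("/".join(path), count)
```

The enumeration visits every group of at least min_sup transactions, so it is exponential in the number of transactions; it serves only to confirm the eight itemsets listed in (a) and the eleven nodes in (b).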
Strategies for Frequent Closed Itemset Mining

Search order
  - Breadth-first search vs. depth-first search
    » Depth-first search is more efficient than breadth-first search for mining long patterns
Data representation
  - Horizontal vs. vertical data formats
    » Further performance study is needed to compare these two schemes in terms of scalability, runtime and space usage efficiency
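To make the horizontal/vertical distinction concrete, the sketch below converts the running example's transactions (horizontal: tid to items) into tidsets (vertical: item to tids) and computes support by intersecting tidsets. The function names are hypothetical, and the sketch makes no claim about which format performs better.

```python
# Horizontal format: one row per transaction (running example, Table 1).
HORIZONTAL = {
    10: set("acfmp"),
    20: set("acdfmp"),
    30: set("abcfgm"),
    40: set("bfi"),
    50: set("bcnp"),
}


def to_vertical(tdb):
    """Vertical format: one tidset per item."""
    vertical = {}
    for tid, items in tdb.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def tidset(itemset, vertical):
    """Tids containing every item of a non-empty itemset."""
    tidsets = [vertical.get(item, set()) for item in itemset]
    return set.intersection(*tidsets)


VERTICAL = to_vertical(HORIZONTAL)
print(sorted(VERTICAL["b"]))
print(sorted(tidset("cp", VERTICAL)), len(tidset("cp", VERTICAL)))
```

In the horizontal format, counting the support of an itemset means scanning transactions; in the vertical format it is the size of a tidset intersection.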
Strategies for Frequent Closed Itemset Mining

Data compression techniques
  - FP-tree (CLOSET+) vs. diffset (CHARM)
Existing search space pruning
  - Item merging
    » If every transaction containing itemset X also contains itemset Y but not any proper superset of Y, then X ∪ Y
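Two ideas on this slide can be tried on the running example. The slide only names the diffset technique, so the sketch assumes the usual definition: the diffset of an extension PX is the set of tids that contain the prefix P but not PX, and the diffset of PXY is d(PY) minus d(PX). For item merging, the sketch evaluates only the condition as stated (Y is exactly the set of extra items shared by all transactions containing X) and draws no further conclusion. All function names are hypothetical.

```python
TDB = {
    10: set("acfmp"),
    20: set("acdfmp"),
    30: set("abcfgm"),
    40: set("bfi"),
    50: set("bcnp"),
}


def tids_of(itemset, tdb):
    """Tids of the transactions containing every item of `itemset`."""
    items = set(itemset)
    return {tid for tid, row in tdb.items() if items <= row}


def diffset(prefix, extended, tdb):
    """Tids that contain `prefix` but not `extended` (assumed definition)."""
    return tids_of(prefix, tdb) - tids_of(extended, tdb)


# Diffsets relative to the prefix f.
D_FC = diffset("f", "fc", TDB)
D_FB = diffset("f", "fb", TDB)
D_FCB = D_FB - D_FC                        # d(PXY) = d(PY) - d(PX)
SUP_FC = len(tids_of("f", TDB)) - len(D_FC)
SUP_FCB = SUP_FC - len(D_FCB)              # sup(PXY) = sup(PX) - |d(PXY)|


def common_items(itemset, tdb):
    """Items shared by all transactions that contain `itemset`."""
    rows = [tdb[tid] for tid in tids_of(itemset, tdb)]
    return set.intersection(*rows) if rows else set()


# Item-merging condition for X = {m}: Y is everything else those rows share.
X = {"m"}
Y = common_items(X, TDB) - X
print(sorted(D_FC), sorted(D_FCB), SUP_FC, SUP_FCB)
print(sorted(Y), len(tids_of(X | Y, TDB)))
```

For X = {m}, the merged set X ∪ Y is fcam with support 3, which matches an entry in the running example's list of frequent closed itemsets.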
This note was uploaded on 06/02/2010 for the course COMPUTER DM2009F taught by Professor Wangwei during the Fall '09 term at Tsinghua University.
