Lecture05-2 - Data Mining: Principles and Algorithms...

November 5, 2009 Data Mining: Principle and Algorithms 1

Data Mining: Principles and Algorithms
Jianyong Wang
Database Lab, Institute of Software
Department of Computer Science and Technology
Tsinghua University
jianyong@tsinghua.edu.cn

Chapter 3: Mining Frequent Patterns, Association and Correlations
- Basic concepts and a road map
- Efficient and scalable frequent itemset mining methods
- Mining various kinds of association rules
- From association mining to correlation analysis
- Constraint-based association mining
- Sequential pattern mining
- Graph pattern mining
- Summary

Mining Various Kinds of Association Rules
- Mining multi-level association
- Mining multi-dimensional association
- Mining quantitative association
- Mining interesting correlation patterns

Mining Multiple-Level Association Rules
- Items often form a hierarchy.
- Flexible support settings: items at the lower level are expected to have lower support.
- Exploration of shared multi-level mining (Agrawal & Srikant@VLDB'95)

Example hierarchy: Milk [support = 10%] at level 1; 2% Milk [support = 6%] and Skim Milk [support = 4%] at level 2.
- Uniform support: Level 1 min_sup = 5%, Level 2 min_sup = 5%. Skim Milk (4%) falls below the threshold and is pruned.
- Reduced support: Level 1 min_sup = 5%, Level 2 min_sup = 3%. Both 2% Milk and Skim Milk are frequent.
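The difference between uniform and reduced support can be sketched in a few lines of Python. This is a minimal illustration, not the method of the cited paper: the dictionary layout `ITEM_SUPPORT` and the function name `frequent_items` are hypothetical, and the support values are the ones from the milk hierarchy on the slide.

```python
# Hypothetical encoding of the slide's milk hierarchy: item -> (level, support).
# The supports are the fractions shown on the slide (10%, 6%, 4%).
ITEM_SUPPORT = {
    "milk": (1, 0.10),
    "2% milk": (2, 0.06),
    "skim milk": (2, 0.04),
}

def frequent_items(item_support, min_sup_by_level):
    """Return (sorted) the items whose support meets the min_sup of their own level."""
    return sorted(
        item
        for item, (level, support) in item_support.items()
        if support >= min_sup_by_level[level]
    )

UNIFORM = {1: 0.05, 2: 0.05}   # same threshold at every level
REDUCED = {1: 0.05, 2: 0.03}   # lower threshold at the lower level

print(frequent_items(ITEM_SUPPORT, UNIFORM))  # ['2% milk', 'milk']
print(frequent_items(ITEM_SUPPORT, REDUCED))  # ['2% milk', 'milk', 'skim milk']
```

With a single 5% threshold the lower-level item Skim Milk is lost; lowering the level-2 threshold to 3% keeps it, which is the motivation for flexible, per-level support settings.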

Multi-level Association: Redundancy Filtering
Some rules may be redundant due to ancestor relationships between items.
Example:
- milk ⇒ wheat bread [support = 8%, confidence = 70%]
- 2% milk ⇒ wheat bread [support = 2%, confidence = 72%]
We say the first rule is an ancestor of the second rule.
A rule is redundant if its support is close to the "expected" value, based on the rule's ancestor.
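The "expected" support of a descendant rule can be estimated by scaling the ancestor rule's support by the share of the ancestor item that the descendant item accounts for. The sketch below assumes, hypothetically, that about one quarter of milk purchases are 2% milk (the slide does not state this share), and it uses a hypothetical absolute tolerance `tol` to decide what "close" means. The function name `is_redundant` is illustrative only.

```python
def expected_support(ancestor_rule_support, descendant_share):
    """Support the descendant rule would have if the specialized item behaved
    exactly like its ancestor: ancestor support scaled by the item's share."""
    return ancestor_rule_support * descendant_share

def is_redundant(rule_support, ancestor_rule_support, descendant_share, tol=0.005):
    """Flag a rule as redundant when its support is within tol of the value
    predicted from its ancestor rule (tol is a hypothetical choice)."""
    expected = expected_support(ancestor_rule_support, descendant_share)
    return abs(rule_support - expected) <= tol

# Slide values: milk => wheat bread [8%], 2% milk => wheat bread [2%].
# ASSUMPTION: 2% milk accounts for about 1/4 of milk purchases.
print(is_redundant(0.02, 0.08, descendant_share=0.25))  # True: 8% x 1/4 = 2%
# If 2% milk were half of all milk purchases, 4% would be expected instead.
print(is_redundant(0.02, 0.08, descendant_share=0.50))  # False
```

Under the one-quarter assumption the second rule carries no information beyond its ancestor, so a miner can drop it; under a different share the same 2% support would be surprising and worth keeping.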

Mining Multi-Dimensional Association
Single-dimensional rules:
  buys(X, "milk") ⇒ buys(X, "bread")
Multi-dimensional rules: ≥ 2 dimensions or predicates
- Inter-dimension association rules (no repeated predicates)
  age(X, "19-25") ∧ occupation(X, "student") ⇒ buys(X, "coke")
- Hybrid-dimension association rules (repeated predicates)
  age(X, "19-25") ∧ buys(X, "popcorn") ⇒ buys(X, "coke")
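One common way to mine multi-dimensional rules with ordinary itemset machinery is to turn each tuple into a set of (predicate, value) items; a repeated predicate such as buys() then simply contributes several items. The sketch below does this and evaluates the two example rules from the slide. The rows in `ROWS` are invented for illustration, and the helper names are hypothetical.

```python
# HYPOTHETICAL toy relation, invented for illustration only.
ROWS = [
    {"age": "19-25", "occupation": "student", "buys": {"coke", "popcorn"}},
    {"age": "19-25", "occupation": "student", "buys": {"coke"}},
    {"age": "26-35", "occupation": "engineer", "buys": {"coke"}},
    {"age": "19-25", "occupation": "clerk", "buys": {"popcorn"}},
]

def to_items(row):
    """Flatten one tuple into predicate items; buys() may repeat (hybrid rules)."""
    items = {("age", row["age"]), ("occupation", row["occupation"])}
    items |= {("buys", product) for product in row["buys"]}
    return items

def support_confidence(transactions, antecedent, consequent):
    """Support and confidence of antecedent => consequent over item sets."""
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / len(transactions)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence

transactions = [to_items(r) for r in ROWS]

# Inter-dimension: age(X,"19-25") ∧ occupation(X,"student") => buys(X,"coke")
inter = support_confidence(transactions,
                           {("age", "19-25"), ("occupation", "student")},
                           {("buys", "coke")})
# Hybrid-dimension: age(X,"19-25") ∧ buys(X,"popcorn") => buys(X,"coke")
hybrid = support_confidence(transactions,
                            {("age", "19-25"), ("buys", "popcorn")},
                            {("buys", "coke")})
print(inter)   # (0.5, 1.0)
print(hybrid)  # (0.25, 0.5)
```

Keying items by predicate name keeps age(X, "19-25") distinct from any other attribute that might share the same value string.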

Mining Quantitative Associations
Techniques can be categorized by how numerical attributes, such as age or salary, are treated:
1. Static discretization based on predefined concept hierarchies
2. Dynamic discretization based on data distribution (quantitative rules, e.g., Agrawal & Srikant@SIGMOD'96 and Lent, Swami and Widom@ICDE'97)
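The contrast between the two categories can be sketched as follows. Static discretization uses cut points fixed in advance (here a hypothetical age hierarchy); dynamic discretization derives cut points from the data. Equal-depth (equal-frequency) binning is used as one simple distribution-based scheme; it is a stand-in chosen for brevity, not the procedure of either cited paper. The ages and all function names are hypothetical.

```python
from bisect import bisect_right

# Static discretization: a HYPOTHETICAL predefined concept hierarchy for age.
AGE_CUTS = [20, 30, 40, 50]
AGE_LABELS = ["<20", "20-29", "30-39", "40-49", ">=50"]

def static_bin(age):
    """Map an age to its predefined interval; the cut points ignore the data."""
    return AGE_LABELS[bisect_right(AGE_CUTS, age)]

def equal_depth_cuts(values, k):
    """Dynamic discretization (one simple scheme): pick k-1 cut points from the
    sorted data so each bin receives roughly len(values)/k values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]

def dynamic_bin(value, cuts):
    """Index of the data-derived bin holding value (bin i is [cuts[i-1], cuts[i]))."""
    return bisect_right(cuts, value)

# HYPOTHETICAL ages, invented for illustration.
AGES = [18, 19, 21, 22, 25, 34, 35, 41, 52, 60]
cuts = equal_depth_cuts(AGES, 5)
print([static_bin(a) for a in AGES])
print(cuts)                                  # [21, 25, 35, 52]
print([dynamic_bin(a, cuts) for a in AGES])  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

The static bins hold 2, 3, 2, 1 and 2 ages regardless of how the data is spread, while the data-derived bins each hold exactly two, which is the sense in which dynamic discretization adapts to the distribution.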

Quantitative Association Rules
  age(X, "34-35") ∧ income(X, "30-50K") ⇒ buys(X, "high resolution TV")
Proposed by Lent, Swami and Widom ICDE
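A rule with two quantitative attributes in the antecedent, like the one above, can be found by binning both attributes into a 2-D grid and keeping the cells whose support and confidence meet the thresholds. The sketch below shows only that grid step; any merging of adjacent qualifying cells into larger ranges, as well as the customer tuples, the bin widths, and the function names, are hypothetical choices for illustration.

```python
from collections import defaultdict

# HYPOTHETICAL tuples: (age, income in $K, bought a high resolution TV?).
CUSTOMERS = [
    (34, 45, True), (35, 32, True), (34, 38, True), (35, 48, False),
    (22, 20, False), (51, 90, True), (28, 41, False), (34, 55, False),
]

def grid_cell(age, income_k):
    """Return (age_label, income_label) using 2-year age bins and $20K income
    bins anchored at 30K (both widths are hypothetical)."""
    age_lo = (age // 2) * 2
    inc_lo = 30 + ((income_k - 30) // 20) * 20
    return f"{age_lo}-{age_lo + 1}", f"{inc_lo}-{inc_lo + 20}K"

def quantitative_rules(rows, min_sup, min_conf):
    """Emit age ∧ income => buys(high resolution TV) for every grid cell whose
    support and confidence meet the thresholds."""
    totals = defaultdict(int)   # tuples falling in each cell
    buyers = defaultdict(int)   # tuples in each cell that bought the TV
    for age, income_k, bought in rows:
        key = grid_cell(age, income_k)
        totals[key] += 1
        if bought:
            buyers[key] += 1
    rules = []
    for key in sorted(totals):
        support = buyers[key] / len(rows)
        confidence = buyers[key] / totals[key]
        if support >= min_sup and confidence >= min_conf:
            rules.append((*key, support, confidence))
    return rules

print(quantitative_rules(CUSTOMERS, min_sup=0.25, min_conf=0.6))
# [('34-35', '30-50K', 0.375, 0.75)]
```

On this toy data only the (34-35, 30-50K) cell qualifies, which reproduces the shape of the slide's rule; the single buyer aged 51 reaches full confidence but falls short of the support threshold.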