Lecture06 - Data Mining: Principle and Algorithms -Chapter...

Data Mining: Principles and Algorithms
Chapter 3.6: Sequential Pattern Mining

Jianyong Wang
Department of Computer Science and Technology
Tsinghua University, Beijing, China
Email: jianyong@tsinghua.edu.cn
2009-11-11

Chapter 3: Mining Frequent Patterns, Association and Correlations

§ 3.1 Basic concepts and a road map
§ 3.2 Efficient and scalable frequent itemset mining methods
§ 3.3 Mining various kinds of association rules
§ 3.4 From association mining to correlation analysis
§ 3.5 Constraint-based association mining
§ 3.6 Sequential pattern mining
    - Frequent sequence mining
    - Closed sequence mining
    - Typical applications of frequent sequence mining
§ 3.7 Graph pattern mining
§ 3.8 Summary
Outline

- Problem statement and motivation
- An overview of the current solutions
    - GSP, SPADE, PrefixSpan, CloSpan
- BIDE: the state-of-the-art algorithm
    - Bi-Directional Extension closure checking scheme
    - BackScan search space pruning
    - ScanSkip optimization
- A typical application: exploit sequencing to accelerate hot XML query pattern mining
    - SOLARIA: a sequence-based hot XML query pattern mining algorithm
    - Unique sequence representation of a tree structure
    - Structure-preserving sequence mining

Problem Statement

- Frequent subsequence mining from a sequence DB
    - A sequence S is frequent if its support is no smaller than a minimum support threshold
- Closed subsequence mining
    - S is closed if none of its super-sequences has the same support as S

A sequence database:

    SID   Sequence
    10    < C A A B C >
    20    < A B C B >
    30    < C A B C >
    40    < A B B C A >

< C C > is a subsequence of both < C A A B C > and < C A B C >, so its support is 2. Given a support threshold min_sup = 2, < C C > is a frequent sequence, but it is not closed, because its super-sequence < C A B C > has the same support. < C A B C > itself is a frequent closed sequence.
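
The definitions of support and closedness can be checked mechanically on the example database. The Python sketch below is illustrative only and not part of the lecture: the helper names (is_subsequence, support, is_closed) are hypothetical, the assignment of SIDs to sequences is assumed from the slide layout, and the closure test is a naive brute-force check, not the BIDE closure checking scheme. It relies on the fact that if any super-sequence has the same support, then some super-sequence that is exactly one item longer does too.

```python
from typing import Dict, Tuple

Seq = Tuple[str, ...]

# Example database from the slide (SID -> sequence); the SID assignment is assumed.
db: Dict[int, Seq] = {
    10: tuple("CAABC"),
    20: tuple("ABCB"),
    30: tuple("CABC"),
    40: tuple("ABBCA"),
}


def is_subsequence(pattern: Seq, sequence: Seq) -> bool:
    """True if pattern's items appear in sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so matches must occur left to right.
    return all(item in it for item in pattern)


def support(pattern: Seq, database: Dict[int, Seq]) -> int:
    """Number of database sequences that contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in database.values())


def is_closed(pattern: Seq, database: Dict[int, Seq]) -> bool:
    """Naive check: no one-item extension of pattern keeps the same support."""
    base = support(pattern, database)
    items = {x for s in database.values() for x in s}
    for pos in range(len(pattern) + 1):
        for x in items:
            extended = pattern[:pos] + (x,) + pattern[pos:]
            if support(extended, database) == base:
                return False
    return True


min_sup = 2
cc, cabc = tuple("CC"), tuple("CABC")
print(support(cc, db) >= min_sup, is_closed(cc, db))      # True False
print(support(cabc, db) >= min_sup, is_closed(cabc, db))  # True True
```

The brute-force check rescans the whole database for every candidate extension, which is fine for a four-sequence example but is exactly the cost that dedicated closed-sequence miners are designed to avoid.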
Motivation: why mine frequent sequences?

- Different kinds of sequence databases
    - Customer shopping sequences; Web click-streams; DNA sequences; production and engineering processes; natural events such as storms and earthquakes; biological evolution; and so on
- Various applications
    - Association/causality analysis
    - Frequent-sequence based classification
    - Frequent-sequence based clustering
    - Sequence based hot XML query pattern mining
    - …

Motivation: why mine closed sequences?

- All the subsequences of a long frequent sequence must be frequent (the Apriori property)
- If (a_1, …, a_64) is frequent, it will generate 2^64 - 1 frequent subsequences: exponential growth!
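
To make the exponential growth concrete: assuming the 64 items are pairwise distinct (the slide does not say so explicitly), every non-empty subset of positions yields a different subsequence, so a sequence of n items has 2^n - 1 non-empty subsequences. The short Python sketch below, with a hypothetical helper name, confirms the formula by brute force for small n and then evaluates it for n = 64.

```python
from itertools import combinations


def count_nonempty_subsequences(seq):
    """Brute-force count of distinct non-empty subsequences of seq."""
    found = set()
    for k in range(1, len(seq) + 1):
        # combinations() preserves the original order, so each tuple is a subsequence.
        found.update(combinations(seq, k))
    return len(found)


# Verify the closed form 2^n - 1 on small sequences of distinct items.
for n in range(1, 11):
    items = [f"a{i}" for i in range(1, n + 1)]
    assert count_nonempty_subsequences(items) == 2**n - 1

# Enumeration is infeasible for n = 64, so use the closed form directly.
print(2**64 - 1)  # 18446744073709551615
```

By the Apriori property, all of these roughly 1.8 × 10^19 subsequences are frequent whenever the full sequence is, which is why reporting only closed sequences gives a far more compact result.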