# Lecture06 - Data Mining Principle and Algorithms-Chapter...


Data Mining: Principles and Algorithms, Chapter 3.6: Sequential Pattern Mining

Jianyong Wang, Department of Computer Science and Technology, Tsinghua University, Beijing, China. Email: [email protected]

2009-11-11

## Chapter 3: Mining Frequent Patterns, Association and Correlations

- § 3.1 Basic concepts and a road map
- § 3.2 Efficient and scalable frequent itemset mining methods
- § 3.3 Mining various kinds of association rules
- § 3.4 From association mining to correlation analysis
- § 3.5 Constraint-based association mining
- § 3.6 Sequential pattern mining
  - Frequent sequence mining
  - Closed sequence mining
  - Typical applications of frequent sequence mining
- § 3.7 Graph pattern mining
- § 3.8 Summary

## Outline

- Problem statement and motivation
- An overview of the current solutions
  - GSP, SPADE, PrefixSpan, CloSpan
- BIDE: the state-of-the-art algorithm
  - Bi-Directional Extension closure checking scheme
  - BackScan search space pruning
  - ScanSkip optimization
- A typical application: exploit sequencing to accelerate hot XML query pattern mining
  - SOLARIA: a sequence-based hot XML query pattern mining algorithm
  - Unique sequence representation of a tree structure
  - Structure-preserving sequence mining

## Problem Statement

- Frequent subsequence mining from a sequence DB
  - A sequence S is frequent if its support is no smaller than a minimum support threshold
- Closed subsequence mining
  - S is closed if none of its super-sequences has the same support as S

A sequence database:

| SID | Sequence |
|-----|----------|
| 10 | < C A A B C > |
| 20 | < A B C B > |
| 30 | < C A B C > |
| 40 | < A B B C A > |

< C C > is a subsequence of < C A B C > and < C A A B C >, so its support is 2. Given a support threshold min_sup = 2, < C C > is a frequent sequence, but it is not closed, because its super-sequence < C A B C > has the same support. < C A B C > is a frequent closed sequence.

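
The definitions of support, frequent, and closed can be checked directly on the four-sequence example database. The following is a minimal brute-force sketch, not one of the algorithms covered in the lecture (GSP, PrefixSpan, BIDE, and so on); the function names are hypothetical, each item is assumed to be a single character, and the SID-to-sequence assignment is an assumption read off the flattened table.

```python
from itertools import combinations

# Example database from the slide. The SID assignment is an assumption
# read off the flattened table; it does not affect any support count.
DB = {
    10: "CAABC",
    20: "ABCB",
    30: "CABC",
    40: "ABBCA",
}

def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def support(pattern, db):
    """Number of database sequences that contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in db.values())

def frequent_sequences(db, min_sup):
    """Brute force: test every distinct subsequence of every database sequence."""
    candidates = set()
    for s in db.values():
        for k in range(1, len(s) + 1):
            for idx in combinations(range(len(s)), k):
                candidates.add("".join(s[i] for i in idx))
    supports = {p: support(p, db) for p in candidates}
    return {p: sup for p, sup in supports.items() if sup >= min_sup}

def is_closed(pattern, frequent):
    """A frequent pattern is closed if no longer frequent pattern that
    contains it has the same support."""
    sup = frequent[pattern]
    return not any(
        len(q) > len(pattern) and s == sup and is_subsequence(pattern, q)
        for q, s in frequent.items()
    )

frequent = frequent_sequences(DB, min_sup=2)
print(support("CC", DB), is_closed("CC", frequent))      # 2 False
print(support("CABC", DB), is_closed("CABC", frequent))  # 2 True
```

Enumerating every subsequence is exponential in the sequence length, so this only works on toy inputs; it is meant to make the definitions concrete, not to be efficient.
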
## Motivation: why mine frequent sequences?

- Different kinds of sequence databases
  - Customer shopping sequences; Web click-streams; DNA sequences; production and engineering processes; natural phenomena such as storms and earthquakes; biological evolution; and so on
- Various applications
  - Association/causality analysis
  - Frequent-sequence based classification
  - Frequent-sequence based clustering
  - Sequence based hot XML query pattern mining
  - …

## Motivation: why mine closed sequences?

- All the subsequences of a long frequent sequence must be frequent (the Apriori property)
- If (a1, …, a64) is frequent, it will generate up to 2^64 - 1 frequent subsequences: exponential growth!
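
The size of this blow-up can be checked on small cases. A minimal sketch, assuming all items in the sequence are distinct, so that every non-empty choice of positions yields a different subsequence and a length-n sequence has 2^n - 1 of them; the helper name is hypothetical.

```python
from itertools import combinations

def count_distinct_subsequences(seq):
    """Count the distinct non-empty subsequences of seq by brute force."""
    found = set()
    for k in range(1, len(seq) + 1):
        for idx in combinations(range(len(seq)), k):
            found.add(tuple(seq[i] for i in idx))
    return len(found)

# With distinct items the count matches 2**n - 1.
for n in (2, 4, 8, 12):
    assert count_distinct_subsequences(list(range(n))) == 2**n - 1

# For n = 64 enumeration is hopeless, but the closed form is immediate.
print(2**64 - 1)  # 18446744073709551615
```

By the Apriori property, every one of these subsequences is frequent whenever the full 64-item sequence is, which is why reporting only closed sequences gives a far more compact result.
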

## This note was uploaded on 06/02/2010 for the course COMPUTER DM2009F taught by Professor Wangwei during the Fall '09 term at Tsinghua University.
