# Lecture04 - Data Mining Principles and Algorithms Jianyong...

Data Mining: Principles and Algorithms (October 22, 2009)

Jianyong Wang, Database Lab, Institute of Software, Department of Computer Science and Technology, Tsinghua University, [email protected]

## Chapter 3: Mining Frequent Patterns, Association and Correlations

- Basic concepts and a road map
- Efficient and scalable frequent itemset mining methods
- Mining various kinds of association rules
- From association mining to correlation analysis
- Constraint-based association mining
- Sequential pattern mining
- Graph pattern mining
- Summary
## What Is Frequent Pattern Analysis?

- Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
- First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of frequent itemsets and association rule mining
- Motivation: finding inherent regularities in data
  - What products were often purchased together? Beer and diapers?!
  - What are the subsequent purchases after buying a PC?
  - What kinds of DNA are sensitive to this new drug?
  - Can we automatically classify web documents?
- Applications
  - Basket data analysis, sale campaign analysis, Web log (click stream) analysis, DNA sequence analysis, recommender systems, associative classifiers, feature selection
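
To make "occurs frequently in a data set" concrete, here is a minimal brute-force sketch in Python. The transactions, the `min_count` threshold, and the `frequent_itemsets` helper are all hypothetical illustrations, not taken from the lecture, and the exhaustive enumeration is only workable on tiny inputs; it is not one of the scalable methods the chapter goes on to cover.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy basket data (an assumption for illustration only).
transactions = [
    {"beer", "diapers", "nuts"},
    {"beer", "diapers", "coffee"},
    {"beer", "eggs"},
    {"diapers", "milk"},
    {"beer", "diapers", "milk"},
]


def frequent_itemsets(transactions, min_count, max_size=2):
    """Count every itemset of up to max_size items and keep the frequent ones.

    An itemset is treated as frequent when it appears in at least
    min_count transactions (an assumed absolute threshold).
    """
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            # Sort so that each itemset has a single canonical tuple form.
            for itemset in combinations(sorted(t), k):
                counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= min_count}


result = frequent_itemsets(transactions, min_count=3)
for itemset, count in sorted(result.items()):
    print(itemset, count)
```

With this toy data, only `beer`, `diapers`, and the pair `(beer, diapers)` reach the threshold of three transactions.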

## Why Is Frequent Pattern Mining Important?

- Discloses an intrinsic and important property of data sets
- Forms the foundation for many essential data mining tasks
  - Association, correlation, and causality analysis
  - Sequential, structural (e.g., sub-graph) patterns
  - Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
  - Classification: associative classification, feature selection
  - Cluster analysis: frequent pattern-based clustering
  - Data warehousing: iceberg cube and cube-gradient
## Frequent Pattern Mining: an Important Topic

The top 5 most referenced DB conference publications (1994-2003), adapted from "Citation analysis of database publications", SIGMOD Record, 34(4), Dec. 2005:

| # | Title | Authors | Published in | #Cit. |
|---|-------|---------|--------------|-------|
| 1 | Fast Algorithms for Mining Association Rules | R. Agrawal, R. Srikant | VLDB '94 | 2261 |
| 2 | Querying Heterogeneous Information Sources Using Source Descriptions | A.Y. Levy, A. Rajaraman, J.J. Ordille | VLDB '96 | 692 |
| 3 | BIRCH: An Efficient Data Clustering Method for Very Large Databases | T. Zhang, R. Ramakrishnan, M. Livny | SIGMOD '96 | 617 |
| 4 | Mining Frequent Patterns without Candidate Generation | J. Han, J. Pei, Y. Yin | SIGMOD '00 | 573 |
| 5 | Implementing Data Cubes Efficiently | V. Harinarayan, A. Rajaraman, J.D. Ullman | SIGMOD '96 | 559 |

## Basic Concepts: Frequent Patterns and Association Rules

- Itemset $X = \{x_1, \ldots, x_k\}$
- Find all the rules $X \Rightarrow Y$ with minimum support and confidence
  - support, $s$: probability that a transaction contains $X \cup Y$, i.e., $s = P(X \cup Y)$
  - confidence, $c$: conditional probability that a transaction containing $X$ also contains $Y$
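
The support and confidence definitions above can be sketched in a few lines of Python. The transactions and the helper names (`count`, `support`, `confidence`) are hypothetical and chosen only for illustration. Confidence is computed from raw counts, assuming the usual reading of it as the fraction of transactions containing X that also contain Y.

```python
# Hypothetical toy basket data (an assumption for illustration only).
transactions = [
    {"beer", "diapers", "nuts"},
    {"beer", "diapers", "coffee"},
    {"beer", "eggs"},
    {"diapers", "milk"},
    {"beer", "diapers", "milk"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(x, y, transactions):
    """Support of the rule X => Y: fraction of transactions containing X and Y."""
    return count(set(x) | set(y), transactions) / len(transactions)


def confidence(x, y, transactions):
    """Confidence of X => Y: share of X-transactions that also contain Y."""
    x_count = count(x, transactions)
    if x_count == 0:
        raise ValueError("X never occurs, so confidence is undefined")
    return count(set(x) | set(y), transactions) / x_count


s = support({"beer"}, {"diapers"}, transactions)
c = confidence({"beer"}, {"diapers"}, transactions)
print(f"support={s:.2f} confidence={c:.2f}")
```

A rule would then be reported only if both values meet the user-chosen minimum support and minimum confidence thresholds.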

## This note was uploaded on 06/02/2010 for the course COMPUTER DM2009F taught by Professor Wangwei during the Fall '09 term at Tsinghua University.
