lec8 - CS 6093 Lecture 8 Spring 2011 Advanced Data Mining...

CS 6093 Lecture 8, Spring 2011
Advanced Data Mining
Cong Yu
03/28/2011

Announcements
Again, I will be out of town from April 1st to April 16th
  – Sporadic email access, please plan accordingly if you need to discuss your projects with me
  – I will be available for project discussion for the next three days (Tuesday through Thursday)
Fernando will cover advanced information retrieval topics for the next four lectures
Quizzes:
  – Last quiz is graded, please pick it up at the end of the class
Projects:
  – Seems everything is going well
Review of Last Lecture
Data mining = Discover hidden and useful knowledge from large amounts of data
  – Customers who buy Harry Potter books often buy Twilight books
  – Users in NYC tend to search for expensive restaurants on Valentine's Day
Classic Studies
  – Association Rule Mining
  – Data Cube Analysis

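Association Rule Mining Review
Association Rule
  – Rule: X → Y for a transaction database D
  – Support: fraction of transactions in D that contain both X and Y
  – Confidence: fraction of the transactions containing X that also contain Y
  – And more, such as Lift and Leverage
Problem: Frequent Itemset Mining
Problem: Maximally Frequent Itemset Mining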
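The rule measures named on this slide can be made concrete with a short sketch. It uses the standard textbook definitions of support, confidence, lift, and leverage; the function names and the toy database `D` are assumptions made up for illustration and do not come from the lecture.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x, y, transactions):
    """support(X u Y) / support(X). Assumes X occurs at least once."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """confidence(X -> Y) / support(Y). Values above 1 suggest positive association."""
    return confidence(x, y, transactions) / support(y, transactions)


def leverage(x, y, transactions):
    """support(X u Y) - support(X) * support(Y)."""
    joint = support(set(x) | set(y), transactions)
    return joint - support(x, transactions) * support(y, transactions)


# Hypothetical transaction database D (made-up data).
D = [
    frozenset({"harry_potter", "twilight"}),
    frozenset({"harry_potter", "twilight", "cookbook"}),
    frozenset({"harry_potter"}),
    frozenset({"cookbook"}),
]

X, Y = {"harry_potter"}, {"twilight"}
print(support(X | Y, D), confidence(X, Y, D), lift(X, Y, D), leverage(X, Y, D))
```

On this toy data the rule {harry_potter} → {twilight} holds in 2 of 4 transactions, and in 2 of the 3 transactions that contain harry_potter.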
Data Cube Analysis Review
[Figure: data cube with dimensions product (TV, DVD, PC), date (1Qtr, 2Qtr, 3Qtr, 4Qtr), and country (U.S.A, Canada, Mexico), plus ALL]
Group By product, date
Group By country, date
Group By country, product
Group By country
Group By date
Group By product
Group By none
Cube By product, date, country
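A Cube By over a list of dimensions amounts to running one Group By per subset of those dimensions. The sketch below illustrates that idea with a plain-Python aggregation; the rows, the sales measure, and the function name `cube` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("product", "date", "country")

# Hypothetical fact rows: (product, date, country, sales). Made-up numbers.
ROWS = [
    ("TV", "1Qtr", "U.S.A", 10),
    ("TV", "2Qtr", "Canada", 5),
    ("DVD", "1Qtr", "U.S.A", 7),
    ("PC", "1Qtr", "Mexico", 3),
]


def cube(rows, dimensions):
    """Return {group_by_dimensions: {cell: summed_sales}} for every subset of dimensions."""
    result = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(range(len(dimensions)), k):
            cells = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in dims)  # project the row onto the chosen dimensions
                cells[key] += row[-1]              # the last column is the measure
            result[tuple(dimensions[i] for i in dims)] = dict(cells)
    return result


full_cube = cube(ROWS, DIMENSIONS)
print(len(full_cube))                # number of group-bys
print(full_cube[()])                 # Group By none
print(full_cube[("product",)])       # Group By product
```

With three dimensions there are 2^3 = 8 subsets, so the cube contains eight group-bys, ranging from the empty grouping (Group By none) to the grouping on all three dimensions.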

Data Cube Analysis Review
Iceberg Cube: Cube By d1, d2, … Having count(*) > s
Measure
  – Algebraic
  – Monotonic
  – Sum, Average, Median, Count Unique, …
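To illustrate the Having count(*) > s condition, here is a deliberately naive sketch that counts every cell of every group-by and keeps only the cells above the threshold. It is not BUC and does no pruning; the rows and the name `iceberg_cube` are assumptions made up for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical rows over (product, date, country). Made-up data.
ROWS = [
    ("TV", "1Qtr", "U.S.A"),
    ("TV", "2Qtr", "Canada"),
    ("DVD", "1Qtr", "U.S.A"),
    ("PC", "1Qtr", "Mexico"),
]


def iceberg_cube(rows, n_dims, s):
    """Naive iceberg cube: keep only the cells whose count(*) is greater than s."""
    kept = {}
    for k in range(n_dims + 1):
        for dims in combinations(range(n_dims), k):
            counts = Counter(tuple(row[i] for i in dims) for row in rows)
            for cell, n in counts.items():
                if n > s:
                    kept[(dims, cell)] = n
    return kept


result = iceberg_cube(ROWS, n_dims=3, s=1)
for (dims, cell), n in sorted(result.items()):
    print(dims, cell, n)
```

Because count is monotonic, a cell that fails the threshold cannot have a more specific cell that passes it. A bottom-up algorithm can exploit this to skip work that the naive version above still performs.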
Algorithms
Association rule mining
  – Apriori
  – MaxMiner
Cube Analysis
  – BUC

Outline
Beyond item sets
  – Mining Sequential Patterns
  – Mining Graph Patterns
Quiz + Break
Clustering, if time permits
Alternative Data for Mining
So far, each transaction is viewed as a set
  – The order of the items is ignored
  – The relationship between the items is ignored
Sequential pattern mining
Graph pattern mining
[Figure labels: s1, s2, s3, i1, i2, i3]

Sequential Pattern Mining
Mining Sequential Patterns, Agrawal and Srikant, ICDE 1995
A General Sequence Definition
A sequence is an ordered list of itemsets (more than just an item)
    • X = <(3) (4 5) (8)>
Containment: s = <s_1 s_2 … s_n> is contained in t if there is a list of positions p_1 < p_2 < … < p_n in t such that s_1 is a subset of t_{p_1}, s_2 is a subset of t_{p_2}, …, s_n is a subset of t_{p_n}
    • Y = <(7) (3 8) (9) (4 5 6) (8)>
    • X is contained in Y
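The containment test can be sketched as a greedy left-to-right scan: each itemset of s is matched to the earliest not-yet-used itemset of t that is a superset of it. The representation (a list of Python sets per sequence) and the name `is_contained` are assumptions for illustration.

```python
def is_contained(s, t):
    """Return True if sequence s is contained in sequence t.

    Both sequences are lists of sets. Matching each itemset of s to the
    earliest possible position in t is enough to decide containment.
    """
    pos = 0
    for itemset in s:
        # Advance until an itemset of t covers the current itemset of s.
        while pos < len(t) and not itemset <= t[pos]:
            pos += 1
        if pos == len(t):
            return False
        pos += 1  # positions must be strictly increasing
    return True


X = [{3}, {4, 5}, {8}]
Y = [{7}, {3, 8}, {9}, {4, 5, 6}, {8}]

print(is_contained(X, Y))                # X matches positions 2, 4, 5 of Y
print(is_contained([{3}, {5}], [{3, 5}]))  # two purchases cannot map to one
```

The second call shows why order matters: items bought in two separate transactions are not contained in a single transaction that holds both items.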

Support and Maximality
View each customer’s purchase history as a sequence
  – Order the purchases (i.e., itemsets) according to time
Support of a sequence S:
  – Fraction of customers whose purchase history contains S
Maximal sequence M for a support threshold t:
  – There is no other sequence N such that N contains M and N has a support greater than t
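A small sketch of these two definitions follows. It computes the support of a few hand-picked candidate sequences and then keeps the ones not contained in any other frequent candidate. The customer histories, the candidate list, and the threshold are illustrative assumptions, and the sketch does not enumerate candidates the way a real mining algorithm would.

```python
def is_contained(s, t):
    """True if sequence s (list of sets) is contained in sequence t."""
    pos = 0
    for itemset in s:
        while pos < len(t) and not itemset <= t[pos]:
            pos += 1
        if pos == len(t):
            return False
        pos += 1
    return True


def sequence_support(s, customers):
    """Fraction of customers whose purchase history contains s."""
    return sum(is_contained(s, history) for history in customers) / len(customers)


def maximal_sequences(frequent):
    """Keep the frequent sequences that no other frequent sequence contains."""
    return [m for m in frequent
            if not any(n != m and is_contained(m, n) for n in frequent)]


# Hypothetical purchase histories, one time-ordered sequence per customer.
CUSTOMERS = [
    [{30}, {90}],
    [{10, 20}, {30}, {40, 60, 70}],
    [{30, 50, 70}],
    [{30}, {40, 70}, {90}],
    [{90}],
]

# Hand-picked candidates (an assumption; a miner would generate these).
CANDIDATES = [[{30}], [{40, 70}], [{30}, {90}], [{30}, {40, 70}], [{90}, {30}]]

t = 0.25
frequent = [c for c in CANDIDATES if sequence_support(c, CUSTOMERS) > t]
print(maximal_sequences(frequent))
```

Here <(30)> and <(40 70)> are frequent but not maximal, because each is contained in a longer frequent sequence.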
Examples

Preparation: Transformation
Three steps:
  – Keep only frequent items
  – Convert an itemset into a set of itemsets
      • (40 70) becomes {(40) (70) (40 70)}
  – Mapping to new item IDs
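The three steps can be sketched as a single pass over one customer sequence, assuming the frequent itemsets have already been found. The list `FREQUENT`, the ID assignment, and the choice to drop transactions that contain no frequent itemset are assumptions made for this illustration.

```python
# Hypothetical frequent itemsets, assumed to be already computed.
FREQUENT = [
    frozenset({30}),
    frozenset({40}),
    frozenset({70}),
    frozenset({40, 70}),
    frozenset({90}),
]

# Map each frequent itemset to a new integer ID (1-based, arbitrary order).
IDS = {itemset: i for i, itemset in enumerate(FREQUENT, start=1)}


def transform(sequence, frequent, ids):
    """Replace each transaction by the IDs of the frequent itemsets it contains."""
    transformed = []
    for transaction in sequence:
        contained = {ids[f] for f in frequent if f <= transaction}
        if contained:  # assumption: transactions with no frequent itemset are dropped
            transformed.append(contained)
    return transformed


history = [frozenset({10, 20}), frozenset({30}), frozenset({40, 60, 70})]
print(transform(history, FREQUENT, IDS))
```

After the transformation each transaction is a set of IDs, so checking whether a frequent itemset occurs in a transaction becomes a simple membership test on integers.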