02-assoc

02-assoc - CS246 Mining Massive Datasets Jure Leskovec...

CS246: Mining Massive Datasets
Jure Leskovec, Stanford University
http://cs246.stanford.edu

Gradiance:
- Create an account at http://www.gradiance.com/services
- The class token is: D8C4D42B (enter the token at the bottom of the homepage)
- The first quiz (4 questions) on MapReduce is up; it is due in a week (January 11)
HW1 will be out soon; check the website.
HW Q&A website: http://www.piazzza.com/stanford/cs246
1/5/2011 Jure Leskovec, Stanford C246: Mining Massive Datasets
- Friday 1/7, 5-7pm: review of basic concepts of linear algebra, probability and statistics
- Tuesday 1/11, 5-7pm: Hadoop Q&A session
We will post the location on the website and the mailing list soon.

Supermarket shelf management – Market-basket model:
- Goal: to identify items that are bought together by sufficiently many customers
- Approach: process the sales data collected with barcode scanners to find dependencies among items
- A classic rule: if one buys diaper and milk, then he is likely to buy beer
- Don’t be surprised if you find six-packs next to diapers!

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
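One simple way to start looking for dependencies among items is to count how often each pair of items lands in the same basket. The Python sketch below is an illustration only, not the course's code: it assumes the five transactions from the table are stored as sets, and the names `transactions`, `count_pairs`, and `pair_counts` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# The five example transactions from the table (TID -> set of items).
transactions = {
    1: {"Bread", "Coke", "Milk"},
    2: {"Beer", "Bread"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Diaper", "Milk"},
}

def count_pairs(baskets):
    """For every unordered pair of items, count the baskets that contain both."""
    counts = Counter()
    for items in baskets:
        # sorted() gives each pair one canonical order, so (a, b) and (b, a)
        # are counted under the same key.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts

pair_counts = count_pairs(transactions.values())

# Show the most frequently co-occurring pairs.
for pair, n in pair_counts.most_common(3):
    print(pair, n)
```

Here ('Coke', 'Milk') and ('Diaper', 'Milk') each appear in 3 of the 5 baskets, which is consistent with the rule {Milk} --> {Coke}; raw pair counts alone do not say which direction a rule should point.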
- A large set of items, e.g., things sold in a supermarket
- A large set of baskets, each of which is a small subset of items, e.g., the things one customer buys on one day
- Can be used to model any many-to-many relationship, not just in the retail setting
- Find “interesting” connections between items
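Because baskets and items form a many-to-many relationship, the same data can be read in either direction: basket -> items, or item -> baskets. As a hedged sketch (Python, with hypothetical names `baskets`, `invert`, and `item_to_baskets`), inverting the five example transactions gives each item's set of basket IDs, and intersecting those sets finds the baskets that contain several items at once.

```python
from collections import defaultdict

# Basket view of the relation: basket ID -> set of items (the five example transactions).
baskets = {
    1: {"Bread", "Coke", "Milk"},
    2: {"Beer", "Bread"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Diaper", "Milk"},
}

def invert(baskets):
    """Build the item view of the same many-to-many relation: item -> set of basket IDs."""
    index = defaultdict(set)
    for basket_id, items in baskets.items():
        for item in items:
            index[item].add(basket_id)
    return dict(index)

item_to_baskets = invert(baskets)

# Baskets containing both Diaper and Milk: intersect their basket-ID sets.
print(sorted(item_to_baskets["Diaper"] & item_to_baskets["Milk"]))  # → [3, 4, 5]
```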

- Given a set of baskets, we want to discover association rules:
  - People who bought {x,y,z} tend to buy {v,w} (Amazon!)
- 2-step approach:
  1) Find frequent itemsets
  2) Generate the association rules
Input: the baskets (e.g., the five transactions in the table above)
Output: rules such as {Milk} --> {Coke} and {Diaper, Milk} --> {Beer}
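Step 2 needs a way to score candidate rules. This excerpt does not define one, so the sketch below assumes the standard confidence measure, conf(I --> J) = support(I ∪ J) / support(I), where support is the number of baskets containing the itemset. It is an illustrative Python sketch with hypothetical names (`support`, `confidence`, `rules_from_itemset`), run on the five example transactions; it uses {Beer, Diaper, Milk} only as an example itemset and makes no claim that it passes a particular support threshold.

```python
from fractions import Fraction
from itertools import combinations

# The five example transactions.
baskets = [
    frozenset({"Bread", "Coke", "Milk"}),
    frozenset({"Beer", "Bread"}),
    frozenset({"Beer", "Coke", "Diaper", "Milk"}),
    frozenset({"Beer", "Bread", "Diaper", "Milk"}),
    frozenset({"Coke", "Diaper", "Milk"}),
]

def support(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for b in baskets if itemset <= b)

def confidence(lhs, rhs, baskets):
    """Assumed standard measure: support(lhs | rhs) / support(lhs).
    Assumes lhs occurs in at least one basket (otherwise ZeroDivisionError)."""
    return Fraction(support(lhs | rhs, baskets), support(lhs, baskets))

def rules_from_itemset(itemset, baskets, min_conf):
    """Split one itemset into every nonempty lhs/rhs pair and
    yield the rules whose confidence is at least min_conf."""
    items = sorted(itemset)
    for k in range(1, len(items)):
        for combo in combinations(items, k):
            lhs = frozenset(combo)
            rhs = frozenset(itemset) - lhs
            conf = confidence(lhs, rhs, baskets)
            if conf >= min_conf:
                yield lhs, rhs, conf

print(confidence(frozenset({"Milk"}), frozenset({"Coke"}), baskets))            # → 3/4
print(confidence(frozenset({"Diaper", "Milk"}), frozenset({"Beer"}), baskets))  # → 2/3

rules = list(rules_from_itemset({"Beer", "Diaper", "Milk"}, baskets, Fraction(2, 3)))
for lhs, rhs, conf in rules:
    print(sorted(lhs), "-->", sorted(rhs), conf)
```

With min_conf = 2/3, five of the six possible splits of {Beer, Diaper, Milk} survive; only {Milk} --> {Beer, Diaper} (confidence 2/4) is dropped.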
Simplest question: find sets of items that appear together “frequently” in baskets.
- Support for itemset I: the number of baskets containing all items in I
  - Often expressed as a fraction of the total number of baskets
- Given a support threshold s, sets of items that appear in at least s baskets are called frequent itemsets
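These definitions translate directly into code. A minimal Python sketch, assuming baskets are stored as sets of item names (the five example transactions); the helper names `support_count`, `support_fraction`, and `is_frequent` are hypothetical:

```python
from fractions import Fraction

# The five example transactions, one set of items per basket.
baskets = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def support_count(itemset, baskets):
    """Support of I: the number of baskets containing all items in I."""
    return sum(1 for basket in baskets if set(itemset) <= basket)

def support_fraction(itemset, baskets):
    """Support expressed as a fraction of the total number of baskets."""
    return Fraction(support_count(itemset, baskets), len(baskets))

def is_frequent(itemset, baskets, s):
    """An itemset is frequent if it appears in at least s baskets."""
    return support_count(itemset, baskets) >= s

print(support_count({"Diaper", "Milk"}, baskets))     # → 3
print(support_fraction({"Diaper", "Milk"}, baskets))  # → 3/5
print(is_frequent({"Beer", "Coke"}, baskets, s=3))    # → False
```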

Items = {milk, coke, pepsi, beer, juice}
Support threshold = 3 baskets
B1 = {m, c, b}     B2 = {m, p, j}
B3 = {m, b}        B4 = {c, j}
B5 = {m, p, b}     B6 = {m, c, b, j}
B7 = {c, b, j}     B8 = {b, c}
Frequent itemsets: {m}, {c}, {b}, {j}, {m,b}, {b,c}, {c,j}.
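The frequent itemsets in this example can be checked by brute force: count every subset of the 5 items and keep those that appear in at least 3 baskets. This Python sketch (hypothetical names, single-letter item codes as in the example) is meant only for verification; it enumerates all 2^5 - 1 = 31 candidate itemsets, an approach that stops being practical as the number of items grows.

```python
from itertools import combinations

# Example baskets; m=milk, c=coke, p=pepsi, b=beer, j=juice.
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]
SUPPORT_THRESHOLD = 3

def frequent_itemsets(baskets, s):
    """Brute force: count every candidate itemset, keep those in >= s baskets."""
    items = sorted(set().union(*baskets))
    result = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            cand = frozenset(candidate)
            count = sum(1 for b in baskets if cand <= b)
            if count >= s:
                result[cand] = count
    return result

frequent = frequent_itemsets(baskets, SUPPORT_THRESHOLD)

# Print singletons first, then pairs, each group in alphabetical order.
for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
# Prints:
# ['b'] 6
# ['c'] 5
# ['j'] 4
# ['m'] 5
# ['b', 'c'] 4
# ['b', 'm'] 4
# ['c', 'j'] 3
```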