1 Association Rules
  Market Baskets
  Frequent Itemsets
  A-Priori Algorithm

2 The Market-Basket Model
- A large set of items, e.g., things sold in a supermarket.
- A large set of baskets, each of which is a small set of the items, e.g., the things one customer buys on one day.

3 Market-Baskets – (2)
- Really a general many-many mapping (association) between two kinds of things.
  - But we ask about connections among “items,” not “baskets.”
- The technology focuses on common events, not rare events (“long tail”).

4 Support
- Simplest question: find sets of items that appear “frequently” in the baskets.
- Support for itemset I = the number of baskets containing all items in I.
  - Sometimes given as a percentage.
- Given a support threshold s, sets of items that appear in at least s baskets are called frequent itemsets.
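The definition of support can be sketched in a few lines of Python. This is only an illustration: the function names `support` and `is_frequent` and the toy baskets are assumptions made for this example, not part of the slides.

```python
from typing import Iterable, List, Set


def support(itemset: Iterable[str], baskets: List[Set[str]]) -> int:
    """Count the baskets that contain every item in `itemset`."""
    needed = set(itemset)
    # `needed <= basket` is Python's subset test on sets.
    return sum(1 for basket in baskets if needed <= basket)


def is_frequent(itemset: Iterable[str], baskets: List[Set[str]], s: int) -> bool:
    """An itemset is frequent if it appears in at least `s` baskets."""
    return support(itemset, baskets) >= s


# Hypothetical toy data, invented for this sketch.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter"},
    {"bread", "milk", "eggs"},
]

count = support({"bread", "milk"}, baskets)   # absolute support
fraction = count / len(baskets)               # support as a fraction of all baskets
print(count, fraction)
```

Here {bread, milk} is contained in three of the four baskets, so its support is 3 (or 75% when given as a percentage), and it is frequent for any threshold s up to 3.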
5 Example: Frequent Itemsets
- Items = {milk, coke, pepsi, beer, juice}.
- Support threshold = 3 baskets.
    B1 = {m, c, b}    B2 = {m, p, j}
    B3 = {m, b}       B4 = {c, j}
    B5 = {m, p, b}    B6 = {m, c, b, j}
    B7 = {c, b, j}    B8 = {b, c}
- Frequent itemsets: {m}, {c}, {b}, {j}, {m,b}, {b,c}, {c,j}.
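The example can be checked by brute force. The sketch below enumerates every subset of every basket and keeps those that reach the threshold. It is a naive enumeration chosen for clarity on eight tiny baskets, not the A-Priori algorithm, and the function name `frequent_itemsets` is an assumption made for this illustration.

```python
from collections import Counter
from itertools import combinations

# The eight baskets from the example (m=milk, c=coke, p=pepsi, b=beer, j=juice).
baskets = {
    "B1": {"m", "c", "b"},
    "B2": {"m", "p", "j"},
    "B3": {"m", "b"},
    "B4": {"c", "j"},
    "B5": {"m", "p", "b"},
    "B6": {"m", "c", "b", "j"},
    "B7": {"c", "b", "j"},
    "B8": {"b", "c"},
}
s = 3  # support threshold


def frequent_itemsets(baskets, s):
    """Return {itemset (sorted tuple): support} for itemsets in at least s baskets."""
    counts = Counter()
    for basket in baskets:
        items = sorted(basket)
        # Count every non-empty subset of the basket once.
        for k in range(1, len(items) + 1):
            for subset in combinations(items, k):
                counts[subset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= s}


result = frequent_itemsets(baskets.values(), s)
# List singletons first, then pairs, each group in alphabetical order.
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, n)
```

Running it yields exactly seven frequent itemsets: the four singletons {m}, {c}, {b}, {j} and the three pairs {m,b}, {b,c}, {c,j}. Pepsi appears in only two baskets, and no triple reaches support 3. Enumerating all subsets is exponential in basket size, so this approach is only workable for toy data.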

6 Applications – (1)
- Items = products; baskets = sets of products someone bought in one trip to the store.
- Example application: given that many people buy beer and diapers together:
  - Run a sale on diapers; raise price of beer.
- Only useful if many buy diapers & beer.
7 Applications – (2)
- Baskets = sentences; items = documents containing those sentences.
- Items that appear together too often could represent plagiarism.
- Notice items do not have to be “in” baskets.
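The inversion in this application (a sentence is the basket, and the documents containing it are the items) is easy to get backwards, so a small sketch may help. The document names, sentences, and threshold below are hypothetical, and exact string matching of sentences is a simplifying assumption.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical documents, each given as a list of sentences.
documents = {
    "doc_a": ["the cat sat", "on the mat", "it was warm"],
    "doc_b": ["the cat sat", "on the mat", "it was cold"],
    "doc_c": ["dogs bark", "it was warm"],
}

# One basket per distinct sentence; its items are the documents containing it.
baskets = defaultdict(set)
for doc_id, sentences in documents.items():
    for sentence in sentences:
        baskets[sentence].add(doc_id)

# Support of a document pair = number of sentences (baskets) the two share.
pair_support = defaultdict(int)
for docs in baskets.values():
    for pair in combinations(sorted(docs), 2):
        pair_support[pair] += 1

s = 2  # assumed threshold: flag pairs sharing at least 2 sentences
suspicious = {pair: n for pair, n in pair_support.items() if n >= s}
print(suspicious)
```

In this toy data, doc_a and doc_b share two sentences and are flagged, while doc_a and doc_c share only one and are not.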

8 Applications – (3)
- Baskets = Web pages; items = words.
- Unusual words appearing together in a large number of documents, e.g., “Brad” and “Angelina,” may indicate an interesting relationship.
9 Aside: Words on the Web
- Many Web-mining applications involve words.
  1. Cluster pages by their topic, e.g., sports.
  2. Find useful blogs, versus nonsense.
  3. Determine the sentiment (positive or negative) of comments.

