# Between its confidence and the fraction of baskets

...hose with high positive or negative interest values

1/5/2011, Jure Leskovec, Stanford CS246: Mining Massive Datasets

Example baskets:

- B1 = {m, c, b}
- B2 = {m, p, j}
- B3 = {m, b}
- B4 = {c, j}
- B5 = {m, p, b}
- B6 = {m, c, b, j}
- B7 = {c, b, j}
- B8 = {b, c}

Association rule: {m, b} → c

- Confidence = 2/4 = 0.5
- Item c appears in 5/8 of the baskets
- Interest = |0.5 − 5/8| = 1/8
- The rule is not very interesting!

Problem: find all association rules with support ≥ s and confidence ≥ c.

- Note: the support of an association rule is the support of the set of items on the left side.
- Hard part: finding the frequent itemsets!
  - If {i1, i2, …, ik} → j has high support and confidence, then both {i1, i2, …, ik} and {i1, i2, …, ik, j} will be "frequent".
  - Checking the confidence of association rules involving those sets is relatively easy.

Step 1: Find all frequent itemsets

- We explain this next.
- Let itemset I = {i1, i2, …, ik} be frequent.

Step 2: Rule generation

- For every subset A of I, generate a rule A → I \ A.
  - Since I is frequent, A is also frequent.
- Variant 1: Single pass to compute the rule confidence.
- Variant 2: conf(AB → CD) = supp(ABCD) / supp(AB)
  - Observation: if ABC → D is below confidence, so is AB → CD.
  - Can generate "bigger" rules from smaller ones!
- Output the rules above the confidence threshold.

...
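
The confidence and interest figures for the rule {m, b} → c can be checked with a few lines of Python. This is a minimal sketch, not code from the lecture: the names `BASKETS`, `support_count`, `confidence`, and `interest` are hypothetical, and `Fraction` is used only so the results stay exact. It assumes the left side of the rule occurs in at least one basket.

```python
from fractions import Fraction

# The eight example baskets from the text.
BASKETS = [
    {"m", "c", "b"},       # B1
    {"m", "p", "j"},       # B2
    {"m", "b"},            # B3
    {"c", "j"},            # B4
    {"m", "p", "b"},       # B5
    {"m", "c", "b", "j"},  # B6
    {"c", "b", "j"},       # B7
    {"b", "c"},            # B8
]


def support_count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)


def confidence(lhs, item, baskets):
    """Fraction of the baskets containing `lhs` that also contain `item`.

    Assumes `lhs` appears in at least one basket (otherwise division by zero).
    """
    return Fraction(support_count(lhs | {item}, baskets),
                    support_count(lhs, baskets))


def interest(lhs, item, baskets):
    """Absolute difference between the rule's confidence and the
    fraction of all baskets that contain `item`."""
    fraction_with_item = Fraction(support_count({item}, baskets), len(baskets))
    return abs(confidence(lhs, item, baskets) - fraction_with_item)


print(confidence({"m", "b"}, "c", BASKETS))  # 1/2
print(interest({"m", "b"}, "c", BASKETS))    # 1/8
```

Four baskets contain both m and b (B1, B3, B5, B6), and two of those also contain c, which gives the confidence of 2/4; c itself appears in five of the eight baskets.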

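The rule-generation step (for every subset A of a frequent itemset I, form A → I \ A and keep the rules above the confidence threshold) might be sketched as follows. This is an illustrative brute-force version under stated assumptions: it enumerates every non-empty proper subset and does not use the pruning observation about ABC → D and AB → CD. The function name `generate_rules`, the threshold of 1/2, and the choice of I = {m, c, b} are hypothetical, and the itemset is assumed to appear in at least one basket.

```python
from fractions import Fraction
from itertools import combinations

# The eight example baskets from the text (B1 through B8).
BASKETS = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]


def support_count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)


def generate_rules(frequent_itemset, baskets, min_conf):
    """Return (A, I \\ A, confidence) for every non-empty proper subset A of I
    whose rule A -> I \\ A has confidence >= min_conf.

    Assumes the itemset occurs in at least one basket, so every subset
    has non-zero support.
    """
    full = frozenset(frequent_itemset)
    supp_full = support_count(full, baskets)
    rules = []
    for size in range(1, len(full)):
        for combo in combinations(sorted(full), size):
            lhs = frozenset(combo)
            # conf(A -> I \ A) = supp(I) / supp(A)
            conf = Fraction(supp_full, support_count(lhs, baskets))
            if conf >= min_conf:
                rules.append((lhs, full - lhs, conf))
    return rules


rules = generate_rules({"m", "c", "b"}, BASKETS, Fraction(1, 2))
for lhs, rhs, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), "confidence", conf)
```

With these baskets, only the three rules with a two-item left side reach the threshold; the single-item left sides ({m}, {c}, {b}) each appear in five or six baskets, so their confidence falls below 1/2.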