02-assoc

# Between its confidence and the fraction of baskets


…those with high positive or negative interest values

1/5/2011 Jure Leskovec, Stanford C246: Mining Massive Datasets

- B1 = {m, c, b}
- B2 = {m, p, j}
- B3 = {m, b}
- B4 = {c, j}
- B5 = {m, p, b}
- B6 = {m, c, b, j}
- B7 = {c, b, j}
- B8 = {b, c}

Association rule: {m, b} → c

- Confidence = 2/4 = 0.5
- Item c appears in 5/8 of the baskets
- Interest = |0.5 – 5/8| = 1/8
- Rule is not very interesting!

Problem: find all association rules with support ≥ s and confidence ≥ c

- Note: support of an association rule is the support of the set of items on the left side
- Hard part: finding the frequent itemsets!
- If {i1, i2, …, ik} → j has high support and confidence, then both {i1, i2, …, ik} and {i1, i2, …, ik, j} will be "frequent"
- Checking the confidence of association rules involving those sets is relatively easy

Step 1: Find all frequent itemsets

- We explain this next
- Let itemset I = {i1, i2, …, ik} be frequent

Step 2: Rule generation

- For every subset A of I, generate a rule A → I\A
- Since I is frequent, A is also frequent
- Variant 1: Single pass to compute the rule confidence
- Variant 2: conf(AB→CD) = supp(ABCD)/supp(AB)
- Observation: If ABC→D is below confidence, so is AB→CD
- Can generate "bigger" rules from smaller ones!

Output the rules above the confidence threshold
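The confidence, interest, and rule-generation steps can be sketched in a few lines of Python. This is a minimal illustration using the eight baskets B1 to B8 from the example, not the course's implementation: the function names (`support`, `confidence`, `interest`, `rules_from_itemset`) and the 0.75 confidence threshold are assumptions made here, and support is computed by brute-force scanning of every basket rather than by a frequent-itemset algorithm.

```python
from itertools import combinations

# Baskets B1..B8 from the example.
BASKETS = [
    {"m", "c", "b"},       # B1
    {"m", "p", "j"},       # B2
    {"m", "b"},            # B3
    {"c", "j"},            # B4
    {"m", "p", "b"},       # B5
    {"m", "c", "b", "j"},  # B6
    {"c", "b", "j"},       # B7
    {"b", "c"},            # B8
]


def support(itemset, baskets=BASKETS):
    """Number of baskets that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for basket in baskets if items <= basket)


def confidence(lhs, rhs, baskets=BASKETS):
    """conf(lhs -> rhs) = supp(lhs ∪ rhs) / supp(lhs).

    Assumes lhs occurs in at least one basket (otherwise this divides by zero).
    """
    return support(set(lhs) | set(rhs), baskets) / support(lhs, baskets)


def interest(lhs, item, baskets=BASKETS):
    """|conf(lhs -> item) - fraction of baskets containing item|."""
    fraction = support({item}, baskets) / len(baskets)
    return abs(confidence(lhs, {item}, baskets) - fraction)


def rules_from_itemset(itemset, min_conf, baskets=BASKETS):
    """For every non-empty proper subset A of I, emit A -> I\\A if confident enough.

    Assumes `itemset` occurs in at least one basket.
    """
    items = sorted(itemset)
    supp_all = support(items, baskets)
    rules = []
    for size in range(1, len(items)):
        for lhs in combinations(items, size):
            conf = supp_all / support(lhs, baskets)
            if conf >= min_conf:
                rhs = tuple(i for i in items if i not in lhs)
                rules.append((lhs, rhs, conf))
    return rules


print(confidence({"m", "b"}, {"c"}))  # 0.5
print(interest({"m", "b"}, "c"))      # 0.125
print(rules_from_itemset({"m", "c", "b"}, min_conf=0.75))
```

With the (assumed) threshold of 0.75, the only rule generated from {m, c, b} is {c, m} → b, since every basket containing both c and m also contains b.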

## This document was uploaded on 02/26/2014 for the course CS 246 at Stanford.
