# BBIT 207: Data Mining and Management: Mining data using association rule


BBIT 207: DATA MINING AND MANAGEMENT

## 11.6. Mining data using association rule

Consider the example below:

| TID | ITEMS |
|-----|-------|
| T1 | {Bread, Milk} |
| T2 | {Bread, Diapers, Beer, Eggs} |
| T3 | {Milk, Diapers, Beer, Cola} |
| T4 | {Bread, Milk, Diapers, Beer} |
| T5 | {Bread, Milk, Diapers, Cola} |

This market basket data can be represented in binary format, as shown in the table below, where each row corresponds to a transaction and each column corresponds to an item. An item can be treated as a binary variable whose value is one (1) if the item is present in a transaction and zero (0) otherwise. Because the presence of an item in a transaction is often considered more important than its absence, an item is an asymmetric binary variable.

| TID | Bread | Milk | Diapers | Beer | Eggs | Cola |
|-----|-------|------|---------|------|------|------|
| T1 | 1 | 1 | 0 | 0 | 0 | 0 |
| T2 | 1 | 0 | 1 | 1 | 1 | 0 |
| T3 | 0 | 1 | 1 | 1 | 0 | 1 |
| T4 | 1 | 1 | 1 | 1 | 0 | 0 |
| T5 | 1 | 1 | 1 | 0 | 0 | 1 |

This representation is a simplistic view of real market basket data because it ignores certain important aspects of the data, such as the quantity of items sold or the price paid to purchase them.

Example: Using the above table, let us consider the rule {Milk, Diapers} → {Beer}. Since the support count for {Milk, Diapers, Beer} is 2 and the total number of transactions is 5, the rule's support is 2/5 = 0.4. The rule's confidence is obtained by dividing the support count for {Milk, Diapers, Beer} by the support count for {Milk, Diapers}. Since there are three transactions that contain both Milk and Diapers, the confidence for this rule is 2/3 ≈ 0.67.

### Support and confidence

Support is an important measure because a rule that has very low support may occur simply by chance. A low-support rule is also likely to be uninteresting from a business perspective because it may not be profitable to promote items that customers seldom buy together. For this reason, support is used to remove uninteresting rules.

Confidence, on the other hand, measures the reliability of the inference made by the rule. Given the rule A → B, the higher the confidence, the more likely it is for B to be present in transactions that contain A. Confidence also provides an estimate of the conditional probability of B given A.
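The worked example above can be checked with a few lines of Python. This is a minimal sketch and not part of the original notes: the names `transactions`, `support_count`, `support`, and `confidence` are illustrative assumptions, and the data is the five-transaction market basket table from this section.

```python
# Transactions from the market basket example (TID -> set of items).
transactions = {
    "T1": {"Bread", "Milk"},
    "T2": {"Bread", "Diapers", "Beer", "Eggs"},
    "T3": {"Milk", "Diapers", "Beer", "Cola"},
    "T4": {"Bread", "Milk", "Diapers", "Beer"},
    "T5": {"Bread", "Milk", "Diapers", "Cola"},
}


def support_count(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    return sum(1 for items in transactions.values() if itemset <= items)


def support(antecedent, consequent, transactions):
    """Fraction of all transactions containing antecedent and consequent."""
    both = antecedent | consequent
    return support_count(both, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support count of the full itemset divided by that of the antecedent.

    Assumes the antecedent occurs in at least one transaction;
    otherwise the division below raises ZeroDivisionError.
    """
    both = antecedent | consequent
    return support_count(both, transactions) / support_count(antecedent, transactions)


# Rule {Milk, Diapers} -> {Beer}
rule_support = support({"Milk", "Diapers"}, {"Beer"}, transactions)
rule_confidence = confidence({"Milk", "Diapers"}, {"Beer"}, transactions)

print(f"support    = {rule_support:.2f}")     # support    = 0.40
print(f"confidence = {rule_confidence:.2f}")  # confidence = 0.67
```

Representing each transaction as a set keeps the "contains every item" test to a single subset comparison (`itemset <= items`), which mirrors how the support count is defined in the text.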
## 11.7. Revision questions

[a] Using two items, X and Y, define the following terms:

   i. Support

   ii. Confidence

[b] Consider the transaction data below and answer the questions that follow:

| TID | ITEMS |
|-----|-------|
| 1 | {Sugar, Bread, Milk} |
| 2 | {Sugar, Coffee, Bread} |
| 3 | {Bread, Milk, Egg, Fruit} |
| 4 | {Coffee, Milk} |
| 5 | {Milk, Bread, Fruit} |
| 6 | {Bread, Coffee} |
| 7 | {Coffee, Egg, Milk, Fruit} |

   i. Calculate the support for the rule {Bread, Milk} → {Fruit}.

   ii. Calculate the confidence for the rule in (i).