Lecture 19

Data Mining CS57300 Purdue University November 18, 2010

Pattern mining: representation & learning
Data mining components
• Task specification: Pattern discovery
• Data representation: Homogeneous IID data
• Knowledge representation
• Learning technique

Pattern discovery
• Models describe the entire dataset (or a large part of it)
• Patterns characterize local aspects of the data
• Pattern: a predicate that returns “true” for the instances in the data where the pattern occurs and “false” otherwise
• Task: Find descriptive associations between variables

Examples
• Supermarket transaction database
  • 10% of the customers buy wine and cheese
• Telecommunications alarms database
  • If alarms A and B occur within 30 seconds of each other, then alarm C occurs within 60 seconds with p=0.5
• Web log dataset
  • If a person visits the CNN website, there is a 60% chance the person will visit the ABC News website in the same month
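Statements like "10% of the customers buy wine and cheese" or "a 60% chance of visiting ABC News given a CNN visit" are relative frequencies counted over the data. The sketch below shows how such numbers could be computed on a toy transaction list. The data and the function names `support` and `confidence` are illustrative assumptions (the names follow common association-rule terminology), not something taken from the slides.

```python
# Toy transactions (hypothetical data, not from the lecture).
transactions = [
    {"wine", "cheese", "bread"},
    {"wine", "cheese"},
    {"bread", "milk"},
    {"wine", "milk"},
    {"cheese", "bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Relative-frequency estimate of P(consequent | antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        raise ValueError("antecedent never occurs in the data")
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / base


# "x% of the customers buy wine and cheese"
wine_and_cheese = support({"wine", "cheese"}, transactions)

# "if a customer buys wine, there is a y% chance they also buy cheese"
cheese_given_wine = confidence({"wine"}, {"cheese"}, transactions)
```

On this toy data, two of the five transactions contain both items, and two of the three wine transactions also contain cheese.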

Patterns in tabular data
• Primitive pattern: subset of all possible observations over variables X_1, ..., X_p
  • If X_k is categorical then X_k = c is a primitive pattern
  • If X_k is ordinal then X_k ≤ c is a primitive pattern
• Start from primitive patterns and combine using logical connectives such as AND and OR
  • age < 40 AND income < 100,000
  • chips = 1 AND (beer = 1 OR soda = 1)
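Since a pattern is a predicate over a row, primitive patterns and the connectives AND and OR can be sketched as small functions that return other functions. The helper names (`eq`, `lt`, `all_of`, `any_of`) and the sample rows are assumptions made for illustration only.

```python
from typing import Any, Callable, Dict

Row = Dict[str, Any]
Pattern = Callable[[Row], bool]


def eq(var: str, c: Any) -> Pattern:
    """Primitive pattern for a categorical variable: var = c."""
    return lambda row: row[var] == c


def lt(var: str, c: Any) -> Pattern:
    """Primitive pattern for an ordered variable: var < c."""
    return lambda row: row[var] < c


def all_of(*patterns: Pattern) -> Pattern:
    """Logical AND of the given patterns."""
    return lambda row: all(p(row) for p in patterns)


def any_of(*patterns: Pattern) -> Pattern:
    """Logical OR of the given patterns."""
    return lambda row: any(p(row) for p in patterns)


# age < 40 AND income < 100,000
young_modest = all_of(lt("age", 40), lt("income", 100_000))

# chips = 1 AND (beer = 1 OR soda = 1)
snack = all_of(eq("chips", 1), any_of(eq("beer", 1), eq("soda", 1)))

# Hypothetical rows, one per customer.
rows = [
    {"age": 35, "income": 80_000, "chips": 1, "beer": 0, "soda": 1},
    {"age": 52, "income": 60_000, "chips": 1, "beer": 0, "soda": 0},
    {"age": 29, "income": 120_000, "chips": 0, "beer": 1, "soda": 0},
]

young_modest_hits = [young_modest(r) for r in rows]
snack_hits = [snack(r) for r in rows]
```

Each composed pattern returns true or false per row, which matches the definition of a pattern as a predicate over instances.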
Pattern space
• Set of legal patterns
• Defined through the set of primitive patterns and operators to combine primitives
• Example
  • If variables X_1, ..., X_p are all binary, we can define the space of patterns to be all conjunctions of the form (X_i1 = 1) AND (X_i2 = 1) AND ... AND (X_ik = 1)
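For binary variables, each conjunction in this pattern space corresponds to a non-empty subset of the variables, so the space can be enumerated directly. The following is a minimal sketch under that reading; the function names and the variable labels `X1`, `X2`, `X3` are assumptions for illustration.

```python
from itertools import combinations


def conjunction_patterns(variables, max_size=None):
    """Yield every non-empty subset of `variables` up to `max_size` items.

    A subset S stands for the conjunction of (v = 1) over all v in S.
    """
    if max_size is None:
        max_size = len(variables)
    for k in range(1, max_size + 1):
        for subset in combinations(variables, k):
            yield subset


def matches(subset, row):
    """True if every variable in the conjunction equals 1 in `row`."""
    return all(row[v] == 1 for v in subset)


variables = ["X1", "X2", "X3"]
patterns = list(conjunction_patterns(variables))

# One hypothetical binary observation.
row = {"X1": 1, "X2": 0, "X3": 1}
matching = [s for s in patterns if matches(s, row)]
```

With p variables there are 2^p - 1 non-empty conjunctions (7 for p = 3), so exhaustive enumeration is only practical for small p or with a cap such as `max_size`.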


