DATA MINING
Susan Holmes ©
Stats202, Lecture 12, Fall 2010

Last Time: Decision Trees and Classification Examples

- Two sets of data: training and test.
- The response Y is a nominal (categorical) variable.
- Explanatory variables can be continuous, nominal, or ordinal.
- Indices of purity: Gini, entropy (deviance), and misclassification.
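As a reminder of the purity indices listed above, here is a sketch of their usual textbook definitions for a single node. The notation is an assumption, not taken from the slides: $\hat{p}_k$ denotes the proportion of training observations in the node that belong to class $k$, for $k = 1, \dots, K$.

```latex
\begin{align*}
\text{Gini index:} \quad & G = \sum_{k=1}^{K} \hat{p}_k \,(1 - \hat{p}_k) = 1 - \sum_{k=1}^{K} \hat{p}_k^{2} \\
\text{Entropy:} \quad & H = -\sum_{k=1}^{K} \hat{p}_k \log \hat{p}_k \\
\text{Misclassification error:} \quad & E = 1 - \max_{k} \hat{p}_k
\end{align*}
```

All three equal zero for a pure node (one $\hat{p}_k = 1$) and are largest when the classes are equally mixed. With natural logarithms, the deviance of a node is usually taken to be the entropy multiplied by twice the node size.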

Example of Classification Trees

    library(ElemStatLearn)  ## For spam data
    data(spam)
    ### Last few variables look like this:
      A.51  A.52  A.53  A.54  A.55 A.56 A.57 spam
    1    0 0.778 0.000 0.000 3.756   61  278 spam
    2    0 0.372 0.180 0.048 5.114  101 1028 spam
    3    0 0.276 0.184 0.010 9.821  485 2259 spam
    4    0 0.137 0.000 0.000 3.537   40  191 spam
    > nrow(spam)
    [1] 4601
    > sum(spam$spam!="email")/nrow(spam)
    [1] 0.3940448
    > sum(spam$spam=="email")/nrow(spam)
    [1] 0.6059552
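The class proportions printed above (about 0.394 spam and 0.606 email) are enough to work out the impurity of the root node before any split is made. The sketch below uses the standard two-class forms of the Gini index, entropy, and misclassification error. The symbol $p$ for the spam proportion and the base-2 logarithm are assumptions made for illustration, and the values are rounded to three decimals.

```latex
\begin{align*}
p &\approx 0.394, \qquad 1 - p \approx 0.606 \\
\text{Gini} &= 2\,p\,(1 - p) \approx 2 (0.394)(0.606) \approx 0.478 \\
\text{Entropy} &= -p \log_2 p - (1 - p) \log_2 (1 - p) \approx 0.967 \text{ bits} \\
\text{Misclassification} &= 1 - \max(p,\, 1 - p) \approx 0.394
\end{align*}
```

The misclassification value is the error rate of the trivial rule that labels every message as email, so a useful tree has to do better than about 39% error.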

## This note was uploaded on 07/29/2011 for the course STAT 202 at Stanford.

