
Chapter 7 Classification and Regression Trees

© Galit Shmueli and Peter Bruce 2008
Data Mining for Business Intelligence (Shmueli, Patel & Bruce)

Trees and Rules

Goal: Classify or predict an outcome based on a set of predictors
The output is a set of rules
Example:
  Goal: classify a record as “will accept credit card offer” or “will not accept”
  Rule might be “IF (Income > 92.5) AND (Education < 1.5) AND (Family <= 2.5) THEN Class = 0 (nonacceptor)”
Also called CART, Decision Trees, or just Trees
Rules are represented by tree diagrams
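To make the IF/THEN form concrete, the example rule can be written as a small function. This is only a sketch: the function name `classify_offer` and the choice to return `None` when the rule does not fire are assumptions made for illustration, and a full tree would supply further rules for those records.

```python
def classify_offer(income, education, family):
    """Apply the single example rule from the slide.

    Returns 0 (nonacceptor) when the rule fires, otherwise None,
    meaning some other rule in the tree would have to decide.
    """
    if income > 92.5 and education < 1.5 and family <= 2.5:
        return 0
    return None


# Hypothetical records, invented only to exercise the rule
print(classify_offer(income=100.0, education=1, family=2))  # rule fires
print(classify_offer(income=50.0, education=1, family=2))   # rule does not fire
```

Each path from the root of a tree to a leaf corresponds to one rule of this kind.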

Key Ideas

Recursive partitioning: Repeatedly split the records into two parts so as to achieve maximum homogeneity within the new parts
Pruning the tree: Simplify the tree by pruning peripheral branches to avoid overfitting

Recursive Partitioning

Recursive Partitioning Steps

Pick one of the predictor variables, x_i
Pick a value of x_i, say s_i, that divides the training data into two (not necessarily equal) portions
Measure how “pure” or homogeneous each of the resulting portions is
  “Pure” = containing records of mostly one class
Idea is to pick x_i and s_i to maximize purity
Repeat the process on each resulting portion

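The steps above can be sketched as a search over predictors and split values. The slide does not say how purity is measured or how candidate values are chosen, so this sketch makes two assumptions: it scores a split by Gini impurity (lower means purer), and it tries the midpoints between consecutive distinct values of each predictor. The names `gini` and `best_split` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (weighted impurity, predictor index i, split value s) of the purest split.

    rows   -- list of tuples of numeric predictor values
    labels -- list of class labels, one per row
    """
    best = None
    n = len(rows)
    for i in range(len(rows[0])):                      # pick a predictor x_i
        values = sorted(set(row[i] for row in rows))
        for low, high in zip(values, values[1:]):      # pick a value s_i
            s = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[i] <= s]
            right = [y for row, y in zip(rows, labels) if row[i] > s]
            # Measure purity: impurity of both portions, weighted by size
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, i, s)
    return best


# Tiny invented example: one predictor, two classes
print(best_split([(1,), (2,), (3,), (4,)], ["a", "a", "b", "b"]))
```

Applying `best_split` again to each of the two resulting portions gives the "repeat the process" step.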
Example: Riding Mowers

Data: 24 households classified as owning or not owning riding mowers
Predictors = Income, Lot Size

Income   Lot_Size   Ownership
60.0     18.4       owner
85.5     16.8       owner
64.8     21.6       owner
61.5     20.8       owner
87.0     23.6       owner
110.1    19.2       owner
108.0    17.6       owner
82.8     22.4       owner
69.0     20.0       owner
93.0     20.8       owner
51.0     22.0       owner
81.0     20.0       owner
75.0     19.6       non-owner
52.8     20.8       non-owner
64.8     17.2       non-owner
43.2     20.4       non-owner
84.0     17.6       non-owner
49.2     17.6       non-owner
59.4     16.0       non-owner
66.0     18.4       non-owner
47.4     16.4       non-owner
33.0     18.8       non-owner
51.0     14.0       non-owner
63.0     14.8       non-owner

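As a quick check on what a single split does to this data, the sketch below loads the 24 records and counts the classes on each side of one candidate split. The helper name `split_counts` and the candidate value Lot_Size = 19.0 are assumptions chosen for illustration, not a split prescribed by the text.

```python
from collections import Counter

# (Income, Lot_Size, Ownership) rows copied from the table above
mowers = [
    (60.0, 18.4, "owner"), (85.5, 16.8, "owner"), (64.8, 21.6, "owner"),
    (61.5, 20.8, "owner"), (87.0, 23.6, "owner"), (110.1, 19.2, "owner"),
    (108.0, 17.6, "owner"), (82.8, 22.4, "owner"), (69.0, 20.0, "owner"),
    (93.0, 20.8, "owner"), (51.0, 22.0, "owner"), (81.0, 20.0, "owner"),
    (75.0, 19.6, "non-owner"), (52.8, 20.8, "non-owner"), (64.8, 17.2, "non-owner"),
    (43.2, 20.4, "non-owner"), (84.0, 17.6, "non-owner"), (49.2, 17.6, "non-owner"),
    (59.4, 16.0, "non-owner"), (66.0, 18.4, "non-owner"), (47.4, 16.4, "non-owner"),
    (33.0, 18.8, "non-owner"), (51.0, 14.0, "non-owner"), (63.0, 14.8, "non-owner"),
]


def split_counts(records, column, value):
    """Count class labels on each side of the split records[column] <= value."""
    left = Counter(r[2] for r in records if r[column] <= value)
    right = Counter(r[2] for r in records if r[column] > value)
    return left, right


# Column 1 is Lot_Size; 19.0 is a hypothetical candidate split value
left, right = split_counts(mowers, column=1, value=19.0)
print("Lot_Size <= 19.0:", dict(left))   # 3 owners, 9 non-owners
print("Lot_Size >  19.0:", dict(right))  # 9 owners, 3 non-owners
```

Before the split the classes are evenly mixed (12 owners, 12 non-owners); after it, each portion is dominated by one class, which is the gain in purity that recursive partitioning looks for.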
How to split

