Lecture 23: Decision Trees
Prof. Julia Hockenmaier
http://cs.illinois.edu/fa11/cs440
CS440/ECE448: Intro to Artificial Intelligence

Decision trees
[Figure: example decision tree. The root tests "drink?" (coffee / tea); each branch then tests "milk?" (yes / no); the leaves are labeled "sugar" or "no sugar".]

Decision tree learning
Training data: $D = \{(\mathbf{x}^1, y^1), \dots, (\mathbf{x}^N, y^N)\}$
- each $\mathbf{x}^i = (x^i_1, \dots, x^i_d)$ is a d-dimensional feature vector
- each $y^i$ is the target label (class) of the i-th data point
Training algorithm:
- Initial tree = the root, corresponding to all items in D.
- A node is a leaf if all its data items have the same y.
- At each non-leaf node: find the feature $x_j$ with the highest information gain, create a new child for each value of $x_j$, and distribute the items among the children accordingly.
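
The training algorithm above can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation. It assumes H is Shannon entropy in bits, represents each item as a dict of nominal feature values, and adds one stopping rule the slide does not state: when no features remain, the node becomes a leaf with the majority label. The names `build_tree`, `predict`, and the toy data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels (assumed form of H)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, labels, feature):
    """Parent entropy minus the size-weighted entropy of the children."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    after = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - after

def build_tree(rows, labels, features):
    # Leaf: all items share the same label. The "no features left" case is
    # an added assumption; it falls back to the majority label.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    # Pick the feature with the highest information gain.
    best = max(features, key=lambda f: gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    # Create one child per observed value and distribute the items.
    children = {}
    for value in {row[best] for row in rows}:
        subset = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        children[value] = build_tree([r for r, _ in subset],
                                     [y for _, y in subset],
                                     remaining)
    return (best, children)  # internal node: (feature, {value: subtree})

def predict(tree, row):
    # Walk down until a leaf (a plain label) is reached.
    while isinstance(tree, tuple):
        feature, children = tree
        tree = children[row[feature]]
    return tree

# Hypothetical toy data, invented for illustration only.
rows = [
    {"drink": "coffee", "milk": "yes"},
    {"drink": "coffee", "milk": "no"},
    {"drink": "tea", "milk": "yes"},
    {"drink": "tea", "milk": "no"},
]
labels = ["no sugar", "sugar", "no sugar", "no sugar"]
tree = build_tree(rows, labels, ["drink", "milk"])
```

Because leaves are stored as plain labels and internal nodes as tuples, this sketch only works when the class labels themselves are not tuples.
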
Information Gain
How much information are we gaining by splitting node S on attribute A with values V(A)?
Information required before the split: $H(S_{parent})$
Information required after the split: $\sum_{i \in V(A)} P(S_{child_i}) \, H(S_{child_i})$, where $P(S_{child_i}) = |S_{child_i}| / |S_{parent}|$

$$\mathrm{Gain}(S_{parent}, A) = H(S_{parent}) - \sum_{i \in V(A)} \frac{|S_{child_i}|}{|S_{parent}|} \, H(S_{child_i})$$
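
As a rough check on the gain formula, the sketch below computes it on a tiny hand-made data set. It assumes H is Shannon entropy measured in bits, which the slide does not spell out; the function names and the four data points are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(S) in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Gain(S_parent, A) = H(S_parent) - sum_i |S_child_i|/|S_parent| * H(S_child_i)."""
    children = {}
    for row, y in zip(rows, labels):
        # One child per value of the attribute.
        children.setdefault(row[attribute], []).append(y)
    after = sum(len(ys) / len(labels) * entropy(ys) for ys in children.values())
    return entropy(labels) - after

# Hypothetical data: the label depends only on "drink", not on "milk".
rows = [
    {"drink": "coffee", "milk": "yes"},
    {"drink": "coffee", "milk": "no"},
    {"drink": "tea", "milk": "yes"},
    {"drink": "tea", "milk": "no"},
]
labels = ["sugar", "sugar", "no sugar", "no sugar"]

gain_drink = information_gain(rows, labels, "drink")  # children are pure
gain_milk = information_gain(rows, labels, "milk")    # children stay 50/50
```

Splitting on "drink" yields pure children, so the gain equals the full parent entropy of 1 bit; splitting on "milk" leaves each child as mixed as the parent, so the gain is 0.
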

Dealing with numerical attributes
Many attributes are neither boolean (0, 1) nor nominal (classes):
- Number of times a word appears in a text
- RGB values of a pixel
- Height, weight, …
