This leaf will output the class of the data it contains. If several data points have exactly the same attribute values but belong to different classes, we cannot split any further. We still declare the node a leaf, but it outputs the majority class among the data points in the node (in this example, 'B').

Decision Tree Algorithm (Discrete Attributes)
• LearnTree(X, Y)
  – Input:
    • A set X of R training vectors, each containing the values (x1, .., xM) of M attributes (X1, .., XM)
    • A vector Y of R elements, where yj = class of the jth datapoint
  – If all the datapoints in X have the same class value y:
    • Return a leaf node that predicts y as output
  – If all the datapoints in X have the same attribute values (x1, .., xM):
    • Return a leaf node that predicts the majority of the class values in Y as output
  – Otherwise, try all the possible attributes Xj and choose the one, j*, for which IG(Y | Xj) is maximum
  – For every possible value v of Xj*:
    • Xv, Yv = the set of datapoints for which xj* = v, with their corresponding classes
    • Childv ← LearnTree(Xv, Yv)
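The discrete-attribute procedure above can be sketched in Python. This is a minimal sketch: the dict-based node representation and the helper names (`entropy`, `info_gain`, `predict`) are my own, not from the slides.

```python
import math
from collections import Counter

def entropy(Y):
    """H(Y) = -sum_c P(c) log2 P(c)."""
    n = len(Y)
    return -sum((c / n) * math.log2(c / n) for c in Counter(Y).values())

def info_gain(X, Y, j):
    """IG(Y | Xj) = H(Y) - sum_v P(Xj = v) * H(Y | Xj = v)."""
    n = len(Y)
    remainder = 0.0
    for v in set(x[j] for x in X):
        Yv = [y for x, y in zip(X, Y) if x[j] == v]
        remainder += len(Yv) / n * entropy(Yv)
    return entropy(Y) - remainder

def learn_tree(X, Y):
    if len(set(Y)) == 1:                 # all datapoints share one class
        return Y[0]
    if len(set(map(tuple, X))) == 1:     # identical attribute vectors:
        return Counter(Y).most_common(1)[0][0]   # majority-class leaf
    # Only consider attributes that still vary in X (a safeguard so a
    # zero-gain constant attribute can never be chosen and loop forever).
    candidates = [j for j in range(len(X[0]))
                  if len(set(x[j] for x in X)) > 1]
    j_star = max(candidates, key=lambda j: info_gain(X, Y, j))
    children = {}
    for v in set(x[j_star] for x in X):
        Xv = [x for x in X if x[j_star] == v]
        Yv = [y for x, y in zip(X, Y) if x[j_star] == v]
        children[v] = learn_tree(Xv, Yv)
    return {'attr': j_star, 'children': children}

def predict(node, x):
    while isinstance(node, dict):
        node = node['children'][x[node['attr']]]
    return node
```

For example, on the Boolean AND data `X = [(0,0), (0,1), (1,0), (1,1)]` with `Y = ['A', 'A', 'A', 'B']`, the learned tree classifies all four points correctly, and on data where all attribute vectors are identical it returns the majority class, as described on the previous slide.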
Decision Tree Algorithm (Continuous Attributes)
• LearnTree(X, Y)
  – Input:
    • A set X of R training vectors, each containing the values (x1, .., xM) of M attributes (X1, .., XM)
    • A vector Y of R elements, where yj = class of the jth datapoint
  – If all the datapoints in X have the same class value y:
    • Return a leaf node that predicts y as output
  – If all the datapoints in X have the same attribute values (x1, .., xM):
    • Return a leaf node that predicts the majority of the class values in Y as output
  – Otherwise, try all the possible attributes Xj and thresholds t, and choose the pair, (j*, t), for which IG(Y | Xj, t) is maximum
  – XL, YL = the set of datapoints for which xj* < t, with their corresponding classes
  – XH, YH = the set of datapoints for which xj* >= t, with their corresponding classes
  – Left child ← LearnTree(XL, YL)
  – Right child ← LearnTree(XH, YH)

Expressiveness of Decision Trees
A decision tree can represent any Boolean function, and any tree can be rewritten as a set of rules in Disjunctive Normal Form (DNF): each root-to-leaf path is a conjunction of attribute tests, and each class is predicted by the disjunction of the paths ending in that class. For example, a tree for Boolean OR corresponds to the rule (X1 = 1) ∨ (X1 = 0 ∧ X2 = 1).

Decision Trees So Far
• Given R observations from the training data, each with M attributes X and a class attribute Y, construct a sequence of tests (a decision tree) to predict the class attribute Y from the attributes X
• Basic strategy for defining the tests ("when to split") → maximize the information gain on the training data at each node of the tree
• Problem (next):
  – Evaluating the tree on the training data is dangerous → overfitting

The Overfitting Problem (Example)
• Suppose that, in an ideal world, class B is everything with X2 >= 0.5 and class A is everything with X2 < 0.5
• Note that attribute X1 is irrelevant
• It seems like generating a decision tree would be trivial
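A minimal end-to-end sketch of the continuous-attribute algorithm makes this overfitting concrete. The dataset, node representation, and helper names below are my own illustration: labels follow the ideal rule (class B iff X2 >= 0.5), a few training labels are flipped as class noise, and the fully grown tree memorizes that noise instead of recovering the simple two-leaf tree.

```python
import math
import random

def entropy(Y):
    n = len(Y)
    return -sum((Y.count(c) / n) * math.log2(Y.count(c) / n)
                for c in set(Y))

def best_split(X, Y):
    """Search every attribute j and threshold t for maximal IG(Y | Xj, t)."""
    base, n = entropy(Y), len(Y)
    best_ig, best_j, best_t = 0.0, None, None
    for j in range(len(X[0])):
        values = sorted(set(x[j] for x in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2        # midpoint between consecutive values
            YL = [y for x, y in zip(X, Y) if x[j] < t]
            YH = [y for x, y in zip(X, Y) if x[j] >= t]
            ig = (base - (len(YL) / n) * entropy(YL)
                       - (len(YH) / n) * entropy(YH))
            if ig > best_ig:
                best_ig, best_j, best_t = ig, j, t
    return best_j, best_t

def learn_tree(X, Y):
    if len(set(Y)) == 1:             # pure node -> leaf
        return Y[0]
    j, t = best_split(X, Y)
    if j is None:                    # no informative split: majority leaf
        return max(set(Y), key=Y.count)
    L = [(x, y) for x, y in zip(X, Y) if x[j] < t]
    H = [(x, y) for x, y in zip(X, Y) if x[j] >= t]
    return {'attr': j, 't': t,
            'left':  learn_tree([x for x, _ in L], [y for _, y in L]),
            'right': learn_tree([x for x, _ in H], [y for _, y in H])}

def predict(node, x):
    while isinstance(node, dict):
        node = node['left'] if x[node['attr']] < node['t'] else node['right']
    return node

def num_leaves(node):
    if not isinstance(node, dict):
        return 1
    return num_leaves(node['left']) + num_leaves(node['right'])

random.seed(0)
X = [(random.random(), random.random()) for _ in range(100)]
Y = ['B' if x2 >= 0.5 else 'A' for _, x2 in X]   # ideal rule: X2 >= 0.5
Y_noisy = list(Y)
for i in (3, 17, 42):                # flip a few training labels: class noise
    Y_noisy[i] = 'A' if Y_noisy[i] == 'B' else 'B'
tree = learn_tree(X, Y_noisy)
train_acc = sum(predict(tree, x) == y
                for x, y in zip(X, Y_noisy)) / len(X)
# The fully grown tree fits the noisy training set perfectly, but it needs
# many more leaves than the ideal two-leaf tree that splits on X2 at 0.5.
```

The perfect training accuracy is exactly the danger flagged above: it measures how well the tree memorized the training data (noise included), not how well it will predict new data.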
This note was uploaded on 11/03/2010 for the course CS 6375, taught by Professor Vincent Ng during the Fall '10 term at the University of Texas at Dallas, Richardson.