CS221 Lecture notes

Decision trees

Last time, we discussed two supervised learning algorithms: linear regression and logistic regression. These algorithms work well when our inputs x are continuous. Now we will discuss another classification algorithm, called decision trees, which is more appropriate for discrete inputs. For instance, suppose we are trying to predict whether our roommate will eat a certain food. Our features are as follows:

1. HUNGER: starving, hungry, caneat, or full
2. LIKE: yes/no
3. HEALTHY: yes/no
4. PRICE: free, cheap, or expensive

Our target variable is y ∈ {0, 1}, which evaluates to 1 if our roommate eats the food.

How would we solve this problem using logistic regression? We might define a feature vector which has a {0, 1} entry for each possible value of each feature. (In other words, if there are four features, each taking on three possible values, our inputs will be of length 12.) If a feature i takes the value v_i, we assign a 1 to the corresponding element of the input x; otherwise, we assign it a 0. For an example where x^(i) = (hungry, yes, yes, cheap), our 11-dimensional feature vector would be

x = [0 1 0 0 1 0 1 0 0 1 0].

Figure 1: An example decision tree for deciding whether our roommate will eat a given food.

You can imagine this process producing enormous feature vectors if we use it in domains with large numbers of variables.[1] Instead, we can use a decision tree classifier. A decision tree for this domain might look like the one in Figure 1. Suppose we're given the same example x^(i) as above. We begin at the root node, which is labeled Hunger. Because our roommate is hungry, we descend down the branch labeled "hungry" to get to the node labeled Like. Since our roommate likes the food, we descend down the Y branch. Now we've arrived at a leaf node, which happens to give the answer Yes. Hence, we conclude that our roommate will eat the food.
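The one-hot encoding described above can be sketched in a few lines of Python. The feature names and value lists are taken from the roommate example; the function itself is an illustration, not part of the notes.

```python
# Each feature and its possible values, in the order given in the notes.
FEATURE_VALUES = [
    ("HUNGER", ["starving", "hungry", "caneat", "full"]),
    ("LIKE", ["yes", "no"]),
    ("HEALTHY", ["yes", "no"]),
    ("PRICE", ["free", "cheap", "expensive"]),
]

def one_hot(example):
    """Map a tuple of discrete feature values to a 0/1 vector with one
    entry per possible value of each feature (4 + 2 + 2 + 3 = 11)."""
    vec = []
    for (name, values), v in zip(FEATURE_VALUES, example):
        vec.extend(1 if v == value else 0 for value in values)
    return vec

x = one_hot(("hungry", "yes", "yes", "cheap"))
# x == [0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0], matching the vector above
```

Note that the length of the vector is the total number of feature values, not the number of features, which is why it grows quickly in domains with many variables.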
In general, the internal nodes of the tree correspond to features, the edges correspond to the different values of the feature, and the leaves correspond to yes/no predictions. Or, rather than a simple yes or no, we might associate with each leaf a probability p, which is the probability that our roommate will eat the food in situations that reach that leaf.

[1] You may have observed that we can do slightly better by using only a single component of the input to represent binary features, but this won't solve the basic problem of large feature vectors.

1  Decision tree learning

Now, we discuss how to learn a decision tree from data. Suppose we take careful notes on our roommate's eating habits and come up with the training data shown in Table 1. We define a scoring function as follows....
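The node/edge/leaf structure described above can be sketched as a nested dictionary: each internal node names a feature and maps each feature value to a subtree, and each leaf is a prediction. The tree below is hypothetical, since Figure 1 is not reproduced here; it only fills in the branches that the traversal in the text actually follows.

```python
# Hypothetical fragment of Figure 1's tree: internal nodes are dicts,
# leaves are prediction strings.
tree = {
    "feature": "HUNGER",
    "branches": {
        "hungry": {
            "feature": "LIKE",
            "branches": {"yes": "Yes", "no": "No"},
        },
        "full": "No",
        # ... the remaining branches of Figure 1 would go here
    },
}

def predict(node, example):
    """Descend from the root, following the edge labeled with the
    example's value for the current node's feature, until a leaf."""
    while isinstance(node, dict):
        node = node["branches"][example[node["feature"]]]
    return node

x = {"HUNGER": "hungry", "LIKE": "yes", "HEALTHY": "yes", "PRICE": "cheap"}
# predict(tree, x) follows hungry -> Like -> yes and returns "Yes"
```

To attach probabilities instead of hard answers, the leaf strings would simply be replaced by numbers p estimated from the training data.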
Winter '09
Koller, Ng
Artificial Intelligence, Algorithms
