CS221 Lecture notes
Lecture 7: Decision trees

Last time, we discussed two supervised learning algorithms: linear regression and logistic regression. These algorithms worked well when our inputs x were continuous. Now we will discuss another classification algorithm, called decision trees, which is more appropriate for discrete inputs.

For instance, suppose we are trying to predict whether our roommate will eat a certain food. Our features are as follows:

1. HUNGER: starving, hungry, can-eat, or full
2. LIKE: yes/no
3. HEALTHY: yes/no
4. PRICE: free, cheap, or expensive

Our target variable is y ∈ {0, 1}, which evaluates to 1 if our roommate eats the food.

How would we solve this problem using logistic regression? We might define a feature vector which has a {0, 1} entry for each possible value of each feature. (In other words, if there are four features, each taking on three possible values, our inputs will be of length 12.) If a feature i takes the value v_i, we assign a 1 to the corresponding element of the input x; otherwise, we assign it a 0. For an example where x^(i) = (hungry, yes, yes, cheap), our 11-dimensional feature vector would be

    x = [0 1 0 0 1 0 1 0 0 1 0].

You can imagine this process producing enormous feature vectors if we use it in domains with large numbers of variables. [1]

Instead, we can use a decision tree classifier. A decision tree for this domain might look like the one in Figure 1. Suppose we're given the same example x^(i) as above. We begin at the root node, which is labeled "Hunger." Because our roommate is hungry, we descend down the branch labeled "hungry" to get to the node labeled "Like." Since our roommate likes the food, we descend down the "Y" branch. Now we've arrived at a leaf node, which happens to give the answer "Yes." Hence, we conclude that our roommate will eat the food.

[Figure 1: An example decision tree for deciding whether our roommate will eat a given food.]

In general, the internal nodes of the tree correspond to features, the edges correspond to different values of the feature, and the leaves correspond to yes/no predictions. Or, rather than a simple yes or no, we might associate with each leaf ℓ a probability p_ℓ, which is the probability that our roommate will eat the food in a situation associated with ℓ.

[1] You may have observed that we can do slightly better by using only a single component of the inputs to represent binary features, but this won't solve the basic problem of large feature vectors.

1 Decision tree learning

Now we discuss how to learn a decision tree from data. Suppose we take careful notes on our roommate's eating habits and come up with the training data shown in Table 1. We define a scoring function ℓ as follows....
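As a concrete illustration of the one-entry-per-value encoding described above, here is a minimal Python sketch. The value orderings within each feature are an assumption chosen so that the example (hungry, yes, yes, cheap) maps to the 11-dimensional vector given in the notes; they are not specified there.

# Minimal sketch of the one-per-value (one-hot style) feature encoding.
# The value orderings below are assumptions chosen to reproduce the
# 11-dimensional vector x = [0 1 0 0 1 0 1 0 0 1 0] from the notes.
FEATURES = [
    ("HUNGER", ["starving", "hungry", "can-eat", "full"]),
    ("LIKE", ["yes", "no"]),
    ("HEALTHY", ["yes", "no"]),
    ("PRICE", ["free", "cheap", "expensive"]),
]

def encode(example):
    # Produce one 0/1 entry per possible value of each feature.
    x = []
    for (name, values), value in zip(FEATURES, example):
        x.extend(1 if value == v else 0 for v in values)
    return x

print(encode(("hungry", "yes", "yes", "cheap")))
# -> [0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0]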
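Similarly, here is a minimal sketch of how classification with a decision tree proceeds: starting at the root, follow the branch matching the example's value for the tested feature until a leaf is reached. The Node/Leaf classes and the partially filled-in tree are illustrative assumptions; only the root ("Hunger"), the "hungry" branch to "Like", and the "yes" branch to the leaf "Yes" are taken from the Figure 1 walkthrough.

# Minimal sketch of decision tree classification: internal nodes test a
# feature, edges correspond to feature values, and leaves hold predictions
# (or, alternatively, probabilities p_leaf).

class Leaf:
    def __init__(self, prediction):
        self.prediction = prediction      # e.g. "Yes"/"No", or a probability p_leaf

class Node:
    def __init__(self, feature, children):
        self.feature = feature            # index of the feature this node tests
        self.children = children          # dict mapping feature value -> subtree

def classify(tree, example):
    # Walk from the root to a leaf, following the branch for each feature value.
    while isinstance(tree, Node):
        tree = tree.children[example[tree.feature]]
    return tree.prediction

# Only the root-to-leaf path for (hungry, yes, yes, cheap) follows the notes;
# the remaining branches are placeholders, not the full tree of Figure 1.
tree = Node(feature=0, children={
    "hungry": Node(feature=1, children={"yes": Leaf("Yes"), "no": Leaf("No")}),
    "starving": Leaf("Yes"),
    "can-eat": Leaf("No"),
    "full": Leaf("No"),
})

print(classify(tree, ("hungry", "yes", "yes", "cheap")))   # -> "Yes"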