Stats 202 - Lecture 7



... test) length

## In-class exercise #26

Repeat the previous exercise for a tree of depth 1 by using `control=rpart.control(maxdepth=1)`. Which model seems better?

Solution:

    # Assumes library(rpart) is loaded and x, y, x_test, y_test come from the previous exercise
    fit <- rpart(y ~ ., x, control = rpart.control(maxdepth = 1))
    # Training misclassification error
    sum(y != predict(fit, x, type = "class")) / length(y)
    # Test misclassification error
    sum(y_test != predict(fit, x_test, type = "class")) / length(y_test)

## In-class exercise #27

Repeat the previous exercise for a tree of depth 6 by using `control=rpart.control(minsplit=0, minbucket=0, cp=-1, maxcompete=0, maxsurrogate=0, usesurrogate=0, xval=0, maxdepth=6)`. Which model seems better?

Solution:

    # Assumes library(rpart) is loaded and x, y, x_test, y_test come from the previous exercise
    fit <- rpart(y ~ ., x,
                 control = rpart.control(minsplit = 0, minbucket = 0, cp = -1,
                                         maxcompete = 0, maxsurrogate = 0,
                                         usesurrogate = 0, xval = 0, maxdepth = 6))
    # Training misclassification error
    sum(y != predict(fit, x, type = "class")) / length(y)
    # Test misclassification error
    sum(y_test != predict(fit, x_test, type = "class")) / length(y_test)

## How are Decision Trees Generated?

Many algorithms use a version of a “top-down” or “divide-and-conquer” approach known as Hunt’s Algorithm (Page 152):

Let D_t be the set of training records that reach a node t.

- If D_t contains records that all belong to the same class y_t, then t is a leaf node labeled y_t.
- If D_t contains records that belong to more than one class, use an attribute test to split the data into smaller subsets. Recursively apply the procedure to each subset.
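The two cases of Hunt's Algorithm map naturally onto a short recursive function. The R sketch below is only an illustration, not the lecture's code. The names `hunt` and `classify`, the toy data frame, the rule of testing attributes in the order given, and the majority-class fallback are all assumptions made for this example.

```r
# Sketch of Hunt's Algorithm for categorical attributes.
# `hunt`, `classify`, and `toy` are hypothetical names, not from the lecture.
hunt <- function(d, target, attrs) {
  classes <- as.character(d[[target]])
  # Case 1: every record at this node has the same class, so make a leaf.
  if (length(unique(classes)) == 1) {
    return(list(leaf = TRUE, label = classes[1]))
  }
  majority <- names(which.max(table(classes)))
  # Assumption: with no attributes left to test, fall back to the majority class.
  if (length(attrs) == 0) {
    return(list(leaf = TRUE, label = majority))
  }
  # Case 2: more than one class, so split on an attribute test and recurse.
  # Assumption: attributes are tested in the order given; no split criterion is used.
  a <- attrs[1]
  subsets <- split(d, as.character(d[[a]]))
  children <- lapply(subsets, hunt, target = target, attrs = attrs[-1])
  list(leaf = FALSE, attr = a, children = children, default = majority)
}

# Walk from the root to a leaf for a single record (a named list).
classify <- function(node, record) {
  while (!node$leaf) {
    child <- node$children[[as.character(record[[node$attr]])]]
    # Attribute value not seen in training: use the node's majority class.
    if (is.null(child)) return(node$default)
    node <- child
  }
  node$label
}

# Hypothetical toy data, for illustration only.
toy <- data.frame(
  outlook = c("sunny", "sunny", "rain", "rain", "overcast"),
  windy   = c("no", "yes", "no", "yes", "no"),
  play    = c("yes", "no", "yes", "no", "yes")
)
tree <- hunt(toy, target = "play", attrs = c("outlook", "windy"))
print(classify(tree, list(outlook = "rain", windy = "yes")))
```

The sketch leaves open which attribute test to use at each node, just as the description does. Choosing that test is the part a split criterion would fill in.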
## An Example of Hunt’s Algorithm

| Tid | Refund | Marital Status | Taxable Income | Cheat |
|-----|--------|----------------|----------------|-------|
| 1   | Yes    | Single         | 125K           | No    |
| 2   | No     | Married        | 100K           | No    |
| 3   | No     | Single         | 70K            | No    |
| 4   | Yes    | Married        | 120K           | No    |
| 5   | No     | Divorced       | 95K            | Yes   |
| 6   | No     | Married        | 60K            | No    |
| 7   | Yes    | Divorced       | 220K           | No    |
| 8   | No     | Single         | 85K            | Yes   |
| 9   | No     | Married        | 75K            | No    |
| 10  | No     | Single         | 90K            | Yes   |

[Figure: the tree grown in stages. (1) A single leaf labeled Don’t Cheat. (2) A split on Refund, with Yes and No both leading to Don’t Cheat. (3) Under Refund = No, a split on Marital Status: Single, Divorced leads to Cheat; Married leads to Don’t Cheat. (4) Under Single, Divorced, a split on Taxable Income: < 80K leads to Don’t Cheat; >= 80K leads to Cheat.]

## How to Apply Hunt’s Algorithm

Usually it is done in a “greedy” fashion. “Greedy” means that the optimal split is chosen at each stage according t...
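To make one greedy step concrete, the R sketch below scores three candidate splits at the root using the ten records from the example above. The preview is cut off before it names a criterion, so size-weighted Gini impurity is used here purely as an assumption. The helper names `gini` and `split_gini` are likewise hypothetical.

```r
# The ten training records from the example (Tid 1 to 10).
records <- data.frame(
  refund  = c("Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "No", "No"),
  marital = c("Single", "Married", "Single", "Married", "Divorced",
              "Married", "Divorced", "Single", "Married", "Single"),
  income  = c(125, 100, 70, 120, 95, 60, 220, 85, 75, 90),  # in thousands
  cheat   = c("No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes")
)

# Gini impurity of a vector of class labels (assumed criterion).
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Size-weighted Gini impurity of the subsets produced by a grouping.
split_gini <- function(labels, groups) {
  parts <- split(labels, groups)
  sum(sapply(parts, function(s) length(s) / length(labels) * gini(s)))
}

# Score each candidate test at the root; lower is better.
candidates <- c(
  refund        = split_gini(records$cheat, records$refund),
  marital       = split_gini(records$cheat, records$marital),
  income_lt_80K = split_gini(records$cheat, records$income < 80)
)
print(round(candidates, 3))

# The greedy choice is the candidate with the lowest score at this stage.
best <- names(which.min(candidates))
print(best)
```

Under this assumed criterion the three-way marital-status split scores lowest (0.3, against about 0.343 for the other two). That need not match the order of splits drawn in the slide's figure. A greedy procedure would repeat the same scoring inside each resulting subset.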

## This note was uploaded on 02/03/2014 for the course STATS 202 taught by Professor Taylor during the Fall '09 term at Stanford.
