Repeat the previous exercise for a tree of depth 1
by using control=rpart.control(maxdepth=1).
Solution:
fit<-rpart(y~.,x,control=rpart.control(maxdepth=1))
sum(y!=predict(fit,x,type="class"))/length(y)
sum(y_test!=predict(fit,x_test,type="class"))/
length(y_test)
Repeat the previous exercise for a tree of depth 6
by using
control=rpart.control(minsplit=0,minbucket=0,
cp=-1,maxcompete=0, maxsurrogate=0,
usesurrogate=0, xval=0,maxdepth=6).
Solution:
fit<-rpart(y~.,x,
control=rpart.control(minsplit=0,
minbucket=0,cp=-1,maxcompete=0,
maxsurrogate=0, usesurrogate=0,
xval=0,maxdepth=6))
sum(y!=predict(fit,x,type="class"))/length(y)
sum(y_test!=predict(fit,x_test,type="class"))/
length(y_test)

How are Decision Trees Generated?
Many algorithms use a version of a “top-down” or
“divide-and-conquer” approach known as Hunt’s
Algorithm (Page 152):
Let D_t be the set of training records that reach a node t.
–If D_t contains records that all belong to the same class y_t,
then t is a leaf node labeled as y_t.
–If D_t contains records that belong to more than one
class, use an attribute test to split the data into
smaller subsets. Recursively apply the procedure to
each subset.

An Example of Hunt’s Algorithm
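As a rough illustration of the recursion, the sketch below applies Hunt's procedure in R to the ten training records of this example. The names records, tests, hunt and classify are illustrative only. The three candidate tests are hand-picked and tried in a fixed order, which is an assumption made for brevity and not a rule for choosing splits.

```r
# Training records of the example (Taxable Income in thousands).
records <- data.frame(
  Refund  = c("Yes","No","No","Yes","No","No","Yes","No","No","No"),
  Marital = c("Single","Married","Single","Married","Divorced",
              "Married","Divorced","Single","Married","Single"),
  Income  = c(125, 100, 70, 120, 95, 60, 220, 85, 75, 90),
  Cheat   = c("No","No","No","No","Yes","No","No","Yes","No","Yes"),
  stringsAsFactors = FALSE
)

# Hypothetical attribute tests; each maps a data frame to TRUE/FALSE per row.
tests <- list(
  "Refund == Yes"      = function(d) d$Refund == "Yes",
  "Marital == Married" = function(d) d$Marital == "Married",
  "Income < 80K"       = function(d) d$Income < 80
)

# Most frequent class label, used when no test can divide the records.
majority <- function(y) names(which.max(table(y)))

hunt <- function(d, tests, target = "Cheat") {
  y <- d[[target]]
  # All records at this node share one class: make a leaf.
  if (length(unique(y)) == 1) return(list(leaf = TRUE, label = y[1]))
  # More than one class: use the first test that really divides the
  # records, then recurse on each subset.
  for (name in names(tests)) {
    hit <- tests[[name]](d)
    if (any(hit) && !all(hit)) {
      return(list(leaf = FALSE, test = name,
                  yes = hunt(d[hit, , drop = FALSE], tests, target),
                  no  = hunt(d[!hit, , drop = FALSE], tests, target)))
    }
  }
  # No test separates the records: fall back to the majority class.
  list(leaf = TRUE, label = majority(y))
}

# Follow the tests from the root down to a leaf for a one-row data frame.
classify <- function(node, row) {
  while (!node$leaf) {
    node <- if (tests[[node$test]](row)) node$yes else node$no
  }
  node$label
}

tree <- hunt(records, tests)
pred <- sapply(seq_len(nrow(records)),
               function(i) classify(tree, records[i, , drop = FALSE]))
train_error <- mean(pred != records$Cheat)  # training misclassification rate
```

With this ordering of tests the recursion splits on Refund first, then Marital Status, then Taxable Income. A different ordering, or a criterion that scores candidate splits, would generally give a different tree.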
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

[Figure: the tree at successive stages]
(a) A single leaf: Don’t Cheat
(b) Refund: Yes -> Don’t Cheat; No -> Don’t Cheat
(c) Refund: Yes -> Don’t Cheat; No -> Marital Status:
    Single, Divorced -> Cheat; Married -> Don’t Cheat
(d) As in (c), with Single, Divorced -> Taxable Income:
    < 80K -> Don’t Cheat; >= 80K -> Cheat

How to Apply Hunt’s Algorithm
Usually it is done in a “greedy” fashion.
“Greedy” means that the optimal split is chosen at
each stage according t...
