The remaining 17 predictors are iid Bernoulli with P{x_i = 1} = 0.5 and are independent of y.

A sample of size 200 was generated accordingly and the procedure applied using the Gini index (see section 3.2.1) to build the tree. The S-plus code to compute the simulated data and the fit is shown below.

    n <- 200
    y <- rep(0:9, length = 200)        # true digit for each observation
    temp <- c(1,1,1,0,1,1,1,
              0,0,1,0,0,1,0,
              1,0,1,1,1,0,1,
              1,0,1,1,0,1,1,
              0,1,1,1,0,1,0,
              1,1,0,1,0,1,1,
              0,1,0,1,1,1,1,
              1,0,1,0,0,1,0,
              1,1,1,1,1,1,1,
              1,1,1,1,0,1,0)
    lights <- matrix(temp, 10, 7, byrow = TRUE)   # The true light pattern 0-9
    temp1 <- matrix(rbinom(n*7, 1, 0.9), n, 7)    # Noisy lights
    temp1 <- ifelse(lights[y+1, ] == 1, temp1, 1 - temp1)
    temp2 <- matrix(rbinom(n*17, 1, 0.5), n, 17)  # Random lights
    x <- cbind(temp1, temp2)                      # x is the matrix of predictors

The particular data set of this example can be replicated by setting .Random.seed to c(21, 14, 49, 32, 43, 1, 32, 22, 36, 23, 28, 3) before the call to rbinom. Now we fit the model:

    temp3 <- rpart.control(xval = 10, minbucket = 2, minsplit = 4, cp = 0)
    dfit <- rpart(y ~ x, method = 'class', control = temp3)
    printcp(dfit)

    Classification tree:
    rpart(formula = y ~ x, method = "class", control = temp3)

    Variables actually used in tree construction:
     [1] x.1  x.10 x.12 x.13 x.15 x.19 x.2  x.20 x.22 x.3  x.4  x.5  x.6  x.7  x.8

    Root node error: 180/200 = 0.9

              CP nsplit rel error  xerror      xstd
    1  0.1055556      0   1.00000 1.09444 0.0095501
    2  0.0888889      2   0.79444 1.01667 0.0219110
    3  0.0777778      3   0.70556 0.90556 0.0305075
    4  0.0666667      5   0.55556 0.75000 0.0367990
    5  0.0555556      8   0.36111 0.56111 0.0392817
    6  0.0166667      9   0.30556 0.36111 0.0367990
    7  0.0111111     11   0.27222 0.37778 0.0372181
    8  0.0083333     12   0.26111 0.36111 0.0367990
    9  0.0055556     16   0.22778 0.35556 0.0366498
    10 0.0027778     27   0.16667 0.34444 0.0363369
    11 0.0013889     31   0.15556 0.36667 0.0369434
    12 0.0000000     35   0.15000 0.36667 0.0369434

    fit9 <- prune(dfit, cp = 0.02)
    plot(fit9, branch = 0.3, compress = TRUE)
    text(fit9)

[Figure 4: Optimally pruned tree for the stochastic digit recognition data]

The cp table differs from that in section 3.5 of [1] in several ways, the last two of which are somewhat important.

- The actual values are different, of course, because of different random number generators in the two runs.
- The table is printed from the smallest tree (no splits) to the largest one (35 splits). We find it easier to compare one tree to another when they start at the same place.
- The number of splits is listed, rather than the number of nodes. The number of nodes is always 1 + the number of splits.
- For easier reading, the error columns have been scaled so that the first node has an error of 1. Since in this example the model with no splits must make 180/200 misclassifications, multiply columns 3-5 by 180 to get a result in terms of absolute error. For example, the tree with 5 splits has relative error 0.55556, i.e., 0.55556 × 180 = 100 misclassified observations. (Computations are done on the absolute error scale, and printed on the relative scale.)
- The complexity parameter column (cp) has been similarly scaled.

Looking at the cp table, we see that the best tree has 10 terminal nodes (9 splits), based on cross-validation.
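This choice can also be read off the fitted object programmatically. The sketch below (ordinary R with the rpart package; dfit is the fit from above, and the cptable component with columns "CP", "nsplit", "rel error", "xerror", and "xstd" is standard rpart output) locates the row with the smallest cross-validated error and then applies the one-standard-error rule; with the table above it selects the 9-split (10-leaf) tree, in agreement with the text. The names ctab, best.row, cutoff, se.row, and fit.se are our own.

    library(rpart)

    ctab <- dfit$cptable                  # same matrix that printcp(dfit) displays

    # Row with the smallest cross-validated relative error
    # (here row 10: xerror = 0.34444 at 27 splits).
    best.row <- which.min(ctab[, "xerror"])

    # One-SE rule: the smallest tree whose xerror lies within one standard
    # error of the minimum (0.34444 + 0.03634 = 0.38078, first met by the
    # 9-split tree with xerror = 0.36111).
    cutoff <- ctab[best.row, "xerror"] + ctab[best.row, "xstd"]
    se.row <- min(which(ctab[, "xerror"] <= cutoff))

    # Prune back to the complexity value of the selected row.
    fit.se <- prune(dfit, cp = ctab[se.row, "CP"])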
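To see how the pruned tree actually classifies the training sample, the fitted classes can be tabulated against the true digits. A minimal sketch, assuming fit9 and y from the code above; predict with type = "class" is the standard rpart interface, and pred is an illustrative name.

    pred <- predict(fit9, type = "class")   # fitted digit for each of the 200 cases
    table(predicted = pred, actual = y)     # 10 x 10 confusion matrix
    sum(pred != y)                          # absolute number of misclassifications

The last count, divided by the 180 root-node errors, should reproduce the rel error entry of the cptable row corresponding to fit9.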