…or T_2 is a subtree of T_1; hence either |T_1| < |T_2| or |T_2| < |T_1|.

…numbers α_1, α_2, ..., α_m, both T_{α_1}, ..., T_{α_m} and R(T_{α_1}), ..., R(T_{α_m}) can be computed efficiently. Using the first result, we can uniquely define T_α as the smallest tree T for which R_α(T) is minimized.

Since any sequence of nested trees based on T has at most |T| members, result 2 implies that all possible values of α can be grouped into m intervals, m ≤ |T|:

    I_1 = [0, α_1]
    I_2 = (α_1, α_2]
    ...
    I_m = (α_{m-1}, ∞)

where all α in the same I_i share the same minimizing subtree.

4.2 Cross-validation

Cross-validation is used to choose a best value for α by the following steps:

1. Fit the full model on the data set:
   - compute I_1, I_2, ..., I_m
   - set β_1 = 0
         β_2 = √(α_1 α_2)
         β_3 = √(α_2 α_3)
         ...
         β_{m-1} = √(α_{m-2} α_{m-1})
         β_m = ∞
   - each β_i is a 'typical value' for its I_i.

2. Divide the data set into s groups G_1, G_2, ..., G_s, each of size n/s, and for each group separately:
   - fit a full model on the data set 'everyone except G_i' and determine T_{β_1}, T_{β_2}, ..., T_{β_m} for this reduced data set,
   - compute the predicted class for each observation in G_i, under each of the models T_{β_j} for 1 ≤ j ≤ m,
   - from this compute the risk for each subject.

3. Sum over the G_i to get an estimate of risk for each β_j. For the complexity parameter with smallest risk, compute T_β for the full data set; this is chosen as the best trimmed tree. (A sketch of this procedure appears at the end of this section.)

In actual practice, we may instead use the 1-SE rule. A plot of β versus risk often has an initial sharp drop followed by a relatively flat plateau and then a slow rise. The choice of β among the models on the plateau can be essentially random. To avoid this, both an estimate of the risk and its standard error are computed during the cross-validation. Any risk within one standard error of the achieved minimum is marked as equivalent to the minimum, i.e., considered to be part of the flat plateau. Then the simplest model among all those "tied" on the plateau is chosen.

In the usual definition of cross-validation we would have taken s = n above, i.e., each of the G_i would contain exactly one observation, but for moderate n this is computationally prohibitive. A value of s = 10 has been found to be sufficient, but users can vary this if they wish.

In Monte Carlo trials, this method of pruning has proven very reliable for screening out 'pure noise' variables in the data set.

4.3 Example: The Stochastic Digit Recognition Problem

This example is found in section 2.6 of [1], and is used as a running example throughout much of their book. Consider the segments of an unreliable digital readout:

    [Figure: a seven-segment display with the segments numbered 1 (top), 2 (upper left), 3 (upper right), 4 (middle), 5 (lower left), 6 (lower right), and 7 (bottom).]

Each light is correct with probability 0.9; e.g., if the true digit is a 2, the lights 1, 3, 4, 5, and 7 are on with probability 0.9 and lights 2 and 6 are on with probability 0.1. Construct test data where Y ∈ {0, 1, ..., 9}, each with proportion 1/10, and the X_i, i = 1, ..., 7, are i.i.d. Bernoulli variables with parameter depending on Y. X_8, ..., X_24 are generated as i.i.d. Bernoulli with P{X_i = 1} = 0.5, and are independent of Y. They correspond to embedding the readout in a larger rectangle of random lights. A sampl…
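
To make the interval structure concrete, here is a minimal sketch using scikit-learn's cost-complexity pruning path as a stand-in for the routines described above; the data set (X, y) and all parameter values are illustrative, not part of the original text.

    # A minimal sketch, assuming scikit-learn as a stand-in for the
    # pruning routines described in the text; X, y are illustrative data.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)

    # Breakpoints alpha_1 < ... < alpha_m of the cost-complexity measure.
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
    alphas = path.ccp_alphas

    # Every alpha inside one interval I_i yields the same minimizing
    # subtree; refitting at each breakpoint exhibits the nested sequence.
    sizes = [DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X, y)
             .tree_.node_count for a in alphas]
    print(list(zip(np.round(alphas, 4), sizes)))  # sizes are non-increasing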
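
Continuing the same illustrative setup (X, y, and alphas from the previous sketch remain in scope), here is a sketch of cross-validation steps 1-3, with StratifiedKFold playing the role of the groups G_1, ..., G_s. Representing β_m = ∞ by the last breakpoint is an assumption of this sketch, not part of the text.

    # A sketch of steps 1-3 above.  beta_1 = 0 (alphas[0] is 0, so the
    # first geometric mean is 0); beta_m = infinity is represented by the
    # last breakpoint, which already prunes the tree back to a single node.
    import numpy as np
    from sklearn.model_selection import StratifiedKFold
    from sklearn.tree import DecisionTreeClassifier

    betas = np.append(np.sqrt(alphas[:-1] * alphas[1:]), alphas[-1])

    s = 10                                  # number of groups G_1, ..., G_s
    risk = np.zeros(len(betas))
    for train, test in StratifiedKFold(n_splits=s, shuffle=True,
                                       random_state=0).split(X, y):
        for j, beta in enumerate(betas):
            # Fit on 'everyone except G_i', then count errors on G_i.
            model = DecisionTreeClassifier(random_state=0, ccp_alpha=beta)
            model.fit(X[train], y[train])
            risk[j] += np.sum(model.predict(X[test]) != y[test])

    risk /= len(y)                          # misclassification risk per beta_j
    best_beta = betas[np.argmin(risk)]      # smallest cross-validated risk
    best_tree = DecisionTreeClassifier(random_state=0,
                                       ccp_alpha=best_beta).fit(X, y)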
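
The 1-SE rule itself reduces to a few lines once per-β risks and standard errors have been collected. The function below is a hypothetical helper; its risk and se inputs play the role of the cross-validated risk estimates and their standard errors (conceptually, the xerror and xstd columns of rpart's cp table).

    # A sketch of the 1-SE rule; betas is assumed sorted in increasing
    # order, so the largest beta on the plateau gives the smallest tree.
    import numpy as np

    def one_se_choice(betas, risk, se):
        """Pick the simplest model whose risk is within one standard
        error of the achieved minimum (the 'flat plateau')."""
        threshold = risk.min() + se[np.argmin(risk)]
        tied = np.where(risk <= threshold)[0]   # models tied on the plateau
        return betas[tied.max()]                # most heavily pruned of these

Choosing the largest β on the plateau corresponds to the simplest (most heavily pruned) tree among the tied models, which is exactly the tie-breaking the text describes.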
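
Finally, a sketch of the test-data construction for the digit example. The text spells out the segment pattern only for the digit 2, so the full digit-to-segment table below is an assumption based on the standard seven-segment display, numbered as in the figure.

    # A sketch of the data construction described above; each of the 7
    # segment lights is wrong with probability 0.1, and X_8..X_24 are
    # pure-noise lights independent of the digit Y.
    import numpy as np

    SEGMENTS_ON = {0: {1, 2, 3, 5, 6, 7}, 1: {3, 6}, 2: {1, 3, 4, 5, 7},
                   3: {1, 3, 4, 6, 7},    4: {2, 3, 4, 6},
                   5: {1, 2, 4, 6, 7},    6: {1, 2, 4, 5, 6, 7},
                   7: {1, 3, 6},          8: {1, 2, 3, 4, 5, 6, 7},
                   9: {1, 2, 3, 4, 6, 7}}

    def make_digit_data(n, seed=0):
        rng = np.random.default_rng(seed)
        y = rng.integers(0, 10, size=n)         # Y uniform on {0, ..., 9}
        X = np.empty((n, 24), dtype=int)
        for i, digit in enumerate(y):
            ideal = np.array([seg in SEGMENTS_ON[digit] for seg in range(1, 8)])
            wrong = rng.random(7) < 0.1         # each light wrong w.p. 0.1
            X[i, :7] = np.where(wrong, ~ideal, ideal)
        X[:, 7:] = rng.random((n, 17)) < 0.5    # X_8..X_24: independent noise
        return X, y

    X, y = make_digit_data(200)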