
# Effect of alternate loss matrices


The default tree is:

```
Node number 1: 81 observations,    complexity param=0.1765
  predicted class= absent  expected loss= 0.2099
    class counts:   64  17
   probabilities: 0.7901 0.2099
  left son=2 (62 obs) right son=3 (19 obs)
  Primary splits:
      Start  < 8.5  to the right, improve=6.762, (0 missing)
      Number < 5.5  to the left,  improve=2.867, (0 missing)
      Age    < 39.5 to the left,  improve=2.250, (0 missing)
  Surrogate splits:
      Number < 6.5  to the left,  agree=0.8025, (0 split)
```

The fit using the prior (0.65, 0.35) is:

```
Node number 1: 81 observations,    complexity param=0.302
  predicted class= absent  expected loss= 0.35
    class counts:   64  17
   probabilities: 0.65 0.35
  left son=2 (46 obs) right son=3 (35 obs)
  Primary splits:
      Start  < 12.5 to the right, improve=10.900, (0 missing)
      Number < 4.5  to the left,  improve= 5.087, (0 missing)
      Age    < 39.5 to the left,  improve= 4.635, (0 missing)
  Surrogate splits:
      Number < 3.5  to the left,  agree=0.6667, (0 split)
```

And the first split under 4/3 losses is:

```
Node number 1: 81 observations,    complexity param=0.01961
  predicted class= absent  expected loss= 0.6296
    class counts:   64  17
   probabilities: 0.7901 0.2099
  left son=2 (62 obs) right son=3 (19 obs)
  Primary splits:
      Start  < 8.5  to the right, improve=5.077, (0 missing)
      Number < 5.5  to the left,  improve=2.165, (0 missing)
      Age    < 39.5 to the left,  improve=1.535, (0 missing)
  Surrogate splits:
      Number < 6.5  to the left,  agree=0.8025, (0 split)
```

# 7 Regression

## 7.1 Definition

Up to this point the classification problem has been used to define and motivate our formulae. However, the partitioning procedure is quite general and can be extended by specifying five "ingredients":

- A splitting criterion, which is used to decide which variable gives the best split. For classification this was either the Gini or log-likelihood function. In the anova method the splitting criterion is $SS_T - (SS_L + SS_R)$, where $SS_T = \sum (y_i - \bar{y})^2$ is the sum of squares for the node, and $SS_R$, $SS_L$ are the sums of squares for the right and left son, respectively.
  This is equivalent to choosing the split to maximize the between-groups sum of squares in a simple analysis of variance. This rule is identical to the regression option for `tree`.

- A summary statistic or vector, which is used to describe a node. The first element of the vector is considered to be the fitted value. For the anova method this is the mean of the node; for classification the response is the predicted class followed by the vector of class probabilities.
- The error of a node. This will be the variance of $y$ for anova, and the predicted loss for classification.
- The prediction error for a new observation, assigned to the node. For anova this is $y_{new} - \bar{y}$.
- Any necessary initialization.

The anova method leads to regression trees; it is the default method if $y$ is a simple numeric vector, i.e., not a factor, matrix, or survival object.

## 7.2 Example: Consumer Report Auto data (cont.)

The dataset car.all contains a collection of variables from the April, 1990 Consumer Reports; it has 36 variables on 111 cars. Documentation may be found in th...
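For concreteness, the anova splitting rule of section 7.1 can be sketched in a few lines. This is a hypothetical illustration in Python, not rpart's implementation; the helper names `sum_sq` and `best_split` are invented here:

```python
# Sketch of the anova splitting criterion: among all cutpoints on a single
# numeric predictor, choose the one maximizing SS_T - (SS_L + SS_R),
# i.e. the between-groups sum of squares.

def sum_sq(ys):
    """Sum of squares about the mean for one node."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(x, y):
    """Return (cutpoint, improvement) maximizing SS_T - (SS_L + SS_R)."""
    pairs = sorted(zip(x, y))          # order observations by the predictor
    ss_t = sum_sq(y)                   # sum of squares for the parent node
    best_cut, best_improve = None, 0.0
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue                   # no valid cut between tied x values
        left = [p[1] for p in pairs[:k]]
        right = [p[1] for p in pairs[k:]]
        improve = ss_t - (sum_sq(left) + sum_sq(right))
        if improve > best_improve:
            best_cut = (pairs[k - 1][0] + pairs[k][0]) / 2
            best_improve = improve
    return best_cut, best_improve

# Two well-separated clusters: the best cut falls between x=3 and x=10.
x = [1, 2, 3, 10, 11, 12]
y = [1.0, 1.2, 0.9, 5.0, 5.2, 4.9]
print(best_split(x, y))
```

Maximizing $SS_T - (SS_L + SS_R)$ with $SS_T$ fixed is the same as minimizing $SS_L + SS_R$, which is why this rule coincides with the between-groups sum of squares in a one-way analysis of variance.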
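The node summaries printed above for alternate priors and losses follow from minimizing expected loss at a node. A minimal Python sketch (the helper `node_prediction` and the orientation of the 4/3 loss matrix are assumptions made here for illustration, not rpart code):

```python
# Pick the predicted class at a node by minimizing expected loss,
# where loss[i][j] is the cost of predicting class j when the truth is i.

def node_prediction(probs, loss):
    n = len(probs)
    expected = [sum(probs[i] * loss[i][j] for i in range(n)) for j in range(n)]
    j_best = min(range(n), key=expected.__getitem__)
    return j_best, expected[j_best]

# Root node of the example above: P(absent) = 0.7901, P(present) = 0.2099.
probs = [0.7901, 0.2099]

# Unit losses: class 0 ("absent") wins with expected loss 0.2099,
# matching the default tree's root node.
print(node_prediction(probs, [[0, 1], [1, 0]]))

# 4/3 losses (assumed orientation: 3 for a missed "present", 4 for a
# false "present"): still "absent", but the expected loss rises to
# 3 * 0.2099, matching the 0.6296 printed for the 4/3 tree.
print(node_prediction(probs, [[0, 4], [3, 0]]))
```

This is why the 4/3 loss matrix leaves the root's predicted class and probabilities unchanged while tripling its expected loss, whereas the (0.65, 0.35) prior changes the probabilities themselves and hence the chosen split.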