

      ... as RLLRLL,                 improve= 4.288, (0 missing)
      Price   < 11970 to the right,  improve= 3.200, (0 missing)
      Mileage < 24.5 to the left,    improve= 2.476, (36 missing)

Node number 2: 58 observations,    complexity param=0.08475
  predicted class= average   expected loss= 0.6034
    class counts:     18     12     23      5      0
   probabilities: 0.3103 0.2069 0.3966 0.0862 0.0000
  left son=4 (9 obs) right son=5 (49 obs)
  Primary splits:
      Type    splits as  RRRRLR,     improve=3.187, (0 missing)
      Price   < 11230 to the left,   improve=2.564, (0 missing)
      Mileage < 24.5 to the left,    improve=1.802, (30 missing)
      Country splits as  ---L--RLRL, improve=1.329, (0 missing)

The fit for the information splitting rule is

Node number 1: 85 observations,    complexity param=0.3051
  predicted class= average   expected loss= 0.6941
    class counts:     18     12     26      8     21
   probabilities: 0.2118 0.1412 0.3059 0.0941 0.2471
  left son=2 (58 obs) right son=3 (27 obs)
  Primary splits:
      Country splits as  ---LRRLLLL, improve=38.540, (0 missing)
      Type    splits as  RLLRLL,     improve=11.330, (0 missing)
      Price   < 11970 to the right,  improve= 6.241, (0 missing)
      Mileage < 24.5 to the left,    improve= 5.548, (36 missing)

Node number 2: 58 observations,    complexity param=0.0678
  predicted class= average   expected loss= 0.6034
    class counts:     18     12     23      5      0
   probabilities: 0.3103 0.2069 0.3966 0.0862 0.0000
  left son=4 (36 obs) right son=5 (22 obs)
  Primary splits:
      Type    splits as  RLLRLL,     improve=9.281, (0 missing)
      Price   < 11230 to the left,   improve=5.609, (0 missing)
      Mileage < 24.5 to the left,    improve=5.594, (30 missing)
      Country splits as  ---L--RRRL, improve=2.891, (0 missing)
  Surrogate splits:
      Price   < 10970 to the right,  agree=0.8793, (0 split)
      Country splits as  ---R--RRRL, agree=0.7931, (0 split)

The first 3 countries (Brazil, England, France) had only one or two cars in the listing, all of which were missing the reliability variable. There are no entries for these countries in the first node, leading to the - symbol for the rule.
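The direction strings in the listings can be decoded mechanically. The following is a minimal sketch, not part of rpart itself: the helper `decode_split` is a hypothetical name, and the level order for Type (Compact, Large, Medium, Small, Sporty, Van) is assumed to match the column order of the table of node 2 data.

```python
# Level order for Type: an assumption, taken from the column order
# of the node 2 table (Compact ... Van).
TYPE_LEVELS = ["Compact", "Large", "Medium", "Small", "Sporty", "Van"]


def decode_split(code, levels):
    """Group factor levels by the direction given in a 'splits as' string.

    'L' sends a level to the left son, 'R' sends it to the right son,
    and '-' marks a level with no observations at that node.
    """
    if len(code) != len(levels):
        raise ValueError("need exactly one direction character per level")
    sides = {"L": [], "R": [], "-": []}
    for direction, level in zip(code, levels):
        sides[direction].append(level)
    return sides


# Node 2 primary split on Type under each criterion, as printed above.
gini_split = decode_split("RRRRLR", TYPE_LEVELS)
info_split = decode_split("RLLRLL", TYPE_LEVELS)

print("Gini, left son:        ", gini_split["L"])
print("information, right son:", info_split["R"])
```

Read this way, the Gini rule separates Sporty from the other five types, while the information rule separates Compact and Small from the rest.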
The information measure has larger "improvements", consistent with the difference in scaling between the information and Gini criteria shown in figure 2, but the relative merits of different splits are fairly stable. The two rules do not choose the same primary split at node 2. The data at this point are

              Compact  Large  Medium  Small  Sporty  Van
Much worse          2      2       4      2       7    1
worse               5      0       4      3       0    0
average             3      5       8      2       2    3
better              2      0       0      3       0    0
Much better         0      0       0      0       0    0

Since there are 6 different categories, all 2^5 = 32 different combinations were explored, and as it turns out there are several with a nearly identical improvement. The Gini and information criteria make different "random" choices from this set of near ties. For the Gini index, `Sporty vs others' and `Compact/Small vs others' leave total impurities in the two sons of 37.19 and 37.20, respectively, which correspond to improvements of 3.187 and 3.177. For the information index, the totals are 67.3 versus 64.2, corresponding to improvements of 6.21 and 9.28. The smaller total wins in each case, so the Gini rule picks the first split and the information rule the second. Interestingly, the two splitting criteria arrive at exactly the same final nodes, for the full tree, although by different paths. Compare the class counts of the terminal nodes. We have said that...
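The near tie at node 2 can be checked directly from the table of counts. The sketch below is an illustration under stated assumptions rather than rpart's own code: it assumes the impurity of a node is n times the Gini index (or n times the natural-log entropy), and that the improvement is the parent impurity minus the summed impurity of the two sons. The function names are hypothetical.

```python
import math

# Node 2 data. Rows are the reliability classes, columns the car types.
TYPES = ["Compact", "Large", "Medium", "Small", "Sporty", "Van"]
COUNTS = [
    [2, 2, 4, 2, 7, 1],  # Much worse
    [5, 0, 4, 3, 0, 0],  # worse
    [3, 5, 8, 2, 2, 3],  # average
    [2, 0, 0, 3, 0, 0],  # better
    [0, 0, 0, 0, 0, 0],  # Much better
]


def class_counts(types):
    """Class counts for the observations whose Type is in `types`."""
    cols = [TYPES.index(t) for t in types]
    return [sum(row[c] for c in cols) for row in COUNTS]


def gini(counts):
    """n * (1 - sum p_i^2); the scaling by n is an assumption."""
    n = sum(counts)
    return n - sum(c * c for c in counts) / n


def information(counts):
    """-n * sum p_i log p_i with natural logs; the scaling is an assumption."""
    n = sum(counts)
    return sum(c * math.log(n / c) for c in counts if c > 0)


def split_summary(left_types, impurity):
    """Return (summed impurity of the two sons, improvement over the parent)."""
    right_types = [t for t in TYPES if t not in left_types]
    parent = impurity(class_counts(TYPES))
    sons = impurity(class_counts(left_types)) + impurity(class_counts(right_types))
    return sons, parent - sons


for label, left in [("Sporty vs others", ["Sporty"]),
                    ("Compact/Small vs others", ["Compact", "Small"])]:
    for name, impurity in [("Gini", gini), ("information", information)]:
        sons, improve = split_summary(left, impurity)
        print(f"{label:24s} {name:12s} sons={sons:6.2f} improve={improve:.3f}")
```

Under these assumptions the sketch reproduces the improve values printed for the chosen Type split at node 2 (3.187 for Gini, 9.281 for information), and it gives 37.19, 37.20, 67.3 and 64.2 as the summed impurities of the two sons.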