is some partition of the C classes into two disjoint sets. If C = 2, twoing
is equivalent to the usual impurity index for f. Surprisingly, twoing can be calculated
almost as efficiently as the usual impurity index.

[Figure 2: Comparison of Gini and Information indices. Two panels plot impurity and scaled impurity against P for the Gini criterion and the information index.]

One potential advantage of twoing is that the output may give the user additional insight concerning the structure of
the data. It can be viewed as the partition of C into two superclasses which are in
some sense the most dissimilar for those observations in A. For certain problems
there may be a natural ordering of the response categories (e.g. level of education),
in which case ordered twoing can be naturally defined by restricting C1 to be an
interval {1, 2, ..., k} of classes. Twoing is not part of rpart.

3.2 Incorporating losses

One salutary aspect of the risk reduction criteria not found in the impurity measures is the inclusion of the loss function. Two different ways of extending the impurity
criteria to also include losses are implemented in CART: the generalized Gini index
and altered priors. The rpart software implements only the altered priors method.

3.2.1 Generalized Gini index
The Gini index has the following interesting interpretation. Suppose an object is
selected at random from one of C classes according to the probabilities $p_1, p_2, \ldots, p_C$
and is randomly assigned to a class using the same distribution. The probability of misclassification is

$$\sum_i \sum_{j \ne i} p_i p_j = \sum_i \sum_j p_i p_j - \sum_i p_i^2 = 1 - \sum_i p_i^2 = \text{Gini index for } p,$$

where the middle step uses $\sum_i \sum_j p_i p_j = \left(\sum_i p_i\right)^2 = 1$.
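As a quick numerical check of this interpretation, the Python sketch below computes the double sum exactly, compares it with $1 - \sum_i p_i^2$, and also estimates the same probability by simulating the draw-then-assign experiment. The function names and the example distribution are illustrative only.

```python
import random
from typing import Sequence


def misclassification_prob(p: Sequence[float]) -> float:
    """Exact sum over i != j of p_i * p_j: the chance that true and assigned classes differ."""
    c = len(p)
    return sum(p[i] * p[j] for i in range(c) for j in range(c) if j != i)


def gini_index(p: Sequence[float]) -> float:
    """Gini index 1 - sum_i p_i^2."""
    return 1.0 - sum(pi * pi for pi in p)


def simulate(p: Sequence[float], n_draws: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate: draw the true class and the assigned class independently from p."""
    rng = random.Random(seed)
    classes = range(len(p))
    true_cls = rng.choices(classes, weights=p, k=n_draws)
    assigned = rng.choices(classes, weights=p, k=n_draws)
    # Fraction of draws where the assigned class differs from the true class.
    return sum(t != a for t, a in zip(true_cls, assigned)) / n_draws


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    print(misclassification_prob(p), gini_index(p))  # both approximately 0.62
    print(simulate(p))  # close to 0.62, up to sampling noise
```

The exact double sum and the closed form agree up to floating-point rounding; the simulated value only agrees approximately.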
Let $L(i, j)$ be the loss of assigning class $j$ to an object which actually belongs to class
$i$. The expected cost of misclassification is $\sum_i \sum_j L(i, j)\, p_i p_j$. This suggests defining
a generalized Gini index of impurity by

$$G(p) = \sum_i \sum_j L(i, j)\, p_i p_j.$$

The corresponding splitting criterion appears to be promising for applications
involving variable misclassification costs. But there are several reasonable objections
to it. First, $G(p)$ is not necessarily a concave function of $p$, which was the motivating
factor behind impurity measures. More seriously, G symmetrizes the loss matrix
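before using it. To see this, note that

$$G(p) = \frac{1}{2} \sum_i \sum_j \left[ L(i, j) + L(j, i) \right] p_i p_j.$$

In particular, for two-class problems (with zero loss for a correct classification) $G(p) = \left[ L(1, 2) + L(2, 1) \right] p_1 p_2$, which is a constant multiple of the Gini index $2 p_1 p_2$, so G in effect ignores the loss matrix.

3.2.2 Altered priors

Remember the definition of $R(A)$:

$$R(A) = \sum_{i=1}^{C} p_i(A)\, L(i, \tau(A)) = \sum_{i=1}^{C} \pi_i\, L(i, \tau(A))\, \frac{n_{iA}}{n_i}\, \frac{n}{n_A}.$$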
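To make the $R(A)$ formula concrete, the Python sketch below evaluates it from raw counts for a made-up two-class node. The names `node_risk` and `best_class` are hypothetical helpers, not rpart functions; τ(A) is passed in explicitly, and `best_class` assumes the node is assigned the class with the smallest risk. Classes are indexed from 0 in the code.

```python
from typing import Sequence


def node_risk(node_counts: Sequence[int], class_counts: Sequence[int],
              priors: Sequence[float], loss: Sequence[Sequence[float]],
              assigned: int) -> float:
    """R(A) = sum_i pi_i * L(i, tau(A)) * (n_iA / n_i) * (n / n_A).

    node_counts[i]  -- n_iA, number of class-i observations in node A
    class_counts[i] -- n_i, number of class-i observations in the whole sample
    priors[i]       -- pi_i
    loss[i][j]      -- L(i, j), loss of assigning class j to a true class-i object
    assigned        -- tau(A), the class predicted for node A
    """
    n = sum(class_counts)
    n_a = sum(node_counts)
    return sum(
        priors[i] * loss[i][assigned] * (node_counts[i] / class_counts[i]) * (n / n_a)
        for i in range(len(priors))
    )


def best_class(node_counts, class_counts, priors, loss) -> int:
    """Hypothetical helper: choose tau(A) as the class with the smallest R(A)."""
    return min(range(len(priors)),
               key=lambda j: node_risk(node_counts, class_counts, priors, loss, j))


if __name__ == "__main__":
    class_counts = [60, 40]      # n_i in the full sample
    node_counts = [30, 10]       # n_iA in node A
    priors = [0.6, 0.4]          # sample proportions n_i / n
    zero_one = [[0, 1], [1, 0]]
    costly = [[0, 1], [4, 0]]    # loss[1][0] = 4: calling a second-class object first-class
    print(node_risk(node_counts, class_counts, priors, zero_one, 0))  # approximately 0.25
    print(best_class(node_counts, class_counts, priors, zero_one))    # 0
    print(best_class(node_counts, class_counts, priors, costly))      # 1
```

With the zero-one loss the node predicts the majority class, but raising the cost of missing the second class flips the assignment, since its risk drops from 1.0 to 0.75.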
Assume there exist $\tilde\pi$ and $\tilde L$ such that

$$\tilde\pi_i \tilde L(i, j) = \pi_i L(i, j) \quad \forall\, i, j \in C.$$

Then $R(A)$ is unchanged under the new losses and priors. If $\tilde L$ is proportional to
the zero-one loss matrix, then the priors $\tilde\pi$ should be used in the splitting criteria.
This is possible only if $L$ is of the form

$$L(i, j) = \begin{cases} L_i & i \ne j \\ 0 & i = j \end{cases}$$

in which case

$$\tilde\pi_i = \frac{\pi_i L_i}{\sum_j \pi_j L_j}.$$

This is always possible when C = 2, and hence altered priors are exact for the two-class prob...
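The altered-priors construction can be checked numerically. The Python sketch below assumes a loss of the row-constant form $L(i, j) = L_i$ for $i \ne j$; the helper names `node_risk`, `row_loss_matrix`, and `altered_priors` are hypothetical, not rpart functions, and the counts are made up. It builds $\tilde\pi$ together with $\tilde L = c \times$ (zero-one matrix), where $c = \sum_j \pi_j L_j$, and compares $R(A)$ under $(\pi, L)$ and $(\tilde\pi, \tilde L)$.

```python
from typing import List, Sequence, Tuple


def node_risk(node_counts, class_counts, priors, loss, assigned) -> float:
    """R(A) = sum_i pi_i * L(i, tau(A)) * (n_iA / n_i) * (n / n_A)."""
    n, n_a = sum(class_counts), sum(node_counts)
    return sum(priors[i] * loss[i][assigned] * (node_counts[i] / class_counts[i]) * (n / n_a)
               for i in range(len(priors)))


def row_loss_matrix(row_losses: Sequence[float]) -> List[List[float]]:
    """Build L(i, j) = L_i for i != j and 0 on the diagonal."""
    k = len(row_losses)
    return [[0.0 if i == j else row_losses[i] for j in range(k)] for i in range(k)]


def altered_priors(priors: Sequence[float],
                   row_losses: Sequence[float]) -> Tuple[List[float], float]:
    """Return pi~_i = pi_i L_i / sum_j pi_j L_j and the scale c = sum_j pi_j L_j."""
    c = sum(p * l for p, l in zip(priors, row_losses))
    return [p * l / c for p, l in zip(priors, row_losses)], c


if __name__ == "__main__":
    class_counts, node_counts = [60, 40], [30, 10]
    priors, row_losses = [0.6, 0.4], [1.0, 4.0]
    loss = row_loss_matrix(row_losses)
    new_priors, c = altered_priors(priors, row_losses)
    # L~ is c times the zero-one matrix, so pi~_i L~(i, j) = pi_i L(i, j) for all i, j.
    tilde_loss = row_loss_matrix([c] * len(priors))
    for tau in range(len(priors)):
        print(node_risk(node_counts, class_counts, priors, loss, tau),
              node_risk(node_counts, class_counts, new_priors, tilde_loss, tau))
    print(new_priors)  # pi~ sums to 1
```

Because $\tilde L$ is just $c$ times the zero-one loss, using $\tilde\pi$ with the plain zero-one loss rescales every $R(A)$ by the same factor $1/c$, which leaves any comparison based on $R(A)$ unchanged. With two classes, any loss matrix with a zero diagonal already has the row-constant form, which is why the construction is always available when C = 2.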