Define I(t) = i(t)p(t), that is, the impurity function of node t weighted by the estimated proportion of data that go to node t. The impurity of a tree T, I(T), is defined by

    I(T) = ∑_{t ∈ T̃} I(t) = ∑_{t ∈ T̃} i(t)p(t)

where T̃ denotes the set of terminal (leaf) nodes of T.
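As a quick numerical check, the weighted-sum definition of I(T) can be sketched in Python. This is a minimal illustration, not code from the notes: the helper names and the list-of-leaves representation are my own, and the Gini index is used as the node impurity i(t).

```python
def gini(p):
    """Gini index impurity of a class-probability vector p."""
    return 1.0 - sum(pj * pj for pj in p)

def tree_impurity(leaves, impurity=gini):
    """I(T) = sum over terminal nodes t of i(t) * p(t).

    `leaves` is a list of (class_probs, p_t) pairs, where p_t is the
    estimated proportion of data reaching leaf t (the p_t sum to 1)."""
    return sum(impurity(class_probs) * p_t for class_probs, p_t in leaves)

# Two pure leaves give zero tree impurity; a single maximally
# mixed two-class leaf gives Gini impurity 0.5.
print(tree_impurity([([1.0, 0.0], 0.6), ([0.0, 1.0], 0.4)]))  # 0.0
print(tree_impurity([([0.5, 0.5], 1.0)]))                     # 0.5
```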
Note that for any node t the following relations hold:

    p(tL) + p(tR) = p(t)
    pL = p(tL)/p(t),  pR = p(tR)/p(t),  pL + pR = 1

Define the decrease in impurity produced by a split s at node t:

    ΔI(s, t) = I(t) − I(tL) − I(tR)
             = p(t)i(t) − p(tL)i(tL) − p(tR)i(tR)
             = p(t)(i(t) − pL i(tL) − pR i(tR))
             = p(t) Δi(s, t)

Among the possible impurity functions, the Gini index seems to work best in practice for many problems. An alternative splitting criterion is the twoing rule: at a node t, choose the split s that maximizes

    (pL pR / 4) [ ∑_{j=1}^{K} |p(j | tL) − p(j | tR)| ]²

(Jia Li, Classification/Decision Trees (I), http://www.stat.psu.edu/jiali)
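The split-evaluation quantities above can be computed directly from class-probability vectors. A minimal Python sketch, assuming the Gini index as the impurity function i; the function names and argument conventions are mine, not from the notes:

```python
def gini(p):
    """Gini index impurity of a class-probability vector p."""
    return 1.0 - sum(pj * pj for pj in p)

def delta_i(p_t, p_tL, p_tR, pL, impurity=gini):
    """Decrease in impurity for a split s at node t:
    delta_i(s, t) = i(t) - pL*i(tL) - pR*i(tR), with pR = 1 - pL."""
    pR = 1.0 - pL
    return impurity(p_t) - pL * impurity(p_tL) - pR * impurity(p_tR)

def twoing(p_tL, p_tR, pL):
    """Twoing criterion: (pL*pR/4) * (sum_j |p(j|tL) - p(j|tR)|)**2."""
    pR = 1.0 - pL
    return (pL * pR / 4.0) * sum(abs(a - b) for a, b in zip(p_tL, p_tR)) ** 2

# A split that separates two equally likely classes perfectly:
print(delta_i([0.5, 0.5], [1.0, 0.0], [0.0, 1.0], pL=0.5))  # 0.5
print(twoing([1.0, 0.0], [0.0, 1.0], pL=0.5))               # 0.25
```

A useless split (children identical to the parent) gives delta_i = 0, as the derivation predicts.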
Possible impurity functions (writing p_j for p(j | t)):

1. Entropy: ∑_{j=1}^{K} p_j log(1/p_j). If p_j = 0, use the limit lim_{p_j → 0} p_j log(1/p_j) = 0.
2. Misclassification rate: 1 − max_j p_j.
3. Gini index: ∑_{j=1}^{K} p_j(1 − p_j) = 1 − ∑_{j=1}^{K} p_j².

Estimate the posterior probabilities of the classes in each node. The total number of samples is N, and the number of samples in class j, 1 ≤ j ≤ K, is N_j. The number of samples going to node t is N(t); the number of samples of class j going to node t is N_j(t). Then

    ∑_{j=1}^{K} N_j(t) = N(t)  and  N_j(tL) + N_j(tR) = N_j(t).

For a full (balanced) tree, the sum of N(t) over all the t's at the same level is N.
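The three impurity functions, together with the count-based estimate p(j | t) = N_j(t)/N(t), can be sketched as follows. The helper `class_posteriors` is a hypothetical name of my own; the formulas are the ones listed above.

```python
import math

def entropy(p):
    """Entropy: sum_j p_j * log(1/p_j); a term is taken as 0 when p_j = 0."""
    return sum(pj * math.log(1.0 / pj) for pj in p if pj > 0)

def misclassification(p):
    """Misclassification rate: 1 - max_j p_j."""
    return 1.0 - max(p)

def gini(p):
    """Gini index: sum_j p_j*(1 - p_j) = 1 - sum_j p_j**2."""
    return 1.0 - sum(pj * pj for pj in p)

def class_posteriors(counts):
    """Estimate p(j | t) = N_j(t) / N(t) from per-class counts at node t."""
    n_t = sum(counts)
    return [nj / n_t for nj in counts]

p = class_posteriors([30, 10])  # N_1(t)=30, N_2(t)=10 -> [0.75, 0.25]
print(misclassification(p))     # 0.25
print(gini(p))                  # 0.375
```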
This note was uploaded on 02/04/2012 for the course STAT 557 taught by Professor Jiali during the Fall '09 term at Pennsylvania State University, University Park.