# Jia Li (http://www.stat.psu.edu/jiali): Classification/Decision Trees (I)


If $X_j$ is categorical, taking values in, say, $\{1, 2, \ldots, M\}$, then $Q$ contains all questions of the form $\{\text{Is } X_j \in A?\}$, where $A$ ranges over all subsets of $\{1, 2, \ldots, M\}$. The splits for all $p$ variables constitute the standard set of questions.

Jia Li http://www.stat.psu.edu/jiali Classification/Decision Trees (I)

## Goodness of Split

The goodness of a split is measured by an impurity function defined for each node. Intuitively, we want each leaf node to be "pure", that is, one class dominates.

## The Impurity Function

Definition: An impurity function is a function $\phi$ defined on the set of all $K$-tuples of numbers $(p_1, \ldots, p_K)$ satisfying $p_j \ge 0$, $j = 1, \ldots, K$, and $\sum_j p_j = 1$, with the properties:

1. $\phi$ is a maximum only at the point $(1/K, 1/K, \ldots, 1/K)$.
2. $\phi$ achieves its minimum only at the points $(1, 0, \ldots, 0)$, $(0, 1, 0, \ldots, 0)$, ..., $(0, 0, \ldots, 0, 1)$.
3. $\phi$ is a symmetric function of $p_1, \ldots, p_K$, i.e., if you permute the $p_j$, $\phi$ remains constant.

Definition: Given an impurity function $\phi$, define the impurity measure $i(t)$ of a node $t$ as

$$i(t) = \phi(p(1 \mid t), p(2 \mid t), \ldots, p(K \mid t)),$$

where $p(j \mid t)$ is the estimated probability of class $j$ within node $t$.

Goodness of a split $s$ for node $t$, denoted by $\Phi(s, t)$, is defined by

$$\Phi(s, t) = \Delta i(s, t) = i(t) - p_R \, i(t_R) - p_L \, i(t_L),$$

where $p_R$ and $p_L$ are the proportions of the samples in node $t$ that go to the right node $t_R$ and the left node $t_L$, respectively.

Define $I(t) = i(t)p(t)$...
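The definitions above can be made concrete with a small sketch. It rests on several assumptions that are not in the slides shown here: the Gini index and entropy are used as two example impurity functions satisfying properties 1 to 3, class probabilities $p(j \mid t)$ are estimated by within-node class frequencies, and all function names (`gini`, `entropy`, `goodness_of_split`, `categorical_questions`, `best_categorical_split`) are hypothetical. The subset enumeration skips the empty and full subsets, since those send every sample to one side.

```python
from collections import Counter
from itertools import combinations
from math import log2


def class_proportions(labels):
    """Estimate p(j | t) by the class frequencies of the labels in a node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(probs):
    """Gini index 1 - sum_j p_j^2 (an assumed choice of impurity function)."""
    return 1.0 - sum(p * p for p in probs)


def entropy(probs):
    """Entropy -sum_j p_j log2 p_j, treating 0 log 0 as 0 (also assumed)."""
    return -sum(p * log2(p) for p in probs if p > 0)


def node_impurity(labels, phi=gini):
    """i(t) = phi(p(1 | t), ..., p(K | t))."""
    return phi(class_proportions(labels))


def goodness_of_split(left, right, phi=gini):
    """Phi(s, t) = i(t) - p_R i(t_R) - p_L i(t_L) for the split (left, right)."""
    parent = left + right
    p_left = len(left) / len(parent)
    p_right = len(right) / len(parent)
    return (node_impurity(parent, phi)
            - p_right * node_impurity(right, phi)
            - p_left * node_impurity(left, phi))


def categorical_questions(values):
    """Yield each nonempty proper subset A for the question 'Is X_j in A?'."""
    values = sorted(values)
    for size in range(1, len(values)):
        for subset in combinations(values, size):
            yield set(subset)


def best_categorical_split(xs, ys, phi=gini):
    """Return (goodness, A) for the subset A with the largest goodness."""
    best = None
    for A in categorical_questions(set(xs)):
        left = [y for x, y in zip(xs, ys) if x in A]
        right = [y for x, y in zip(xs, ys) if x not in A]
        score = goodness_of_split(left, right, phi)
        if best is None or score > best[0]:
            best = (score, A)
    return best


# Toy data: category 1 is always class "a"; categories 2 and 3 are class "b".
xs = [1, 1, 2, 2, 3, 3]
ys = ["a", "a", "b", "b", "b", "b"]
score, subset = best_categorical_split(xs, ys)
print(subset, round(score, 4))  # {1} 0.4444
```

In this toy example the question "Is $X_j \in \{1\}$?" yields two pure children, so the goodness equals the parent's impurity, $4/9$ under the Gini index. A subset and its complement describe the same split, so the sketch evaluates each split twice. Exhaustive enumeration grows exponentially in $M$ and is practical only for small $M$.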