L and right by $t_R$. The collection of all the nodes is denoted by $T$, and the collection of all the leaf nodes by $\tilde{T}$. A split is denoted by $s$. The set of splits is denoted by $S$.

Jia Li http://www.stat.psu.edu/jiali Classification/Decision Trees (I)

The Three Elements

The construction of a tree involves the following three elements:
1. The selection of the splits.
2. The decision of when to declare a node terminal or to continue splitting it.
3. The assignment of each terminal node to a class.

In particular, we need to decide the following:
1. A set $Q$ of binary questions of the form {Is $X \in A$?}, $A \subseteq \mathcal{X}$.
2. A goodness-of-split criterion $\Phi(s, t)$ that can be evaluated for any split $s$ of any node $t$.
3. A stop-splitting rule.
4. A rule for assigning every terminal node to a class.

Standard Set of Questions

The input vector $X = (X_1, X_2, \ldots, X_p)$ contains features of both categorical and ordered types. Each split depends on the value of only a single variable. For each ordered variable $X_j$, $Q$ includes all questions of the form {Is $X_j \le c$?} for all real-valued $c$. Since the training data set is finite, there are only finitely many distinct splits that can be generated by the question {Is $X_j \le c$?}.
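
The finiteness claim for an ordered variable can be illustrated with a short Python sketch. Any two thresholds c lying between the same pair of consecutive distinct training values send every training point to the same side, so one representative per gap is enough. Using the midpoint of each gap as the representative is an assumption made here for illustration, and the names `candidate_thresholds` and `split` are hypothetical helpers, not part of the notes.

```python
from typing import List, Sequence, Tuple


def candidate_thresholds(values: Sequence[float]) -> List[float]:
    """Return one threshold c per distinct split {Is x_j <= c?} of the sample."""
    distinct = sorted(set(values))
    # All c in the gap between two consecutive distinct values partition the
    # training points identically; the midpoint is an assumed representative.
    return [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]


def split(values: Sequence[float], c: float) -> Tuple[List[int], List[int]]:
    """Send the indices of the training points to the left (<= c) or right (> c) child."""
    left = [i for i, v in enumerate(values) if v <= c]
    right = [i for i, v in enumerate(values) if v > c]
    return left, right


# Five training values of one ordered variable, four of them distinct.
x_j = [2.0, 3.5, 3.5, 1.0, 5.0]
thresholds = candidate_thresholds(x_j)
for c in thresholds:
    print(c, split(x_j, c))
```

With n distinct training values this yields at most n - 1 candidate splits for the variable, and a variable with a single distinct value yields none.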
