Its left child node is denoted by $t_L$ and its right child node by $t_R$.



The collection of all the nodes is denoted by $T$, and the collection of all the leaf nodes by $\tilde{T}$. A split is denoted by $s$. The set of splits is denoted by $S$.

Jia Li http://www.stat.psu.edu/jiali Classification/Decision Trees (I)

The Three Elements

The construction of a tree involves the following three elements:
1. The selection of the splits.
2. The decision of when to declare a node terminal and when to continue splitting it.
3. The assignment of each terminal node to a class.

In particular, we need to decide the following:
1. A set $Q$ of binary questions of the form {Is $X \in A$?}, $A \subseteq \mathcal{X}$.
2. A goodness of split criterion $\Phi(s, t)$ that can be evaluated for any split $s$ of any node $t$.
3. A stop-splitting rule.
4. A rule for assigning every terminal node to a class.

Standard Set of Questions

The input vector $X = (X_1, X_2, ..., X_p)$ contains features of both categorical and ordered types. Each split depends on the value of a single variable. For each ordered variable $X_j$, $Q$ includes all questions of the form {Is $X_j \le c$?} for all real-valued $c$. Since the training data set is finite, there are only finitely many distinct splits that can be generated by the question {Is $X_j \le c$?}.
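The finiteness claim for ordered variables can be illustrated with a minimal Python sketch. It assumes that two thresholds count as the same split whenever they send the same training points to the left and right children, and it uses the midpoint between adjacent distinct values as a representative threshold. The midpoint convention and the names `candidate_thresholds` and `split` are illustrative assumptions, not taken from the slides.

```python
def candidate_thresholds(values):
    """Return one representative threshold c per distinct split {Is x <= c?}.

    Assumption: midpoints between adjacent distinct observed values are used
    as representatives; any c in the same gap yields the same partition.
    """
    distinct = sorted(set(values))
    return [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]


def split(values, c):
    """Partition values by the question {Is x <= c?}: (yes branch, no branch)."""
    left = [v for v in values if v <= c]
    right = [v for v in values if v > c]
    return left, right


# Hypothetical training values of one ordered variable X_j.
x_j = [2.0, 3.5, 3.5, 1.0, 5.0]

thresholds = candidate_thresholds(x_j)
for c in thresholds:
    left, right = split(x_j, c)
    print(f"Is X_j <= {c}?  left={left}  right={right}")
```

With four distinct observed values there are only three non-trivial splits, even though $c$ ranges over all real numbers: every threshold between two adjacent distinct values produces the same partition of the training data.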
