Machine Learning
CS6375 Fall 2010
Decision Tree Learning
Reading: Sections 18.2-18.3, R&N; Sections 3.1-3.4, Mitchell

Decision Tree Example
• Three variables:
– Attribute 1: Hair = {blond, dark}
– Attribute 2: Height = {tall, short}
– Class: Country = {Gromland, Polvia}

Decision Trees
• Decision trees are classifiers for instances represented as feature vectors.
• Nodes are (equality and inequality) tests on feature values.
• There is one branch for each value of the feature.
• Leaves specify the categories (labels).
• A tree can categorize instances into multiple disjoint categories.

A new input is classified by following the tree all the way down to a leaf and reporting the output of that leaf. For example:
(B,T) is classified as
(D,S) is classified as
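The traversal described above can be sketched in a few lines of Python. This is a minimal illustration, not the tree from the slide: the nested-dictionary layout, the name `classify`, and in particular the leaf labels are assumptions made up for the example.

```python
# Hypothetical tree over the Hair/Height attributes. The structure and the
# leaf labels are invented for illustration; they do not reproduce the tree
# shown on the slide.
TREE = {
    "attribute": "hair",
    "branches": {
        "blond": {
            "attribute": "height",
            "branches": {"tall": "Gromland", "short": "Polvia"},
        },
        "dark": "Polvia",
    },
}


def classify(tree, instance):
    """Follow one branch per test until a leaf (a plain string) is reached."""
    node = tree
    while isinstance(node, dict):
        value = instance[node["attribute"]]  # value of the tested feature
        node = node["branches"][value]       # one branch per feature value
    return node


label = classify(TREE, {"hair": "dark", "height": "tall"})
```

Each internal node tests exactly one attribute, so the number of tests for an input is at most the depth of the tree.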
General Case (Discrete Attributes)
• We have R observations from training data
– Each observation has M attributes X1,..,XM
– Each Xi can take N distinct discrete values
– Each observation has a class attribute Y with C distinct (discrete) values
– Problem: Construct a sequence of tests on the attributes such that, given a new input (x’1,..,x’M), the class attribute y is correctly predicted

General Decision Tree (Discrete Attributes)
X = attributes of training data (RxM)
Y = Class of training data (R)

Decision Tree Example
A new input is classified by following the tree all the way down to a leaf and reporting the output of that leaf. For example:
(0.2,0.8) is classified as
(0.8,0.2) is classified as
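The example inputs above are pairs of continuous values, so a tree that classifies them has to compare each attribute against a threshold. The sketch below assumes tests of the form `x[i] < t`; the class names `Split` and `Leaf`, the thresholds, and the placeholder labels `class_a` and `class_b` are all hypothetical and are not taken from the slide.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class reported when an input reaches this leaf


@dataclass
class Split:
    feature: int      # index i of the attribute being tested
    threshold: float  # t in the test x[i] < t
    if_true: Union["Split", Leaf]
    if_false: Union["Split", Leaf]


def predict(node, x):
    """Apply threshold tests from the root down and return the leaf label."""
    while isinstance(node, Split):
        node = node.if_true if x[node.feature] < node.threshold else node.if_false
    return node.label


# Hypothetical tree: thresholds and labels are arbitrary placeholders.
TOY_TREE = Split(
    feature=0,
    threshold=0.5,
    if_true=Leaf("class_a"),
    if_false=Split(1, 0.5, Leaf("class_b"), Leaf("class_a")),
)

first = predict(TOY_TREE, (0.2, 0.8))
second = predict(TOY_TREE, (0.8, 0.2))
```

With this particular toy tree the first input stops after one test and the second needs two, which shows that different inputs can follow paths of different lengths.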
General Case (Continuous Attributes)
• We have R observations from training data
– Each observation has M attributes X1,..,XM
– Each Xi can take N continuous values
– Each observation has a class attribute Y with C distinct (discrete) values
– Problem: Construct a sequence of tests of the form Xi < ti ? on the attributes such that, given a new input (x’1,..,x’M), the class attribute y is correctly predicted

General Decision Tree (Continuous Attributes)
X = attributes of training data (RxM)
Y = Class of training data (R)

Basic Questions
• How to choose the attribute/value to split on at each level of the tree?
• When to stop splitting? When should a node be declared a leaf?
• If a leaf node is impure, how should the class label be assigned?

How to choose the attribute/value to split on at each level of the tree?
• Two classes (red circles/green crosses)
• Two attributes: X1 and X2
• 11 points in training data
• Goal: Construct a decision tree such that the leaf nodes correctly predict the class for all the training examples
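One of the basic questions above asks how to label a leaf that is still impure. The text here does not give the answer, so the following is only a sketch of one common choice, majority vote over the training examples that reach the leaf. The function name `leaf_label` is hypothetical.

```python
from collections import Counter


def leaf_label(labels):
    """Return the most frequent class among the training labels at a leaf.

    Majority vote is an assumed policy, not one stated in the slides.
    Ties go to the class that appears first in `labels`.
    """
    if not labels:
        raise ValueError("a leaf needs at least one training example")
    return Counter(labels).most_common(1)[0][0]


label = leaf_label(["red", "green", "red"])
```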
How to choose the attribute/value to split on at each level of the tree?
This node is “pure” because there is only one class left → No ambiguity in the class label
This node is almost “pure” → Little ambiguity in the class label
These nodes contain a mixture of classes
Do not disambiguate between the classes

Entropy
We want to find the most compact, smallest tree (Occam’s razor) that classifies the training data correctly
We want to find the split choices that will get us to pure nodes the fastest
• Entropy is a measure of the impurity of a distribution, defined as:

This node is “pure” because there is only...
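The formula itself did not survive in this text. As a sketch, assuming the standard Shannon definition H = Σ p_i log2(1/p_i), where p_i is the fraction of examples at a node that belong to class i, entropy can be computed as follows. The function name `entropy` and the choice of base 2 are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of the class distribution in `labels`.

    Assumes the standard definition sum(p * log2(1 / p)) over the classes
    present; classes with zero count contribute nothing.
    """
    total = len(labels)
    counts = Counter(labels)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


pure = entropy(["red"] * 5)                     # only one class left
mixed = entropy(["red", "green"])               # even two-class mixture
almost_pure = entropy(["red"] * 10 + ["green"])
```

Under this definition a pure node has entropy 0, an even two-class mixture has entropy 1 bit, and an almost pure node falls in between, which matches the ordering of the pure, almost pure, and mixed nodes described above.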
 Fall '10
 VicentNg
