# The Overfitting Problem and Decision Tree Pruning


## The Overfitting Problem: Example

- However, we collect training examples from the perfect world through some imperfect observation device.
- As a result, the training data is corrupted by noise.
- Because of the noise, the resulting decision tree is far more complicated than it should be.
- This is because the learning algorithm tries to classify all of the training set perfectly.

This is a fundamental problem in learning: overfitting.

It would be nice to identify automatically that splitting this node is stupid. Possible criterion: figure out that splitting this node will lead to a “complicated” tree, suggesting noisy data.

- The effect of overfitting is that the tree is guaranteed to classify the training data perfectly, but it may do a terrible job at classifying new test data.
- Example: (0.6,0.9) is classified as ‘A’.

Note that, even though the attribute X1 is completely irrelevant in the original distribution, it is used to make the decision at that node.

## Possible Overfitting Solution

- Grow tree based on training data (unpruned tree).
- Prune the tree by removing useless nodes based on additional test data (also known as validation data) not used for training.

[Figure: Training data]

[Figure: Unpruned decision tree from training data]

[Figure: Training data with the partitions induced by the decision tree. Notice the tiny regions at the top necessary to correctly classify the ‘A’ outliers!]
[Figure: Unpruned decision tree from training data]

Performance (% correctly classified):

- Unpruned decision tree from training data: Training 100%, Test 77.5%
- Pruned decision tree from training data: Training 95%, Test 80%
- Pruned decision tree from training data: Training 80%, Test 97.5%

## Decision Tree Pruning

- Construct the entire tree as before.
- Starting at the leaves, recursively eliminate splits:
  - Evaluate performance of the tree on additional test data (also known as validation data).
  - Prune the tree if the classification performance increases by removing the split.

## Locating the Overfitting Point

- General principle: As the complexity of the classifier increases (depth of the decision tree), the performance on the training data increases and the performance on the test data decreases when the classifier overfits the tr...
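
The grow-then-prune procedure in these slides can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the course's implementation: the data generator `sample` (true class depends only on the second attribute, labels flipped with probability 0.1), the misclassification-count split rule, and the helper names `grow`, `prune`, `accuracy`, and `size` are all hypothetical. The slides prune when validation performance increases; this sketch also prunes on ties, which favors the smaller tree.

```python
import random
from collections import Counter

def grow(points):
    """Grow an unpruned tree: keep splitting until every leaf is pure."""
    counts = Counter(y for _, y in points)
    node = {"label": counts.most_common(1)[0][0]}  # majority class at this node
    if len(counts) == 1:
        return node
    best = None
    for f in range(len(points[0][0])):
        values = sorted({x[f] for x, _ in points})
        for t in values[:-1]:  # both sides of "x[f] <= t" are non-empty
            left = [p for p in points if p[0][f] <= t]
            right = [p for p in points if p[0][f] > t]
            # Training errors if each side predicted its own majority class
            # (assumption: a simple stand-in for an entropy-based criterion).
            err = sum(len(s) - Counter(y for _, y in s).most_common(1)[0][1]
                      for s in (left, right))
            if best is None or err < best[0]:
                best = (err, f, t, left, right)
    if best is None:  # identical inputs with conflicting labels: stay a leaf
        return node
    _, f, t, left, right = best
    node.update(feature=f, threshold=t, left=grow(left), right=grow(right))
    return node

def predict(node, x):
    while "feature" in node:
        side = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[side]
    return node["label"]

def accuracy(tree, points):
    return sum(predict(tree, x) == y for x, y in points) / len(points)

def size(node):
    if "feature" not in node:
        return 1
    return 1 + size(node["left"]) + size(node["right"])

def prune(node, root, validation):
    """Starting at the leaves, remove splits that do not help on validation data."""
    if "feature" not in node:
        return
    prune(node["left"], root, validation)
    prune(node["right"], root, validation)
    before = accuracy(root, validation)
    saved = dict(node)
    for key in ("feature", "threshold", "left", "right"):
        del node[key]  # tentatively turn this node into a leaf
    if accuracy(root, validation) < before:
        node.update(saved)  # pruning hurt validation accuracy: restore the split

def sample(n, rng, noise=0.1):
    """Hypothetical noisy data: the class depends on x2 only; x1 is irrelevant."""
    data = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = "A" if x[1] > 0.5 else "B"
        if rng.random() < noise:  # imperfect observation flips the label
            y = "B" if y == "A" else "A"
        data.append((x, y))
    return data

rng = random.Random(0)
train, validation, test = sample(60, rng), sample(60, rng), sample(200, rng)

tree = grow(train)
train_before, val_before = accuracy(tree, train), accuracy(tree, validation)
test_before, nodes_before = accuracy(tree, test), size(tree)

prune(tree, tree, validation)
val_after, test_after, nodes_after = accuracy(tree, validation), accuracy(tree, test), size(tree)

print(f"unpruned: {nodes_before} nodes, train {train_before:.3f}, test {test_before:.3f}")
print(f"pruned:   {nodes_after} nodes, train {accuracy(tree, train):.3f}, test {test_after:.3f}")
```

Because a split is restored whenever removing it lowers validation accuracy, the pruned tree never scores worse on the validation set than the unpruned one. Test accuracy is not guaranteed to improve, which is why the test set is kept separate from the data used to decide which splits to remove.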

## This note was uploaded on 11/03/2010 for the course UNIVERSITY CS6375 taught by Professor Vicentng during the Fall '10 term at University of Texas at Dallas, Richardson.
