Overfitting-L5

CSE 572: Data Mining
Lecture 7: Model Overfitting
Classification Errors
- Training errors (apparent errors): errors committed on the training set
- Test errors: errors committed on the test set
- Generalization errors: expected error of a model over a random selection of records from the same distribution
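The first two error types can be computed directly from labeled records; the generalization error is an expectation over the underlying distribution and can only be estimated, typically by the test error. A minimal sketch, assuming a hypothetical one-feature threshold classifier and hypothetical toy records (neither comes from the lecture):

```python
def error_rate(model, records):
    """Fraction of (features, label) records that `model` misclassifies."""
    wrong = sum(1 for features, label in records if model(features) != label)
    return wrong / len(records)


def threshold_model(features):
    # Hypothetical classifier: predict "+" when the single feature exceeds 5.
    return "+" if features[0] > 5 else "o"


# Hypothetical toy records: one numeric feature per record.
train = [((1,), "o"), ((2,), "o"), ((7,), "+"), ((9,), "+")]
test = [((3,), "o"), ((4,), "+"), ((6,), "+"), ((8,), "+")]

training_error = error_rate(threshold_model, train)  # errors on the training set
test_error = error_rate(threshold_model, test)       # errors on the test set
print(training_error, test_error)
```

Here the model fits every training record but misses one of the four test records, so the two error rates differ.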
Example Data Set
- Two-class problem: +, o
- 3000 data points (30% for training, 70% for testing)
- Data for the + class is generated from a uniform distribution
- Data for the o class is generated from a mixture of 3 Gaussian distributions, centered at (5,15), (10,5), and (15,15)
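A data set of this shape could be generated roughly as follows. The slide gives only the class distributions, the centers, and the split; the value range, the standard deviation, the equal class sizes, and the equal mixture weights below are all assumptions made for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

N_POINTS = 3000
CENTERS = [(5, 15), (10, 5), (15, 15)]  # Gaussian centers from the slide
SIGMA = 1.5             # assumption: the slide gives no variance
LOW, HIGH = 0.0, 20.0   # assumption: the slide gives no value range


def sample_plus():
    """One '+' point, uniform over the assumed square."""
    return (random.uniform(LOW, HIGH), random.uniform(LOW, HIGH))


def sample_o():
    """One 'o' point from an equal-weight mixture of the three Gaussians."""
    cx, cy = random.choice(CENTERS)
    return (random.gauss(cx, SIGMA), random.gauss(cy, SIGMA))


# Assumption: the two classes are equally represented.
data = [(sample_plus(), "+") for _ in range(N_POINTS // 2)]
data += [(sample_o(), "o") for _ in range(N_POINTS // 2)]

random.shuffle(data)
n_train = N_POINTS * 30 // 100  # 30% for training, 70% for testing
train, test = data[:n_train], data[n_train:]
print(len(train), len(test))
```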
Decision Trees
[Figure: two decision trees with splits on x1 and x2 (for example, x1 < 13.29 and x2 < 17.35): a decision tree with 11 leaf nodes and a decision tree with 24 leaf nodes]
Which tree is better?
Model Overfitting
- Underfitting: when the model is too simple, both training and test errors are large
- Overfitting: when the model is too complex, training error is small but test error is large
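Both regimes can be reproduced on a small synthetic problem. The sketch below does not use the lecture's decision trees; it swaps in a k-nearest-neighbour classifier because it fits in a few lines, with k controlling complexity (k = 1 memorizes the training set, k = all training points always predicts the majority class). The data, the 20% label noise, and the values of k are assumptions chosen for illustration.

```python
import random

random.seed(1)


def make_records(n, noise=0.2):
    """Points x in [0, 1); true label is x > 0.5, flipped with probability `noise`."""
    records = []
    for _ in range(n):
        x = random.random()
        label = x > 0.5
        if random.random() < noise:
            label = not label
        records.append((x, label))
    return records


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda rec: abs(rec[0] - x))[:k]
    votes = sum(1 for _, label in nearest if label)
    return votes * 2 > len(nearest)


def error_rate(train, records, k):
    wrong = sum(1 for x, label in records if knn_predict(train, x, k) != label)
    return wrong / len(records)


train = make_records(200)
test = make_records(400)

errors = {}
for k in (1, 15, len(train)):  # too complex, moderate, too simple
    errors[k] = (error_rate(train, train, k), error_rate(train, test, k))
    print(f"k={k:3d}  train err={errors[k][0]:.2f}  test err={errors[k][1]:.2f}")
```

With k = 1 every training point is its own nearest neighbour, so the training error is zero by construction while the test error is not: the overfitting pattern. With k equal to the training-set size the model ignores x entirely, which is the underfitting extreme.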
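Mammal Classification Problem
[Figure: training set table and the decision tree model learned from it]
Decision tree model (training error = 0%):
  Body Temperature = Warm -> Give Birth
    Give Birth = Yes -> Mammals
    Give Birth = No -> Non-mammals
  Body Temperature = Cold -> Non-mammals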
Effect of Noise
Example: Mammal classification problem
[Figure: training set and test set tables]
Model M1: train err = 0%, test err = 30%
  Body Temperature = Warm-blooded -> Give Birth
    Give Birth = Yes -> Four-legged
      Four-legged = Yes -> Mammals
      Four-legged = No -> Non-mammals
    Give Birth = No -> Non-mammals
  Body Temperature = Cold-blooded -> Non-mammals
Model M2: train err = 20%, test err = 10%
  Body Temperature = Warm-blooded -> Give Birth
    Give Birth = Yes -> Mammals
    Give Birth = No -> Non-mammals
  Body Temperature = Cold-blooded -> Non-mammals
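The difference between the two models can be made concrete by writing each tree as a function. The sketch below encodes M1 and M2 as they appear to read from the slide; the field names and the test record are hypothetical, since the training and test tables are not part of this text.

```python
def m1(animal):
    """Model M1 as read off the slide: adds a test on 'four_legged'."""
    if animal["body_temp"] != "warm-blooded":
        return "non-mammal"
    if not animal["gives_birth"]:
        return "non-mammal"
    return "mammal" if animal["four_legged"] else "non-mammal"


def m2(animal):
    """Model M2 as read off the slide: stops after 'gives_birth'."""
    if animal["body_temp"] != "warm-blooded":
        return "non-mammal"
    return "mammal" if animal["gives_birth"] else "non-mammal"


# Hypothetical test record: warm-blooded, gives birth, not four-legged.
record = {"body_temp": "warm-blooded", "gives_birth": True, "four_legged": False}
print(m1(record), m2(record))
```

The extra split lets M1 fit its training set perfectly, but any warm-blooded, birth-giving animal without four legs is then labeled a non-mammal, which is one way a more complex tree can end up with the higher test error.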
Lack of Representative Samples
[Figure: training set and test set tables]
Model M3: train err = 0%, test err = 30%
  Body Temperature = Warm-blooded -> Hibernates
    Hibernates = Yes -> Four-legged
      Four-legged = Yes -> Mammals
      Four-legged = No -> Non-mammals
    Hibernates = No -> Non-mammals
  Body Temperature = Cold-blooded -> Non-mammals
The leaf nodes have too few training records to make reliable classifications.
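One way to spot this problem is to count how many training records reach each leaf. The sketch below routes records through M3 as it appears to read from the slide; the five training records, the field names, and the MIN_SUPPORT threshold are hypothetical stand-ins, since the actual training table is not part of this text.

```python
from collections import Counter


def m3_leaf(animal):
    """Return a description of the M3 leaf that a record falls into."""
    if animal["body_temp"] != "warm-blooded":
        return "cold-blooded"
    if not animal["hibernates"]:
        return "warm-blooded, does not hibernate"
    if animal["four_legged"]:
        return "warm-blooded, hibernates, four-legged"
    return "warm-blooded, hibernates, not four-legged"


# Hypothetical training records (not the table from the slide).
training = [
    {"body_temp": "cold-blooded", "hibernates": True, "four_legged": True},
    {"body_temp": "cold-blooded", "hibernates": False, "four_legged": False},
    {"body_temp": "warm-blooded", "hibernates": False, "four_legged": False},
    {"body_temp": "warm-blooded", "hibernates": True, "four_legged": False},
    {"body_temp": "warm-blooded", "hibernates": True, "four_legged": True},
]

MIN_SUPPORT = 2  # hypothetical threshold for a "reliable" leaf
support = Counter(m3_leaf(a) for a in training)
weak = [leaf for leaf, n in support.items() if n < MIN_SUPPORT]

for leaf, n in support.items():
    print(f"{n} record(s): {leaf}")
print("leaves below MIN_SUPPORT:", len(weak))
```

A leaf backed by a single record predicts whatever label that one record carries, so a tree can reach 0% training error while its leaves say little about unseen data.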
Effect of Multiple Comparison Procedure
- Consider the task of predicting whether the stock market will rise or fall on each of the next 10 trading days
- Random guessing: P(correct) = 0.5
- Make 10 random guesses in a row:
    Day 1   Up
    Day 2   Down
    Day 3   Down
    Day 4   Up
    Day 5   Down
    Day 6   Down
    Day 7   Up
    Day 8   Up
    Day 9   Up
    Day 10  Down
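The arithmetic behind this setup can be sketched with the binomial distribution. The text here stops before stating a target, so the threshold of 8 correct days and the pool of 50 independent guessers below are assumptions chosen only to illustrate why comparing many candidates inflates the chance of a lucky result.

```python
from math import comb


def p_at_least(k, n, p=0.5):
    """P(at least k of n independent guesses are correct)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def p_any_of(m, p_single):
    """P(at least one of m independent guessers reaches the target)."""
    return 1 - (1 - p_single) ** m


p_one = p_at_least(8, 10)      # assumption: "good" means 8 or more of 10 correct
p_fifty = p_any_of(50, p_one)  # assumption: 50 independent random guessers
print(f"one guesser: {p_one:.4f}, best of fifty: {p_fifty:.4f}")
```

A single random guesser rarely gets 8 of 10 days right, yet among many guessers it is very likely that someone does. Picking the best of many candidate splits or models by training performance carries the same risk.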