Overfitting
Data Mining
Prof. Dawn Woodard
School of ORIE, Cornell University

Outline
1. Announcements
2. Overfitting

Announcements
Questions?

Consider a classification problem with 2 continuous predictors. We can create a scatterplot of the predictor values (X1, X2) of the training data, showing points with Y = 1 as green and Y = 0 as red. If we can find a good separating boundary for green vs. red, this may provide a good classifier. The following example and plots are from Chap. 2 of the text (Hastie et al.).

Plot the training data and draw a good linear boundary. This boundary misclassifies quite a few points.

To get a better boundary, we will use the simple k-Nearest Neighbors method. To predict for a new (test) observation based on the training data:
- Take the k observations in the training data that are closest to the new observation
- Predict Y for the new observation to be the "majority vote" of these points

This classification rule implies a classification boundary on...
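The k-Nearest Neighbors rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's code; the function name, toy data, and choice of Euclidean distance are all assumptions made for the example.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, x_new, k):
    """Predict the class of x_new by majority vote among the
    k training points closest to it (Euclidean distance)."""
    # Distance from x_new to each training point
    dists = [math.dist(x, x_new) for x in train_X]
    # Indices of the k nearest training points
    nearest = sorted(range(len(train_X)), key=lambda i: dists[i])[:k]
    # Majority vote of the labels of those k neighbors
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy training data: two predictors, labels 0 ("red") and 1 ("green")
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
           (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (2.0, 2.0), k=3))  # near the red cluster -> 0
print(knn_predict(train_X, train_y, (6.5, 6.5), k=3))  # near the green cluster -> 1
```

The set of points where the vote ties (or flips) between classes traces out exactly the classification boundary the slide refers to; small k gives a wiggly boundary, large k a smoother one.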
This note was uploaded on 12/23/2009 for the course ORIE 4740 at Cornell University (Engineering School).