Overfitting
Data Mining
Prof. Dawn Woodard, School of ORIE, Cornell University

Outline
1. Announcements
2. Overfitting

Announcements
Questions?

Overfitting
Consider a classification problem with two continuous predictors. We can create a scatterplot of the predictor values (X1, X2) of the training data, showing points with Y = 1 in green and Y = 0 in red. If we can find a good separating boundary between green and red, it may provide a good classifier. The following example and plots are from Chapter 2 of the text (Hastie et al.).

Plot the training data and draw a good linear boundary: this boundary misclassifies quite a few points.

To get a better boundary, we will use the simple k-Nearest Neighbors method. To predict for a new (test) observation based on the training data:
- Take the k observations in the training data that are closest to the new observation.
- Predict Y for the new observation to be the "majority vote" of these points.

This classification rule implies a classification boundary on...
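
To make the majority-vote rule concrete, here is a minimal sketch in Python. It is an illustration, not code from the slides: the function name knn_predict, the choice of Euclidean distance, and the simulated training data are all assumptions.

import numpy as np

def knn_predict(X_train, y_train, x_new, k=15):
    """Majority vote of the k training points closest to x_new."""
    dists = np.linalg.norm(X_train - x_new, axis=1)  # distance to each training point
    nearest = np.argsort(dists)[:k]                  # indices of the k closest points
    # Predict 1 if more than half the neighbors have Y = 1 (ties go to 0;
    # k is often chosen odd so that ties cannot occur).
    return int(y_train[nearest].mean() > 0.5)

# Toy usage: two continuous predictors (X1, X2), classes separated by a line.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # typically prints 1

Varying k changes how flexible the implied boundary is: small k traces the training data closely, while large k smooths the boundary out, which is where overfitting enters the picture.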