simpleClassifier_4perPage - Outline Overfitting 1...

Overfitting
Data Mining
Prof. Dawn Woodard
School of ORIE, Cornell University

Outline
1. Announcements
2. Overfitting

Announcements
Questions?

Consider a classification problem with 2 continuous predictors. We can create a scatterplot of the predictor values (X1, X2) of the training data, showing points with Y = 1 as green and Y = 0 as red. If we can find a good separating boundary for green vs. red, this may provide a good classifier.

The following example and plots are from Chap. 2 of the text (Hastie et al.).

Plot the training data and draw a good linear boundary:

[Figure: training data with a linear classification boundary]

This boundary misclassifies quite a few points.

To get a better boundary, we will use the simple k-Nearest Neighbors method. To predict for a new (test) observation based on the training data:
- Take the k observations in the training data that are closest to the new observation
- Predict Y for the new observation to be the "majority vote" of these points

This classification rule implies a classification boundary on our plot.

Here's the 15-Nearest Neighbor classification boundary:

[Figure: training data with the 15-Nearest Neighbor classification boundary]

Does this boundary appear to be better than the linear boundary?
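
The k-Nearest Neighbors rule described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the slides: the function name knn_predict, the use of Euclidean distance as the meaning of "closest", and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, new_point, k):
    """Predict Y for new_point by majority vote of its k nearest training points.

    Assumption: "closest" means smallest Euclidean distance.
    """
    # Distance from the new observation to every training observation
    distances = [
        (math.dist(x, new_point), label)
        for x, label in zip(train_X, train_y)
    ]
    # Keep the k training observations with the smallest distances
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote among the labels of those k observations
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy training set: (X1, X2) pairs with labels Y in {0, 1}
train_X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
           (1.0, 1.0), (0.9, 1.2), (1.1, 0.8)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (0.1, 0.1), k=3))  # prints 0
print(knn_predict(train_X, train_y, (1.0, 0.9), k=3))  # prints 1
```

With two classes, an odd k (such as 15 in the plot above) avoids tied votes. Evaluating this rule over a fine grid of (X1, X2) values and marking where the prediction changes would trace out the implied classification boundary.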