11_distances

Running time for testing an example grows with the size of the dataset: a brute-force nearest neighbor query must compute a distance to every stored training example, which becomes expensive with large datasets.

Nearest neighbor classification in high dimensions

Distance functions lose their usefulness in high dimensions. Consider the Euclidean distance, for example:

    \mathrm{Dis}_2(x, y) = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}

We expect that if d is large, many of the features won't be relevant, and so the signal contained in the informative dimensions can easily be corrupted by the noise. This can lead to low accuracy of a nearest neighbor classifier. Solution: feature selection, dimensionality reduction (chapter 10).

The curse of dimensionality

An umbrella term for the issues that can arise in high-dimensional data. Some of our intuition from low-dimensional spaces breaks in high dimensions. Example: in high dimensions, most of the volume of the unit sphere is very close to its surface. Let's compute the fraction of the volume...

k-NN

Use the closest k neighbors to make a decision instead of a single nearest neighbor: choose the label that occurs among the majority of the k nearest neighbors.
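The k-NN rule above is simple enough to sketch directly. Below is a minimal brute-force implementation, assuming NumPy arrays for the data; the function names and the toy dataset are illustrative, not from the lecture. Note that answering one query computes a distance to every stored example, which is exactly what makes the method expensive with large datasets.

    # Minimal brute-force k-NN classifier (illustrative sketch, not the course's code).
    import numpy as np
    from collections import Counter

    def euclidean(x, y):
        # Dis_2(x, y) = sqrt(sum_i (x_i - y_i)^2)
        return np.sqrt(np.sum((x - y) ** 2))

    def knn_predict(X_train, y_train, query, k=3):
        # One query costs O(n * d): a distance to each of the n stored examples.
        dists = [euclidean(x, query) for x in X_train]
        nearest = np.argsort(dists)[:k]                # indices of the k closest examples
        votes = Counter(y_train[i] for i in nearest)   # majority vote over their labels
        return votes.most_common(1)[0][0]

    # Toy usage
    X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [4.8, 5.2]])
    y_train = np.array([0, 0, 1, 1])
    print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> 0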
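The truncated "fraction of the volume" computation is presumably the standard one; here is a sketch, assuming only that the volume of a radius-r ball in R^d scales as r^d:

    \mathrm{vol}(B_d(r)) = c_d\, r^d
    \quad\Longrightarrow\quad
    \frac{\mathrm{vol}(B_d(1)) - \mathrm{vol}(B_d(1-\epsilon))}{\mathrm{vol}(B_d(1))}
      = 1 - (1-\epsilon)^d \;\longrightarrow\; 1 \text{ as } d \to \infty.

For example, with epsilon = 0.05 the fraction of the unit ball's volume lying within 5% of the surface is about 0.10 for d = 2 but about 0.994 for d = 100, which is the sense in which almost all of the volume sits near the surface in high dimensions.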

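The claim that irrelevant, noisy features corrupt the signal and lower nearest neighbor accuracy can be checked on synthetic data. This is a minimal sketch, assuming one informative feature plus a growing number of pure-noise features; the sample sizes, class means, and noise level are all illustrative choices, not the lecture's experiment.

    # Synthetic check: 1-NN accuracy as pure-noise dimensions are added.
    import numpy as np

    rng = np.random.default_rng(0)

    def make_data(n, n_noise):
        # One informative dimension (class means at -1.5 and +1.5) plus n_noise noise dimensions.
        y = rng.integers(0, 2, size=n)
        signal = np.where(y == 0, -1.5, 1.5).reshape(-1, 1) + rng.normal(size=(n, 1))
        noise = rng.normal(size=(n, n_noise))
        return np.hstack([signal, noise]), y

    def nn_accuracy(X_train, y_train, X_test, y_test):
        # Brute-force 1-NN (squared Euclidean distance gives the same nearest neighbor).
        correct = 0
        for x, label in zip(X_test, y_test):
            d = np.sum((X_train - x) ** 2, axis=1)
            correct += (y_train[np.argmin(d)] == label)
        return correct / len(y_test)

    for n_noise in [0, 10, 100, 1000]:
        X_tr, y_tr = make_data(200, n_noise)
        X_te, y_te = make_data(200, n_noise)
        print(n_noise, nn_accuracy(X_tr, y_tr, X_te, y_te))
    # Accuracy typically falls toward chance as the number of noise dimensions grows.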
