h large datasets.

10/21/13

The curse of dimensionality

An umbrella term for the issues that can arise in high dimensional data.

Nearest neighbor classification in high dimensions

Distance functions lose their usefulness in high dimensions. Consider the Euclidean distance, for example:

$$\mathrm{Dis}_2(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}$$

We expect that if d is large, many of the features won’t be
relevant, and so the signal contained in the informative
dimensions can easily be corrupted by the noise.
This can lead to low accuracy of a nearest neighbor classifier.
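
The effect described above can be checked with a small experiment. The sketch below is illustrative only and is not taken from the slides: it assumes synthetic two-class data with two informative Gaussian features (class centres at -1.5 and +1.5, a choice made here for the example) and a variable number of pure-noise features, and it measures the accuracy of a single nearest neighbor classifier under the Euclidean distance. All function names are hypothetical.

```python
import math
import random


def make_point(label, n_noise, rng):
    """Two informative features centred at -1.5 or +1.5, plus pure-noise features."""
    centre = 1.5 if label == 1 else -1.5
    informative = [rng.gauss(centre, 1.0) for _ in range(2)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
    return informative + noise


def make_dataset(n_points, n_noise, rng):
    """Return a list of (features, label) pairs with alternating labels 0 and 1."""
    return [(make_point(i % 2, n_noise, rng), i % 2) for i in range(n_points)]


def nearest_neighbor_label(train, x):
    """Label of the training point closest to x in Euclidean distance."""
    _, label = min(train, key=lambda pair: math.dist(pair[0], x))
    return label


def accuracy(n_noise, n_train=100, n_test=100, seed=0):
    """Test accuracy of the 1-nearest-neighbor rule with n_noise irrelevant features."""
    rng = random.Random(seed)
    train = make_dataset(n_train, n_noise, rng)
    test = make_dataset(n_test, n_noise, rng)
    hits = sum(nearest_neighbor_label(train, x) == y for x, y in test)
    return hits / n_test


if __name__ == "__main__":
    for n_noise in (0, 10, 100, 500):
        print(f"{n_noise:4d} noise features: accuracy = {accuracy(n_noise):.2f}")
```

Under these assumptions the accuracy should fall as noise features are added, because the squared differences in the irrelevant coordinates swamp the contribution of the two informative ones in the sum inside the distance.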
Solution: feature selection, dimensionality reduction (chapter 10).

The curse of dimensionality

Some of our intuition from low dimensional spaces breaks in high dimensions.
Example: In high dimensions, most of the volume of the unit sphere is very close to its surface.
Let’s compute the fraction of the vol...

kNN

Use the closest k neighbors to make a decision instead of a single nearest neighbor.
Choose the label that occurs among the majority of the k nearest neighbors.
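
The kNN rule (take the k closest training points and choose the majority label) can be sketched in a few lines. This is a minimal illustration, not the slides' implementation: the function name `knn_predict`, the Euclidean distance, the tie-breaking behaviour, and the toy data are all assumptions made here.

```python
import math
from collections import Counter


def knn_predict(train, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points.

    train is a list of (features, label) pairs; distance is Euclidean.
    If several labels tie in the vote, the one belonging to the closest
    of the tied neighbors wins (Counter keeps first-encountered order).
    """
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy data: two "a" points near the origin, two "b" points near (1, 1),
# and one stray "b" point that sits among the "a" points.
train = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.2), "a"),
    ((1.0, 1.0), "b"),
    ((0.9, 1.1), "b"),
    ((0.3, 0.1), "b"),
]
query = (0.2, 0.1)

print(knn_predict(train, query, k=1))  # "b": the stray point is the single nearest
print(knn_predict(train, query, k=3))  # "a": two of the three nearest are "a"
```

In this toy case the single nearest neighbor is the stray point, so k = 1 follows it, while k = 3 outvotes it. That is the motivation for voting over several neighbors.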
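
For the unit-sphere example, one standard calculation (which may differ in detail from the one the slides go on to present) uses the fact that the volume of a d-dimensional ball of radius r is proportional to r^d. The inner ball of radius 1 - eps therefore holds a fraction (1 - eps)^d of the unit ball's volume, and the shell within eps of the surface holds the rest. The function name and the choice eps = 0.01 below are assumptions made for illustration.

```python
def shell_fraction(d, eps):
    """Fraction of a d-dimensional unit ball's volume lying within eps of its surface.

    Uses volume(radius r) proportional to r**d, so the inner ball of radius
    1 - eps accounts for (1 - eps)**d of the total volume.
    """
    if d < 1 or not 0.0 <= eps <= 1.0:
        raise ValueError("need d >= 1 and 0 <= eps <= 1")
    return 1.0 - (1.0 - eps) ** d


if __name__ == "__main__":
    for d in (2, 10, 100, 1000):
        print(f"d = {d:4d}: fraction within 0.01 of the surface = {shell_fraction(d, 0.01):.4f}")
```

For a fixed eps the fraction tends to 1 as d grows, which is the sense in which most of the volume sits close to the surface.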
 Fall '08
 Anderson,C
 Machine Learning, Distance, Nearest neighbor search, University of Bristol, nearest neighbor, Peter Flach
