h large datasets.

10/21/13

The curse of dimensionality

An umbrella term for the issues that can arise in high dimensional data.

Nearest neighbor classification in high dimensions

Distance functions lose their usefulness in high dimensions. Consider the Euclidean distance for example:

$$\mathrm{Dis}_2(x, y) = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}$$

We expect that if d is large, many of the features won’t be relevant, and so the signal contained in the informative dimensions can easily be corrupted by the noise. This can lead to low accuracy of a nearest neighbor classifier.
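
The effect of irrelevant features on a nearest neighbor classifier can be checked with a small simulation. The sketch below is an illustration only, not taken from the slides: the synthetic data (one informative feature centred at -2 or +2, plus Gaussian noise features), the sample sizes, and the names `make_point` and `one_nn_accuracy` are all assumptions.

```python
import math
import random


def euclidean(x, y):
    """Euclidean distance Dis_2(x, y) between two equal-length vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def make_point(label, n_noise, rng):
    """Hypothetical data: one informative feature plus n_noise irrelevant ones."""
    informative = rng.gauss(2.0 if label == 1 else -2.0, 1.0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
    return [informative] + noise, label


def one_nn_accuracy(n_noise, n_train=100, n_test=100, seed=0):
    """Test accuracy of a single-nearest-neighbor classifier on the synthetic data."""
    rng = random.Random(seed)
    train = [make_point(i % 2, n_noise, rng) for i in range(n_train)]
    test = [make_point(i % 2, n_noise, rng) for i in range(n_test)]
    correct = 0
    for x, label in test:
        # Predict the label of the closest training point.
        _, predicted = min(train, key=lambda t: euclidean(x, t[0]))
        correct += predicted == label
    return correct / n_test


if __name__ == "__main__":
    for n_noise in (0, 10, 200):
        print(n_noise, "noise features -> accuracy", one_nn_accuracy(n_noise))
```

With no noise features the two classes are well separated along the single informative dimension. As noise features are added, their summed contribution to the squared distance grows with d while the informative contribution stays fixed, so accuracy should be expected to drop.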
Solution: feature selection, dimensionality reduction (chapter 10).

The curse of dimensionality

Some of our intuition from low dimensional spaces breaks in high dimensions.

Example: In high dimensions, most of the volume of the unit sphere is very close to its surface. Let’s compute the fraction of the vol...

k-NN

Use the closest k neighbors to make a decision instead of a single nearest neighbor. Choose the label that occurs among the majority of the k nearest neighbors.
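
The claim about the unit sphere can be checked numerically. The source text is cut off before its own calculation, so the following is only one possible sketch and not necessarily the computation on the slide. It relies on the fact that the volume of a d-dimensional ball of radius r is proportional to r^d, so the part of the unit ball within a distance eps of the surface has volume fraction 1 - (1 - eps)^d. The name `shell_fraction` and the choice eps = 0.05 are assumptions made for illustration.

```python
def shell_fraction(d, eps):
    """Fraction of a d-dimensional unit ball's volume lying within eps of its surface.

    Uses volume(radius r) proportional to r**d, so the inner ball of radius
    (1 - eps) holds a fraction (1 - eps)**d of the total volume.
    """
    if d < 1 or not 0.0 <= eps <= 1.0:
        raise ValueError("need d >= 1 and 0 <= eps <= 1")
    return 1.0 - (1.0 - eps) ** d


if __name__ == "__main__":
    for d in (1, 2, 3, 10, 100, 1000):
        print(f"d={d:5d}  fraction within 0.05 of the surface: {shell_fraction(d, 0.05):.4f}")
```

For d = 3 only about 14% of the volume lies in this thin shell, while for d = 100 it is more than 99%.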
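
The k-NN rule (take the k closest training points, then choose the majority label) can be sketched in a few lines. This is a minimal illustration rather than the course's implementation: the function name `knn_predict`, the `(features, label)` data layout, the use of Euclidean distance, and the toy data are assumptions.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest neighbors.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training points by distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    # On a tied vote, most_common returns the label encountered first,
    # which here is the label of the nearer neighbor.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: a cluster of "a" points containing one stray "b" point.
    train = [
        ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
        ((0.4, 0.4), "b"), ((5.0, 5.0), "b"), ((5.0, 6.0), "b"),
    ]
    print(knn_predict(train, (0.5, 0.5), k=1))  # prints b (the stray point is closest)
    print(knn_predict(train, (0.5, 0.5), k=3))  # prints a (majority of the 3 nearest)
```

The toy data shows why voting helps: with k = 1 a single stray point decides the label, while with k = 3 it is outvoted by its neighbors.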
- Fall '08
- Machine Learning, Distance, Nearest neighbor search, University of Bristol, nearest neighbor, Peter Flach