Nearest Neighbor Classification
Charles Elkan
[email protected]
January 11, 2011

What is called supervised learning is the most fundamental task in machine learning. In supervised learning, we have training examples and test examples. A training example is an ordered pair ⟨x, y⟩ where x is an instance and y is a label. A test example is an instance x with unknown label. The goal is to predict labels for test examples. The name "supervised" comes from the fact that some supervisor or teacher has provided explicit labels for training examples.

Assume that instances x are members of the set X, while labels y are members of the set Y. Then a classifier is any function f : X → Y. A supervised learning algorithm is not a classifier. Instead, it is an algorithm whose output is a classifier. Mathematically, a supervised learning algorithm is a higher-order function: it is a function of type (X × Y)^n → (X → Y), where n is the cardinality of the training set.

The nearest-neighbor method is perhaps the simplest of all algorithms for predicting the class of a test example. The training phase is trivial: simply store every training example, with its label. To make a prediction for a test example, first compute its distance to every training example. Then, keep the k closest training examples, where k ≥ 1 is a fixed integer. Look for the label that is most common among these examples. This label is the prediction for this test example. Using the same set notation as above, the nearest-neighbor method is a function of type (X × Y)^n × X → Y. A distance function has type X × X → R.

This basic method is called the kNN algorithm. There are two major design choices to make: the value of k, and the distance function to use. When there are two alternative classes, the most common choice for k, in order to avoid ties, is a small odd integer, for example k = 3. If there are more than two classes, then ties are possible even when k is odd.
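The prediction procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not code from these notes; it breaks voting ties by preferring the label whose nearest representative is closest, which is one arbitrary but sensible choice among many:

```python
import math
from collections import Counter

def euclidean(x, y):
    # Straight-line distance between two equal-length vectors.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_predict(training, x, k=3):
    # training: list of (instance, label) pairs; x: a test instance.
    # Keep the k training examples closest to x.
    neighbors = sorted(training, key=lambda pair: euclidean(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    best = max(votes.values())
    # Tie-break: among the most common labels, return the one whose
    # nearest representative appears earliest in distance order.
    for _, label in neighbors:
        if votes[label] == best:
            return label

train = [((0.0, 0.0), "a"), ((0.1, 0.1), "a"),
         ((5.0, 5.0), "b"), ((5.1, 5.0), "b")]
print(knn_predict(train, (0.2, 0.0), k=3))  # prints "a": two of the
                                            # three nearest neighbors
                                            # carry label "a"
```

Note that the "training" step is nothing more than keeping the list of labeled pairs; all of the work happens at prediction time.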
Ties can also arise when two distance values are the same. An implementation of kNN needs a sensible algorithm to break ties; there is no consensus on the best way to do this.

When each example is a fixed-length vector of real numbers, the most common distance function is Euclidean distance:

    d(x, y) = ||x − y|| = ((x − y) · (x − y))^{1/2} = ( Σ_{i=1}^{m} (x_i − y_i)^2 )^{1/2}

where x and y are points in X = R^m.

An obvious disadvantage of the kNN method is the time complexity of making predictions. Suppose that there are n training examples in R^m. Then applying the method to one test example requires O(nm) time, compared to just O(m) time to apply a linear classifier such as a perceptron. If the training data are stored in a sophisticated data structure, for example a kd-tree, then finding nearest neighbors can be done much faster if the dimensionality m is small. However, for dimensionality of roughly m ≥ 20, no data structure is known that is useful in practice [Kibriya and Frank, 2007].
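As a quick numerical sanity check (my own illustration, assuming NumPy is available), the three expressions for Euclidean distance given above (the norm, the dot-product form, and the explicit sum) agree, and the distances from one test instance to all n training rows can be computed in a single O(nm) vectorized pass:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=7)
y = rng.normal(size=7)

d_norm = np.linalg.norm(x - y)           # ||x - y||
d_dot = np.sqrt(np.dot(x - y, x - y))    # ((x - y) . (x - y))^{1/2}
d_sum = np.sum((x - y) ** 2) ** 0.5      # ( sum_i (x_i - y_i)^2 )^{1/2}
assert np.isclose(d_norm, d_dot) and np.isclose(d_dot, d_sum)

def all_distances(X_train, x):
    # Distances from test instance x to every row of X_train.
    # One pass over an (n, m) array: the O(nm) cost discussed above.
    return np.sqrt(np.sum((X_train - x) ** 2, axis=1))
```

For low-dimensional data, `scipy.spatial.cKDTree` offers the kd-tree alternative mentioned above; as the notes point out, its advantage disappears once m grows to roughly 20 or more.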