get mis-classified as a 1. For this kind of situation, a smaller k does better on this dataset. This issue can be mitigated with a bigger dataset and better features; it is also why the k-nearest neighbors algorithm works better as the dataset grows larger. If your error rate did not go up for larger k values, do not worry: this is most likely a result of small implementation details such as tie breaking.

(b) Our tie-breaking rule was to pick randomly among the classes that received the most votes.

(c) Even when the error rate decreases as the value of k increases, the k-nearest neighbors algorithm does not perform significantly better for k = 2 than for k = 1. This arises because of tie breaking. There are two cases:

• Suppose that, for a test point, the two closest neighbors are of the same class. In this case, the k-nearest neighbors algorithm for k = 1 and k = 2 will return the same class, so there is no difference.

CS 170, Fall 2014, Soln 12
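The voting and random tie-breaking rule described in part (b) can be sketched as follows. This is an illustrative implementation, not the course's reference code; the function name and the choice of plain lists for the data are assumptions.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k, rng=random):
    """Classify x by majority vote among its k nearest training points.

    Ties between classes with equal vote counts are broken uniformly
    at random, mirroring the tie-breaking rule described in part (b).
    """
    # Squared Euclidean distance preserves the nearest-neighbor ordering,
    # so we can skip the square root.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(pt, x)), label)
        for pt, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    top = max(votes.values())
    tied = [label for label, count in votes.items() if count == top]
    return rng.choice(tied)
```

Note that for k = 2 with two neighbors of different classes, `tied` contains both classes, so the second-closest neighbor's class is picked half the time, which is exactly the situation discussed in part (c).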
• Alternatively, suppose that the two closest neighbors are of two different classes. In this case, it is more likely that the class of the closer of the two neighbors is the correct class. Any tie-breaking procedure for k = 2 that picks the second-closest neighbor a non-zero fraction of the time will therefore do worse than k-nearest neighbors with k = 1.

(d) The following are some examples of misclassified digits: the first digit was misclassified as a 7, the second digit was misclassified as a 6, and the last digit was misclassified as a 0.

The algorithm misclassified many different points because we are simply computing Euclidean distances between the raw pixel vectors. This does not account for any linear transformations of the digits, such as rotation or translation. For example, a digit 1 that is centered and a digit 1 that is offset by just a few pixels to the right will have a very large Euclidean distance, even though semantically the two are essentially the same.

There are many possible ways to address this problem. One possibility is to compute features that count the number of "on" pixels in each 4-by-4 or 5-by-5 grid cell of the image. A classifier using these features will be more robust to translations or rotations, since most of these computed features of a digit remain the same after a small translation or rotation of that digit. Orientation histogram features are also good features to add for this problem, because they too provide some invariance to linear transformations (see, e.g., oriented_gradients).

Yet another possibility would be to select additional features that capture information about the shape of the digit. For instance, for each row that is not entirely blank, we could record the column of the leftmost white pixel and compute the standard deviation of these values, then do the same for the rightmost white pixel.
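The grid-cell pooling feature mentioned above can be sketched as follows. This is a minimal illustration, assuming the image is a 2-D list of 0/1 pixel values; the function name and the default 4-by-4 cell size are illustrative choices.

```python
def grid_counts(image, block=4):
    """Count the "on" pixels in each block-by-block cell of a binary image.

    Pooling counts over small cells makes the features tolerant of
    shifts of a few pixels: a slightly translated stroke mostly stays
    within the same cell, so most cell counts are unchanged.
    """
    rows, cols = len(image), len(image[0])
    feats = []
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            feats.append(sum(image[rr][cc]
                             for rr in range(r, min(r + block, rows))
                             for cc in range(c, min(c + block, cols))))
    return feats
```

The resulting feature vector (one count per cell) would replace or augment the raw pixel vector before computing Euclidean distances.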
Thus a vertical stroke like the digit 1 will have a low standard deviation on both sides, while a digit like 2 or 5 will have a high standard deviation on both sides. The digit 5 might have
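The edge-based shape feature described above can be sketched as follows. This is an illustrative implementation under the same binary-image assumption as before; the function name is hypothetical.

```python
def edge_std_features(image):
    """Standard deviation of the leftmost and rightmost "on" columns.

    For each row that is not entirely blank, record the column indices
    of the leftmost and rightmost white pixels; return the (population)
    standard deviation of each list. A straight vertical stroke like a 1
    keeps both edges in nearly the same column in every row, giving low
    values, while curved digits like 2 or 5 vary far more.
    """
    left, right = [], []
    for row in image:
        on = [c for c, v in enumerate(row) if v]
        if on:
            left.append(on[0])
            right.append(on[-1])

    def std(xs):
        mean = sum(xs) / len(xs)
        return (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5

    return std(left), std(right)
```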