get misclassified as a 1. For this kind of situation, a smaller k does better on this dataset.
This issue can be solved with a bigger dataset and better features. This is also the reason the k-nearest neighbors algorithm works better as the dataset grows larger. If your error rate did not go up for larger k values, you should not worry, as this is most likely a result of small implementation details such as tie-breaking.
(b) Our tie-breaking rule was to pick uniformly at random among the classes that had the greatest number of votes.
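As a sketch, this tie-breaking rule might look like the following (the helper name is hypothetical, not from our actual solution code):

```python
import random
from collections import Counter

def vote_with_random_ties(labels, rng=random):
    """Return the class with the most votes among the neighbors' labels,
    choosing uniformly at random among tied classes."""
    counts = Counter(labels)
    top = max(counts.values())
    tied = [label for label, count in counts.items() if count == top]
    return rng.choice(tied)
```

With labels like `['1', '1', '7']` this always returns `'1'`; with a tied vote like `['1', '7']` it returns each class half the time.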
(c) Even when the error rate decreases as the value of k increases, the k-nearest neighbors algorithm does not perform significantly better for k = 2 than for k = 1. This arises because of tie-breaking. There are two cases:
• Suppose that for a test point, the two closest neighbors are of the same class. In this case, the k-nearest neighbors algorithm for k = 1 and k = 2 will return the same class, so there is no difference.
CS 170, Fall 2014, Soln 12
• Alternatively, suppose that the two closest neighbors are of two different classes. In this case, it is more likely that the class of the closer of the two neighbors is the correct class. Any tie-breaking procedure for k = 2 that picks the second-closest neighbor a non-zero fraction of the time will thus do worse than the k-nearest neighbors algorithm with k = 1.
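To make the second case concrete, here is a toy sketch (with made-up 1-D data, not our actual experiment) where k = 2 with random tie-breaking sometimes returns the second-closest neighbor's class, while k = 1 never does:

```python
import random

rng = random.Random(0)

# Made-up 1-D example: the nearest neighbor (distance 1) has class 'A',
# the second-nearest (distance 2) has class 'B'.
neighbors = [(1.0, 'A'), (2.0, 'B')]

def predict(k):
    labels = [label for _, label in sorted(neighbors)[:k]]
    if len(set(labels)) == 1:
        return labels[0]          # unanimous vote, no tie to break
    return rng.choice(labels)     # tied vote: pick at random

k1_preds = {predict(1) for _ in range(1000)}  # only ever 'A'
k2_preds = {predict(2) for _ in range(1000)}  # both 'A' and 'B' appear
```

If 'A' is the correct class, the k = 2 classifier is wrong roughly half the time on this point, while the k = 1 classifier is always right.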
(d) The following are some examples of misclassified digits:
The first digit was misclassified as a 7, the second digit was misclassified as a 6, and the last digit was misclassified as a 0. The algorithm misclassified many different points because we are simply computing the Euclidean distances of the pixel vectors. This does not account for geometric transformations of the digits such as rotation or translation. For example, a digit 1 that is centered and a digit 1 that is offset by just a few pixels to the right will have a very high Euclidean distance even though semantically the two are essentially identical.
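This effect is easy to see on a toy image (a sketch with a made-up 8×8 "digit," not the actual MNIST data): shifting a vertical stroke by two pixels makes its Euclidean distance to the original even larger than the distance to a completely blank image.

```python
import numpy as np

# Toy 8x8 "digit 1": a vertical stroke in column 3.
stroke = np.zeros((8, 8))
stroke[:, 3] = 1.0

# The same stroke shifted two pixels to the right.
shifted = np.roll(stroke, 2, axis=1)

blank = np.zeros((8, 8))

# The strokes no longer overlap, so every "on" pixel in either image
# contributes to the distance: sqrt(16) = 4.0.
d_shifted = np.linalg.norm(stroke - shifted)

# Distance to an all-blank image is only sqrt(8) ~ 2.83.
d_blank = np.linalg.norm(stroke - blank)
```

Under pixel-wise Euclidean distance, the shifted copy of the digit is farther from the original than an empty image is.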
There are many possible approaches that might help address this problem. One possibility is to compute features that count the number of "on" pixels in each 4-by-4 or 5-by-5 grid cell. A classifier using these features will be more robust to translations or rotations, since most of these computed features of a digit remain the same even after the digit is translated or rotated. Orientation histogram features are also good features to add for this problem because they provide some invariance to such transformations (see, e.g., oriented_gradients).
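A minimal sketch of such grid-count features (the function name, cell size, and toy 8×8 images are assumptions for illustration):

```python
import numpy as np

def grid_counts(img, cell=4):
    """Count of 'on' pixels in each cell-by-cell block of the image.
    For a 28x28 image with cell=4 this yields a 7x7 = 49-dim feature vector."""
    h, w = img.shape
    return np.array([
        img[r:r + cell, c:c + cell].sum()
        for r in range(0, h, cell)
        for c in range(0, w, cell)
    ])

# Two vertical strokes one pixel apart land in the same 4x4 cells, so their
# grid-count features are identical even though the raw pixel vectors differ.
a = np.zeros((8, 8)); a[:, 1] = 1.0
b = np.zeros((8, 8)); b[:, 2] = 1.0
```

Note that this only gives exact invariance for shifts that stay within a cell; a shift that crosses a cell boundary still changes the features, just by less than it changes the raw pixel distance.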
Yet another possibility would be to select some additional features that capture information about the shape of the digit. For instance, for each row that is not entirely blank, we could extract the column of the leftmost white pixel and compute the standard deviation of these values, then do the same for the rightmost white pixel. A vertical stroke like the digit 1 will then have a low standard deviation on both sides, while a digit like 2 or 5 will have a high standard deviation on both sides. The digit 5 might have