get mis-classified as a 1. For this kind of situation, a smaller k does better on this dataset. This issue can be mitigated with a bigger dataset and better features; this is also the reason the k-nearest neighbors algorithm works better as the dataset grows larger. If your error rate did not go up for larger k values, you should not worry, as this is most likely a result of small implementation details such as tie-breaking.

(b) Our tie-breaking rule was to pick randomly among the classes that received the most votes.

(c) Even when the error rate decreases as the value of k increases, the k-nearest neighbors algorithm does not perform significantly better for k=2 than for k=1. This issue arises because of tie-breaking. There are two cases:

• Suppose that, for a test data point, the two closest neighbors are of the same class. In this case, the k-nearest neighbors algorithm for k=1 and k=2 will return the same class, so there is no difference in this case.

CS 170, Fall 2014, Soln 1
• Alternatively, suppose that the two closest neighbors are of two different classes. In this case, it is more likely that the class of the closer of the two neighbors is the correct class. Any tie-breaking procedure for k=2 that picks the second-closest neighbor a non-zero fraction of the time will therefore do worse than the k-nearest neighbors algorithm with k=1.

(d) The following are some examples of misclassified digits: the first digit was misclassified as a 7, the second was misclassified as a 6, and the last was misclassified as a 0. The algorithm misclassified many points because we are simply computing Euclidean distances between the pixel vectors. This does not account for transformations of the digits such as rotation or translation. For example, a digit 1 that is centered and a digit 1 that is offset by just a few pixels to the right will have a very high Euclidean distance, even though semantically the two are essentially the same.

There are many approaches that might help address this problem. One possibility is to compute features that count the number of "on" pixels in each 4-by-4 or 5-by-5 grid of pixels. A classifier using these features is more robust to translations or rotations, since most of these features of a digit remain the same after the digit is translated or rotated. Orientation-histogram features are also good features to add for this problem, because they too provide some invariance to these transformations (see, e.g., oriented_gradients).

Yet another possibility is to add features that capture information about the shape of the digit. For instance, for each row that is not entirely blank, we could extract the column of the leftmost white pixel and compute the standard deviation of these values, then do the same for the rightmost white pixel.
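The translation sensitivity of raw pixel distances can be illustrated with a small numerical sketch. The images below are hypothetical stand-ins for the dataset's digits: a 28-by-28 binary array containing a vertical stroke, and the same stroke shifted a few pixels to the right.

```python
import numpy as np

# Hypothetical 28x28 binary image of a centered vertical stroke ("1").
centered = np.zeros((28, 28))
centered[4:24, 13:15] = 1.0

# The identical digit, offset 4 pixels to the right.
shifted = np.roll(centered, 4, axis=1)

# Pixelwise Euclidean distance between the two copies of the digit...
dist_shifted = np.linalg.norm(centered - shifted)

# ...versus the distance from the digit to a completely blank image.
dist_blank = np.linalg.norm(centered)

print(dist_shifted, dist_blank)
```

Because the shifted stroke shares no "on" pixels with the original, the two copies of the same digit are actually farther apart in Euclidean distance than the digit is from a blank image.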
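The leftmost/rightmost-column shape features just described can be sketched as follows (a minimal illustration, assuming the digit is a 2-D binary 0/1 array):

```python
import numpy as np

def side_profile_stds(img):
    """For each non-blank row of a binary digit image, record the column
    of the leftmost and rightmost 'on' pixel; return the standard
    deviation of each sequence of columns."""
    lefts, rights = [], []
    for row in img:
        cols = np.flatnonzero(row)
        if cols.size:            # skip entirely blank rows
            lefts.append(cols[0])
            rights.append(cols[-1])
    return np.std(lefts), np.std(rights)

# A vertical stroke like "1": both side profiles are constant columns,
# so both standard deviations come out to 0.
one = np.zeros((28, 28))
one[4:24, 13:15] = 1
print(side_profile_stds(one))  # → (0.0, 0.0)
```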
Thus a vertical stroke like the digit 1 will have a low standard deviation on both sides, while a digit like 2 or 5 will have a high standard deviation on both sides. The digit 5 might have