CS 170 Algorithms, Fall 2014, David Wagner
HW12, Due Dec. 5, 6:00pm

Instructions. This homework is due Friday, December 5, at 6:00pm, electronically via glookup. This is a programming assignment based on a machine learning application. You may work individually or in a group of two; you may not work with more than one other person. If you work in a group of two, both of you must turn in a solution, and you must either use pair programming (the two of you write all code together) or implement everything individually; you may not split up the problems and submit code your partner wrote on their own (e.g., "you implement Problem 1, I'll code up Problem 2" is not allowed).

1. (50 pts.) K-Nearest Neighbors

Digit classification is a classical problem that has been studied in depth by many researchers and computer scientists over the past few decades. It has many applications: for instance, postal services such as the US Postal Service, UPS, and FedEx use pre-trained classifiers to recognize handwritten addresses quickly and accurately. Today, over 95% of all handwritten addresses are correctly classified by a computer rather than by a human reading the address manually.

The problem statement is as follows: given an image of a single handwritten digit, build a classifier that correctly predicts the actual digit value shown in the image. Thus, your classifier receives as input an image of a digit and must output a class in the set {0, 1, 2, ..., 9}.

For this homework, you will attack this problem using a k-nearest neighbors algorithm. We will give you a data set (a reduced version of the MNIST handwritten digit data set). Each image of a digit is a 28 x 28 pixel image. We have already extracted features, using a very simple scheme: each pixel is its own feature, so we have 28^2 = 784 features. The value of a feature is the intensity of the corresponding pixel, normalized to be in the range 0..1.
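The handout does not fix a distance metric, but for pixel-intensity feature vectors like these, plain Euclidean distance is the usual starting point. As a minimal sketch (pure Python; the 4-element vectors below are toy stand-ins for the real 784-element ones):

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors
    (in this assignment, 784 pixel intensities in [0, 1])."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two tiny toy "images" (4 features instead of 784, for illustration):
u = [0.0, 1.0, 0.5, 0.0]
v = [0.0, 0.0, 0.5, 0.0]
print(euclidean_distance(u, v))  # -> 1.0
```

Since all features are normalized to the same 0..1 range, no further scaling is needed before computing distances.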
We have preprocessed and vectorized these images into feature vectors for you. We have split the data set into training, validation, and test sets, and we've provided the class of each image in the training and validation sets. Your job is to infer the class of each image in the test set. [Five example digit images from the data set appear here in the original handout.]

We want you to do the following steps:

(i) Implement the k-nearest neighbors algorithm. You can implement it in any way you like, in any programming language of your choice. For k > 1, decide on a rule for resolving ties (if there is a tie for the majority vote among the k nearest neighbors when trying to classify a new observation, which one do you choose?).

(ii) Using the training set as your training data, compute the class of each digit in the validation set, and compute the error rate on the validation set, for each of the following candidate values of k: k = 1, 2, 5, 10, 25.
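Steps (i) and (ii) can be sketched as follows. The tie-breaking rule here (prefer the tied class whose neighbor is closest to the test point) is just one reasonable choice, not the required one; the data sets below are tiny toy stand-ins for the real training and validation sets:

```python
import math
from collections import Counter

def classify(train, test_point, k):
    """Classify one point by majority vote among its k nearest
    neighbors in `train`, a list of (feature_vector, label) pairs.
    Ties are broken in favor of the tied class that has the
    closest neighbor to the test point (one possible rule)."""
    dists = sorted((math.dist(x, test_point), label) for x, label in train)
    neighbors = dists[:k]
    votes = Counter(label for _, label in neighbors)
    top = max(votes.values())
    tied = {label for label, count in votes.items() if count == top}
    # Neighbors are sorted by distance, so the first tied label wins.
    for _, label in neighbors:
        if label in tied:
            return label

def error_rate(train, validation, k):
    """Fraction of labeled validation points misclassified with this k."""
    wrong = sum(1 for x, label in validation
                if classify(train, x, k) != label)
    return wrong / len(validation)

# Toy data; real feature vectors would have 784 entries.
train = [([0.0, 0.0], 0), ([0.1, 0.0], 0),
         ([1.0, 1.0], 1), ([0.9, 1.0], 1)]
validation = [([0.05, 0.0], 0), ([0.95, 1.0], 1)]
for k in [1, 2]:
    print(k, error_rate(train, validation, k))  # both print an error rate of 0.0
```

For the actual assignment, the loop at the bottom would run over k = 1, 2, 5, 10, 25 and report each validation error rate, so you can pick the best k before touching the test set.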