CIS 4526

Question

Download the Letter Recognition Data Set from the UCI Machine Learning Repository. This dataset contains 20,000 examples. Divide the set so that the first 15,000 examples are for training and the remaining 5,000 for testing.
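One possible way to load and split the data is sketched below. It assumes the file is comma-separated text with the letter label in the first column and numeric features in the remaining columns. The helper names `load_letters` and `split_first_n` and the file name in the usage note are illustrative assumptions, not part of the assignment.

```python
import numpy as np

def load_letters(source):
    """Parse rows such as 'T,2,8,3,...' from any iterable of text lines.

    Assumption: label in the first column, numeric features after it.
    Returns (features, labels) as numpy.ndarray objects.
    """
    labels, feats = [], []
    for line in source:
        parts = line.strip().split(",")
        if len(parts) < 2:  # skip blank or malformed lines
            continue
        labels.append(parts[0])
        feats.append([float(v) for v in parts[1:]])
    return np.array(feats, dtype=float), np.array(labels)

def split_first_n(x, y, n_train):
    """Use the first n_train rows for training and the rest for testing."""
    return x[:n_train], y[:n_train], x[n_train:], y[n_train:]
```

Assuming the downloaded file is saved as `letter-recognition.data`, a call inside a function might look like `with open("letter-recognition.data") as f: x, y = load_letters(f)`, followed by `split_first_n(x, y, 15000)`.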

You will implement 2 algorithms from class: (1) the k-NN algorithm and (2) the "pocket" algorithm.

Let:

• num_train = number of training examples
• num_test = number of testing examples
• num_dims = the dimensionality of the examples

You should implement the following functions. (Implementations that do not conform to these specifications will lose a significant amount of credit for this assignment.)

• pred_y = test_knn(train_x, train_y, test_x, num_nn)
• where train_x is a (num_train, num_dims) data matrix, train_y is a (num_train,) label vector, test_x is a (num_test, num_dims) data matrix, num_nn is the number of nearest neighbors used for classification, and pred_y is a (num_test,) label vector.
• w = train_pocket(train_x, train_y, num_iters)
• where train_x is a (num_train, num_dims) data matrix, train_y is a (num_train,) +1/-1 label vector, num_iters is the number of iterations for the algorithm, w is a vector of learned perceptron weights.
• pred_y = test_pocket(w, test_x)
• where w is a vector of learned perceptron weights, test_x is a (num_test, num_dims) data matrix, and pred_y is a (num_test,) +1/-1 label vector.
• acc = compute_accuracy(test_y, pred_y)
• where test_y is a (num_test,) label vector, pred_y is a (num_test,) label vector, and acc is a float between 0.0 and 1.0 representing the classification accuracy.
• id = get_id()
• where id is a string representing your Temple AccessNet username (e.g., "tua12345")
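To make the array shapes in these specifications concrete, here is a minimal sketch of a k-NN predictor and the accuracy function. The name `knn_predict` is a hypothetical stand-in with the same argument order as `test_knn`. Euclidean distance and breaking vote ties in favor of the first class in sorted order are assumptions, since the assignment does not specify either.

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, num_nn):
    """Hypothetical stand-in with the same arguments as test_knn.

    Assumptions: Euclidean distance; vote ties go to the class that
    comes first in sorted order.
    """
    # Map labels to integer codes so votes can be counted with bincount.
    classes, train_codes = np.unique(train_y, return_inverse=True)
    pred_codes = np.empty(test_x.shape[0], dtype=int)
    for i, x in enumerate(test_x):
        # Squared distances preserve the neighbor ordering.
        d2 = np.sum((train_x - x) ** 2, axis=1)
        # Indices of the num_nn smallest distances (unordered).
        nearest = np.argpartition(d2, num_nn - 1)[:num_nn]
        votes = np.bincount(train_codes[nearest], minlength=classes.shape[0])
        pred_codes[i] = np.argmax(votes)
    return classes[pred_codes]  # 1D array of shape (num_test,)

def compute_accuracy(test_y, pred_y):
    """Fraction of matching labels, returned as a Python float."""
    return float(np.mean(test_y == pred_y))
```

Looping over test points keeps memory use low; computing the full (num_test, num_train) distance matrix at once would be faster but needs far more memory at the largest training size.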

For the algorithms (k-NN, pocket), run the following experiments:

• Randomly subsample the training data for num_train = {100, 1000, 2000, 5000, 10000, 15000}
• For k-NN, use the following values for k = {1,3,5,7,9} (5 versions of k-NN)
• Note: You should run at least 6 (algorithms) * 6 (values of num_train) = 36 total experiments
• These algorithms include 5 versions of k-NN and one-vs-all (OVA) classification with perceptrons.
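The experiment grid above could be driven by a loop like the following sketch. The helpers `subsample` and `run_experiments`, the fixed seed, sampling without replacement, and reusing one subsample for every algorithm at a given size are all assumptions made for illustration.

```python
import numpy as np

def subsample(train_x, train_y, num_train, rng):
    """Draw num_train distinct training rows at random (assumed: no replacement)."""
    idx = rng.choice(train_x.shape[0], size=num_train, replace=False)
    return train_x[idx], train_y[idx]

def run_experiments(train_x, train_y, test_x, test_y, sizes, classifiers, seed=0):
    """Evaluate every classifier on every training-set size.

    classifiers maps a name to a callable (train_x, train_y, test_x) -> pred_y.
    Returns a dict keyed by (name, num_train) holding the accuracy.
    """
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes:
        # One subsample per size, shared by all classifiers for a fair comparison.
        sub_x, sub_y = subsample(train_x, train_y, n, rng)
        for name, predict in classifiers.items():
            pred_y = predict(sub_x, sub_y, test_x)
            results[(name, n)] = float(np.mean(pred_y == test_y))
    return results
```

With six entries in `classifiers` (five values of k plus the OVA perceptron) and the six training sizes listed above, the returned dictionary would hold the 36 accuracies.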

Notes

• A code skeleton has been provided for you. Assume your code will be run as a module, so do not include any statements outside of functions.
• Any reference to a "matrix" or "array" or "vector" for input and output should be of the type numpy.ndarray. DO NOT use another type (e.g., lists, dictionary, numpy.mat).
• For numpy arrays, there is a difference between 1D arrays, where shape=(n,), and 2D arrays with a singleton dimension, where shape=(n,1). Be sure to use 1D arrays where appropriate.
• As described in class, the pocket algorithm isn't designed for the multi-class case. Consider one-vs-all (OVA) classification and write related code directly in the main function.
• Do not use (or even refer to) any implementations of k-NN (e.g., sklearn.neighbors) or PLA/pocket.
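As an illustration of the pocket algorithm and the one-vs-all idea mentioned in the notes, here is a minimal sketch. The names `pocket_train`, `pocket_predict`, and `ova_predict` are hypothetical stand-ins. The prepended bias term, the random choice of misclassified example, the `seed` argument, and resolving OVA by the largest raw score are assumptions rather than requirements from the assignment, which asks for the OVA logic to live in the main function.

```python
import numpy as np

def _augment(x):
    """Prepend a column of ones so w[0] acts as the bias (an assumption)."""
    return np.hstack([np.ones((x.shape[0], 1)), x])

def _sign(scores):
    """Map scores to +1/-1 labels, treating a score of 0 as +1."""
    return np.where(scores >= 0, 1, -1)

def pocket_train(train_x, train_y, num_iters, seed=0):
    """Perceptron updates, keeping the weights with the fewest training errors."""
    rng = np.random.default_rng(seed)
    x = _augment(train_x)
    w = np.zeros(x.shape[1])
    best_w = w.copy()
    best_errors = np.sum(_sign(x @ w) != train_y)
    for _ in range(num_iters):
        wrong = np.flatnonzero(_sign(x @ w) != train_y)
        if wrong.size == 0:  # current weights separate the data
            break
        i = rng.choice(wrong)  # pick one misclassified example
        w = w + train_y[i] * x[i]
        errors = np.sum(_sign(x @ w) != train_y)
        if errors < best_errors:  # better than the pocketed weights
            best_w, best_errors = w.copy(), errors
    return best_w  # 1D array of shape (num_dims + 1,)

def pocket_predict(w, test_x):
    """Return a (num_test,) vector of +1/-1 predictions."""
    return _sign(_augment(test_x) @ w)

def ova_predict(train_x, train_y, test_x, num_iters):
    """Train one +1/-1 classifier per class; predict the highest-scoring class."""
    classes = np.unique(train_y)
    test_aug = _augment(test_x)
    scores = np.empty((test_x.shape[0], classes.shape[0]))
    for c, label in enumerate(classes):
        binary_y = np.where(train_y == label, 1, -1)
        scores[:, c] = test_aug @ pocket_train(train_x, binary_y, num_iters)
    return classes[np.argmax(scores, axis=1)]
```

Comparing raw scores rather than the +1/-1 outputs avoids ambiguity when several classifiers, or none, claim the same test point. All returned vectors are 1D arrays with shape (n,), not (n, 1).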

By the due date, turn in a ZIP file (pa1.zip) which contains:

• Your single Python source file (written in Python 3) named pa1.py which contains the specified functions (plus any helper code you need).
• A project write-up (pa1.pdf) that contains:
• An English description of your algorithms, including any assumptions or design decisions you made. This discussion should include (but not be limited to) any choices you made that were not explicitly described here and how num_iters was selected for the pocket algorithm.
• For each experiment, report the classification accuracy. Additionally, for one experiment, include a confusion matrix of the results. You will be graded on how well you present these results.
• Discussion of the various experiments and how the changes affected accuracy and running time.
• If there were any problems with your implementation (e.g. clearly wrong output) then make sure to indicate that in your write-up and give as much information as you can as to what you think is causing the problem.
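For the confusion matrix requested in the write-up, a small NumPy sketch is shown below. The function name `confusion_matrix` and the layout (rows are true labels, columns are predicted labels) are assumptions; the assignment does not prescribe a convention.

```python
import numpy as np

def confusion_matrix(test_y, pred_y):
    """Count (true, predicted) label pairs.

    Returns (classes, cm), where cm[i, j] is the number of examples whose
    true label is classes[i] and whose predicted label is classes[j].
    """
    # Sorted list of every label that appears in either vector.
    classes = np.unique(np.concatenate([test_y, pred_y]))
    true_idx = np.searchsorted(classes, test_y)
    pred_idx = np.searchsorted(classes, pred_y)
    cm = np.zeros((classes.size, classes.size), dtype=int)
    # add.at accumulates correctly when the same cell appears more than once.
    np.add.at(cm, (true_idx, pred_idx), 1)
    return classes, cm
```

The diagonal holds the correctly classified counts, so `np.trace(cm) / cm.sum()` should agree with the accuracy reported for the same experiment.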

Your submission should be a single ZIP file, which includes only the files specified above. Do not include any other files or internal folders in your submission. Part of your score for this assignment will be for following directions.

Your code will be checked for similarity to other sources. Modern cheat detectors are quite hard to fool, so please don't try. You are far better off submitting your own incomplete or non-functional code than taking a chance copying (or even looking at) code from a classmate or the Web. As stated in the course syllabus, violations of academic integrity are dealt with harshly.
