Question

Download the Letter Recognition Data Set from the UCI Machine Learning Repository. This dataset contains 20,000 examples. Divide the set so that the first 15,000 examples are for training and the remaining 5,000 for testing.
You will implement 2 algorithms from class: (1) the k-NN algorithm and (2) the "pocket" algorithm.
Let:

-num_train = number of training examples
-num_test = number of testing examples
-num_dims = the dimensionality of the examples

You should implement the following functions. (Implementations that do not conform to these specifications will lose a significant amount of credit for this assignment.)

-pred_y = test_knn(train_x, train_y, test_x, num_nn)
-where train_x is a (num_train, num_dims) data matrix, train_y is a (num_train,) label vector, test_x is a (num_test, num_dims) data matrix, num_nn is the number of nearest neighbors used for classification, and pred_y is a (num_test,) label vector.
-w = train_pocket(train_x, train_y, num_iters)
-where train_x is a (num_train, num_dims) data matrix, train_y is a (num_train,) +1/-1 label vector, num_iters is the number of iterations for the algorithm, and w is a vector of learned perceptron weights.
-pred_y = test_pocket(w, test_x)
-where w is a vector of learned perceptron weights, test_x is a (num_test, num_dims) data matrix, and pred_y is a (num_test,) +1/-1 label vector.
-acc = compute_accuracy(test_y, pred_y)
-where test_y is a (num_test,) label vector, pred_y is a (num_test,) label vector, and acc is a float between 0.0 and 1.0 representing the classification accuracy.
-id = get_id()
-where id is a string representing your Temple Accessnet (e.g., "tua12345")

For the algorithms (k-NN, pocket), run the following experiments:

-Randomly subsample the training data for num_train = {100, 1000, 2000, 5000, 10000, 15000}
-For k-NN, use the following values for k = {1,3,5,7,9} (5 versions of k-NN)
-Note: You should run at least 6 (algorithms) * 6 (values of num_train) = 36 total experiments
-These algorithms include 5 versions of k-NN and one-vs-all (OVA) classification with perceptrons.

Notes

-A code skeleton has been provided for you. Assume your code will be run as a module, so do not include any statements outside of functions.
-Any reference to a "matrix" or "array" or "vector" for input and output should be of the type numpy.ndarray. DO NOT use another type (e.g., lists, dictionary, numpy.mat).
-For numpy arrays, there is a difference between 1D arrays, where shape=(n,), and 2D arrays with a singleton dimension, where shape=(n,1). Be sure to use 1D arrays where appropriate.
-As described in class, the pocket algorithm isn't designed for the multi-class case. Consider one-vs-all (OVA) classification and write related code directly in the main function.
-Do not use (or even refer to) any implementations of k-NN (e.g., sklearn.neighbors) or PLA/pocket.

By the due date, turn in a ZIP file (pa1.zip) which contains:

-Your single Python source file (written in Python 3) named pa1.py which contains the specified functions (plus any helper code you need).
-A project write-up (pa1.pdf) that contains:
-An English description of your algorithms, including any assumptions or design decisions you made. This discussion should include (but not be limited to) any choices you made that were not explicitly described here and how num_iters was selected for the pocket algorithm.
-For each experiment, report the classification accuracy. Additionally, for one experiment, include a confusion matrix of the results. You will be graded on how well you present these results.
-Discussion of the various experiments and what contribution the changes had on the accuracy and running time.
-If there were any problems with your implementation (e.g., clearly wrong output), then make sure to indicate that in your write-up and give as much information as you can as to what you think is causing the problem.

Your submission should be a single ZIP file, which includes only the files specified above. Do not include any other files or internal folders in your submission. Part of your score for this assignment will be for following directions.
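
The note on 1D versus 2D arrays and the confusion-matrix requirement can be illustrated with a small sketch. It assumes integer class IDs in the range 0..num_classes-1; the helper name `confusion_matrix_sketch` and the toy labels are hypothetical and not part of the required interface.

```python
import numpy as np


def confusion_matrix_sketch(true_y, pred_y, num_classes):
    """Count (true, predicted) pairs; rows are true classes, columns are predictions."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when an index pair repeats.
    np.add.at(cm, (true_y, pred_y), 1)
    return cm


true_y = np.array([0, 1, 2, 1])            # 1D, shape (4,)
pred_col = np.array([[0], [1], [1], [1]])  # 2D, shape (4, 1)

# Pitfall: comparing (4,) with (4, 1) broadcasts to a (4, 4) matrix,
# so a mean over it is not the classification accuracy.
broadcast_shape = (true_y == pred_col).shape

pred_y = pred_col.ravel()                  # back to shape (4,)
acc = float(np.mean(true_y == pred_y))     # fraction of matching labels
cm = confusion_matrix_sketch(true_y, pred_y, num_classes=3)
```

The diagonal of `cm` counts correct predictions, so its trace divided by the total count should agree with `acc`; this is one way to sanity-check a reported confusion matrix.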


# Note: this is just a template for PA 1 and the code is for reference only.
# Feel free to design the pipeline of the *main* function. However, one should keep
# the interfaces for the other functions unchanged. Change the returned values of
# these functions so that they are consistent with the assignment instructions.
# In general, one will only need to add the code below the TO-DO statements to
# finish the assignment. Additional import statements can be included when needed.
#
# For the kNN classifier, one could use existing libraries to compute the pairwise
# Euclidean distances between the test and training data, as for-loops in Python
# are pretty slow. Other than that, the designs of all functions should be your
# original work.


import csv
import numpy as np


def compute_accuracy(test_y, pred_y):
    """Return the classification accuracy as a float between 0.0 and 1.0."""

    # TO-DO: add your code here


    return None


def test_knn(train_x, train_y, test_x, num_nn):
    """Return a (num_test,) label vector predicted by k-NN with num_nn neighbors."""

    # TO-DO: add your code here


    return None


def test_pocket(w, test_x):
    """Return a (num_test,) +1/-1 label vector predicted with perceptron weights w."""

    # TO-DO: add your code here


    return None


def train_pocket(train_x, train_y, num_iters):
    """Return the perceptron weights learned by the pocket algorithm from +1/-1 labels."""

    # TO-DO: add your code here


    return None


def get_id():


    # TO-DO: add your code here


    return 'tuxddddd'


def main():


    # Read the data file
    szDatasetPath = './letter-recognition.data' # Put this file in the same place as this script
    listClasses = []
    listAttrs = []
    with open(szDatasetPath) as csvFile:
        csvReader = csv.reader(csvFile, delimiter=',')
        for row in csvReader:
            listClasses.append(row[0])
            listAttrs.append(list(map(float, row[1:])))


    # Generate the mapping from class name to integer IDs
    mapCls2Int = {name: idx for idx, name in enumerate(sorted(set(listClasses)))}


    # Store the dataset with numpy array
    dataX = np.array(listAttrs)
    dataY = np.array([mapCls2Int[cls] for cls in listClasses])


    # Split the dataset into the training set and the test set
    nNumTrainingExamples = 15000
    trainX = dataX[:nNumTrainingExamples, :]
    trainY = dataY[:nNumTrainingExamples]
    testX = dataX[nNumTrainingExamples:, :]
    testY = dataY[nNumTrainingExamples:]


    # TO-DO: add your code here


    return None


if __name__ == "__main__":
    main()
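
The TO-DO in main is left open by the template. As one possible starting point, the sketch below shows random subsampling of the training set and the relabeling step for one-vs-all (OVA) classification, run here on synthetic data. The helper names `subsample` and `ova_labels`, the fixed seed, and the synthetic arrays are assumptions made for illustration; in pa1.py the top-level statements would have to live inside main, since the assignment forbids statements outside functions.

```python
import numpy as np


def subsample(train_x, train_y, num_train, rng):
    """Draw num_train examples without replacement, keeping x and y aligned."""
    idx = rng.choice(train_x.shape[0], size=num_train, replace=False)
    return train_x[idx], train_y[idx]


def ova_labels(y, positive_class):
    """Map multi-class integer labels to +1 (positive_class) or -1 (all others)."""
    return np.where(y == positive_class, 1, -1)


rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Synthetic stand-in for the real data: 50 examples, 16 attributes, 26 classes.
demo_x = rng.integers(0, 16, size=(50, 16)).astype(float)
demo_y = rng.integers(0, 26, size=50)

sub_x, sub_y = subsample(demo_x, demo_y, 10, rng)

# One binary problem per class; a pocket perceptron would be trained on each.
binary_y = ova_labels(sub_y, positive_class=0)
```

With 26 letter classes, OVA would repeat the relabeling for each class and train one perceptron per class; how the per-class outputs are combined into a single prediction is a design decision the write-up asks you to describe.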

CIS 4526