plots/analysis electronically.1. For both the original data and the noisy, compute both theN-fold error on the training set and the testerror for K-NN withK= 1, forN={2,4,8,16}. What trend do you observe?Note that any tworandom partitions of the data will yield somewhat different curves. Therefore, you mustrepeat all of the above steps 100 times, using different random partitions into trainingand testing.2. For both the original data and the noisy, compute both the 10-fold error on the training set and thetest error for K-NN withK∈ {1,3,5, . . . ,15}andσ∈ {1,2,3, . . . ,8}. Generatefour plots: each plotwill showK(for K-NN) orσ(for kernel regression) on the X axis, and error on the Y axis. The plotwill have two lines with error bars: one for 10-fold error, and the other for test set error; you will haveone plot for each method/dataset combination (e.g., K-NN on standard, K-NN on noisy, etc.). Basedon these charts, can you pick the bestσandKto minimize test set error using cross validation (onaverage)? Which are the best values?3Decision Trees[50 points]Description.In this part of the assignment, you will generalize the ID3 algorithm discussed in class towork withmulti-class labels, e.g.Y∈ {1, . . . , K}. You will also explore the issue ofoverfittingwith decisiontrees as the depth of the tree increases.You will be using the MNIST digit dataset, which consists of roughly 54,000 training examples and 9,000test examples of handwritten digits. Each example is a 28×28 grayscale image, which leads to 784 pixelsto use as features.
2

Note that we have provided you with a working ID3 decision tree implementation forbinaryoutputs:Y∈ {0,1}. Unlike the code from the lecture page, this code is mostly optimized and scales to larger datasets.Portions of the code have been significantly vectorized. Thus, your challenge is to understand and modifythe existing decision tree implementation.Your task.You will need to provide the following: (see each file for exact specifications):•multientropy.m- This function computes the empirical entropy of samples ofY, whereY∈ {1, . . . , K}.Note that you may usebinaryentropy.mas a starting point.•dtchoosefeaturemulti.m- This function takes in a multi-class dataset and returns the best split onthe features of the dataset to maximize information gain. Note that you may use