You can now build and train a kNN classifier on Xtr, Ytr and make predictions on some data Xva with it:

```python
knn = ml.knn.knnClassify()   # create the object and train it
knn.train(Xtr, Ytr, K)       # where K is an integer, e.g. 1 for nearest-neighbor prediction
YvaHat = knn.predict(Xva)    # get estimates of y for each data point in Xva

# Alternatively, the constructor provides a shortcut to "train":
knn = ml.knn.knnClassify(Xtr, Ytr, K)
YvaHat = knn.predict(Xva)
```

If your data are 2D, you can visualize the data set and a classifier's decision regions using the function

```python
ml.plotClassify2D(knn, Xtr, Ytr)   # make 2D classification plot with data (Xtr, Ytr)
```

This function plots the training data as points colored according to their labels, then calls knn's predict function on a densely spaced grid of points in the 2D space, and uses the results to produce the background color. Calling the function with knn=None will plot only the data.

1. Modify the code listed above to use only the first two features of X (e.g., let X be only the first two columns of iris, instead of the first four), and visualize (plot) the classification boundary for varying values of K = [1, 5, 10, 50] using plotClassify2D. (10 points)

2. Again using only the first two features, compute the error rate (number of misclassifications) on both the training and validation data as a function of K = [1, 2, 5, 10, 50, 100, 200]. You can do this most easily with a for-loop:

Homework 1, UC Irvine
CS 178: Machine Learning & Data Mining, Fall 2019

```python
K = [1, 2, 5, 10, 50, 100, 200]
errTrain = [None] * len(K)            # preallocate storage for training error
for i, k in enumerate(K):
    learner = ml.knn.knnClassify(...  # TODO: complete code to train model
    Yhat = learner.predict(...        # TODO: predict results on training data
    errTrain[i] = ...                 # TODO: count what fraction of predictions are wrong
# TODO: repeat prediction / error evaluation for validation data

plt.semilogx(...                      # TODO: average and plot results on semi-log scale
```

Plot the resulting error rate functions using a semi-log plot (semilogx), with training error in red and validation error in green. Based on these plots, what value of K would you recommend? (10 points)

3. Create the same error rate plots as the previous part, but with all the features in the dataset. Are the plots very different? Is your recommendation for the best K different? (5 points)

Problem 3: Naïve Bayes Classifiers (50 points)

In order to reduce my email load, I decide to implement a machine learning algorithm to decide whether or not I should read an email, or simply file it away instead. To train my model, I obtain the following data set of binary-valued features about each email, including whether I know the author or not, whether the email is long or short, and whether it has any of several key words, along with my final decision about whether to read it (y = +1 for "read", y = -1 for "discard").
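For intuition, a naïve Bayes classifier of this kind can be sketched in a few lines of plain NumPy: estimate a class prior p(y = c) and a per-feature conditional p(x_j = 1 | y = c) for each class, then predict the class with the larger sum of log-probabilities. The sketch below uses made-up toy data (not the assignment's email table) and adds Laplace smoothing to avoid zero probabilities; the assignment may instead expect raw maximum-likelihood estimates.

```python
import numpy as np

# Minimal naive Bayes for binary features with labels in {-1, +1}.
# Toy data made up for illustration -- NOT the assignment's email data.
X = np.array([[1, 0], [1, 1], [0, 1], [0, 0], [1, 0]])
y = np.array([1, 1, -1, -1, 1])

def nb_train(X, y):
    params = {}
    for c in (-1, 1):
        Xc = X[y == c]
        prior = len(Xc) / len(X)                    # p(y = c)
        theta = (Xc.sum(axis=0) + 1) / (len(Xc) + 2)  # p(x_j = 1 | y = c), Laplace-smoothed
        params[c] = (prior, theta)
    return params

def nb_predict(params, x):
    best, best_score = None, -np.inf
    for c, (prior, theta) in params.items():
        # log p(y = c) + sum_j log p(x_j | y = c), assuming features independent given y
        score = np.log(prior) + np.sum(np.log(np.where(x == 1, theta, 1 - theta)))
        if score > best_score:
            best, best_score = c, score
    return best

params = nb_train(X, y)
print(nb_predict(params, np.array([1, 0])))   # -> 1
print(nb_predict(params, np.array([0, 1])))   # -> -1
```

The log-domain sum is used in place of a product of probabilities purely for numerical stability; the argmax is the same either way.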