Homework 5 Solutions
1) Read Chapter 5 (Sections 5.2, 5.3, 5.5 and 5.6).
2)
a)
library(class)
# Column 61 holds the class label; columns 1-60 are the predictors
train<-read.csv("sonar_train.csv",header=FALSE)
y<-as.factor(train[,61])
x<-train[,1:60]
test<-read.csv("sonar_test.csv",header=FALSE)
y_test<-as.factor(test[,61])
x_test<-test[,1:60]
train_error<-rep(0,10)
test_error<-rep(0,10)
for (my_k in 1:10){
# Error rate = 1 - fraction classified correctly
fit<-knn(x,x,y,k=my_k)
train_error[my_k]<-1-sum(y==fit)/length(y)
fit_test<-knn(x,x_test,y,k=my_k)
test_error[my_k]<-1-sum(y_test==fit_test)/length(y_test)
}
plot(seq(1,10),test_error,type="o",pch=19,ylim=c(0,.5),
ylab="Error Rate",xlab="k",
main="Rajan Patel's Nearest Neighbor Error Plot")
points(train_error,type="o",pch=19,lwd=4,col="green")
legend(4,.5,c("Training Error","Test Error"),
col=c("green","black"),pch=19,lwd=c(4,1))
This plot indicates k=3 gives the lowest test error, but your answer might differ as
a result of the randomization discussed in part b.
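The k with the lowest error can also be read off the error vector with which.min instead of from the plot. The sketch below assumes simulated two-class data rather than the sonar files (the data, the seed, and the names x_sim, y_sim, sim_test_error and best_k are all made up for illustration), so the k it selects says nothing about the homework data.

```r
library(class)

set.seed(1)  # arbitrary seed so the simulated data are reproducible
n <- 100
# Simulated stand-in data (an assumption): two Gaussian classes in 2 dimensions
x_sim <- rbind(matrix(rnorm(n, mean = 0), ncol = 2),
               matrix(rnorm(n, mean = 1.5), ncol = 2))
y_sim <- factor(rep(c(-1, 1), each = n / 2))

# Hold out 30 of the 100 rows as a test set
train_idx <- sample(nrow(x_sim), 70)

sim_test_error <- rep(0, 10)
for (my_k in 1:10) {
  fit <- knn(x_sim[train_idx, ], x_sim[-train_idx, ], y_sim[train_idx], k = my_k)
  sim_test_error[my_k] <- 1 - sum(y_sim[-train_idx] == fit) / length(fit)
}

# Index of the first k that attains the minimum test error
best_k <- which.min(sim_test_error)
print(best_k)
```

which.min returns the first minimizer, so if several values of k share the lowest error it reports the smallest of them.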
b) The help for the knn function states “ties broken at random”. With two classes, an odd k never produces a tied vote, while an even k frequently does, so the error rates for even k can change from run to run.
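The effect of random tie-breaking can be seen on a small example. The sketch below assumes a hypothetical four-point training set on a line (x_toy, y_toy and x_new are invented for illustration, not taken from the sonar data). The new point's two nearest neighbors belong to different classes, so k=2 gives a 1-1 vote, while k=3 gives a clear 2-1 majority.

```r
library(class)

# Hypothetical toy data: class A at 0 and 1, class B at 3 and 4
x_toy <- matrix(c(0, 1, 3, 4), ncol = 1)
y_toy <- factor(c("A", "A", "B", "B"))

# New point at 2.1: nearest neighbors in order are 3 (B), 1 (A), 4 (B)
x_new <- matrix(2.1, ncol = 1)

# Classify the same point 200 times for an even and an odd k
preds_k2 <- replicate(200, as.character(knn(x_toy, x_new, y_toy, k = 2)))
preds_k3 <- replicate(200, as.character(knn(x_toy, x_new, y_toy, k = 3)))

print(table(preds_k2))  # tied vote: the predicted class varies between runs
print(table(preds_k3))  # 2-1 vote: the predicted class is the same every run
```

Calling set.seed before knn makes the even-k result reproducible, though the tie is still resolved by a random draw.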
[Figure: Rajan Patel's Nearest Neighbor Error Plot. Error Rate (0.0 to 0.5) versus k, with curves for Training Error and Test Error.]
3)
a) It looks like abline(c(.05,1)) matches which means the slope is 1 and the
