Homework 5 Solutions
1) Read Chapter 5 (Sections 5.2, 5.3, 5.5 and 5.6).
2)
a)
library(class)

# Columns 1-60 are the features; column 61 is the class label
train<-read.csv("sonar_train.csv",header=FALSE)
y<-as.factor(train[,61])
x<-train[,1:60]
test<-read.csv("sonar_test.csv",header=FALSE)
y_test<-as.factor(test[,61])
x_test<-test[,1:60]

# Training and test misclassification rates for k = 1,...,10
train_error<-rep(0,10)
test_error<-rep(0,10)
for (my_k in 1:10){
  fit<-knn(x,x,y,k=my_k)
  train_error[my_k]<-1-sum(y==fit)/length(y)
  fit_test<-knn(x,x_test,y,k=my_k)
  test_error[my_k]<-1-sum(y_test==fit_test)/length(y_test)
}

# Test error in black, training error overlaid in green
plot(seq(1,10),test_error,type="o",pch=19,ylim=c(0,.5),
     ylab="Error Rate",xlab="k",
     main="Rajan Patel's Nearest Neighbor Error Plot")
points(train_error,type="o",pch=19,lwd=4,col="green")
legend(4,.5,c("Training Error","Test Error"),
       col=c("green","black"),pch=19,lwd=c(4,1))
This plot indicates k=3 gives the lowest test error, but your answer might differ as
a result of the randomization discussed in part b.
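As a small aside, the error rate used in part a, one minus the proportion of correct predictions, is the same as the proportion of misclassified observations, and the k with the lowest error can be read off with which.min instead of from the plot. The sketch below uses invented labels and invented error values (the names truth, pred and example_test_error are hypothetical and these are not the sonar results), purely to illustrate the two calculations.

```r
# Invented labels and predictions, for illustration only (not the sonar data)
truth <- factor(c("M","R","M","M","R"))
pred  <- factor(c("M","M","M","R","R"), levels=levels(truth))

# Two equivalent ways of writing the misclassification rate
err_a <- 1-sum(truth==pred)/length(truth)
err_b <- mean(truth!=pred)

# Invented test errors for k = 1,...,5 (not the homework results);
# which.min returns the index of the first minimum
example_test_error <- c(0.20,0.25,0.15,0.18,0.22)
best_k <- which.min(example_test_error)

print(c(err_a,err_b))
print(best_k)
```

Because which.min returns only the first minimum, two values of k with the same error would need a separate check.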
b) The help for the knn function states “ties broken at random”. With two classes, an odd
k can never produce a tied vote, while an even k frequently does, so the results for even
k can change from run to run.
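The effect of random tie-breaking can be seen on a toy problem. The following is a minimal sketch, assuming the class package is installed; the one-dimensional data set and the names toy_train, toy_cl, toy_test and repeat_knn are invented for illustration and are not part of the homework. The same test point is classified 200 times for k = 1, 2 and 3.

```r
library(class)

# Toy one-dimensional training set, invented for illustration (not the sonar data)
toy_train <- matrix(c(0,2,5),ncol=1)
toy_cl <- factor(c("A","B","B"))
toy_test <- matrix(0.9,ncol=1)

# Classify the same test point n_rep times with a given k
repeat_knn <- function(k,n_rep=200){
  replicate(n_rep,as.character(knn(toy_train,toy_test,toy_cl,k=k)))
}

set.seed(1)  # only so that the random tie-breaking is reproducible
k1_pred <- repeat_knn(1)  # nearest neighbor is 0 (class A): no tie
k2_pred <- repeat_knn(2)  # neighbors 0 (A) and 2 (B): one vote each, a tie
k3_pred <- repeat_knn(3)  # votes are A, B, B: B wins outright

print(table(k1_pred))
print(table(k2_pred))
print(table(k3_pred))
```

For k = 1 and k = 3 every repetition should give the same class, while for k = 2 the tied vote is resolved at random, so both classes should appear across the repetitions.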
[Figure: "Rajan Patel's Nearest Neighbor Error Plot". x-axis: k; y-axis: Error Rate (0.0 to 0.5); series: Training Error, Test Error.]
3)
a) It looks like abline(c(.05,1)) matches which means the slope is 1 and the
This note was uploaded on 08/20/2011 for the course STATS 202 taught by Professor Taylor during the Summer '09 term at Stanford.