CS 6375 Homework 3
Chenxi Zeng, UTD ID: 11124236
1.
K=1: [figure: k-NN decision regions for k = 1]
K=3: [figure: k-NN decision regions for k = 3]
We can see from the figures above: a larger k (k = 3) is less sensitive to noise
and gives a better estimate for discrete classes; a smaller k can capture finer
structure of the space, but it sometimes fits the data too exactly and is
sensitive to noise (overfitting).
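The effect of k can be seen in a tiny hand-made example (a sketch with made-up 1-D data, not part of the assignment): near a mislabeled point, k = 1 inherits the noise, while k = 3 votes it away.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# 1-D training set: true boundary near x = 5, with one noisy point at x = 3.
train = [(0, 'A'), (1, 'A'), (2, 'A'), (3, 'B'),   # x = 3 is mislabeled noise
         (4, 'A'), (6, 'B'), (7, 'B'), (8, 'B')]

print(knn_predict(train, 3.1, 1))  # k = 1 follows the noisy label -> 'B'
print(knn_predict(train, 3.1, 3))  # k = 3 smooths the noise away  -> 'A'
```

The majority vote over three neighbors overrules the single mislabeled point, which is exactly the noise-robustness of larger k described above.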
2.
a)
When k is odd:
An error occurs when a data point P is in class C1 but at most (k-1)/2 of the
points are labeled C1, i.e., the number of C1 points is 0, 1, 2, ..., (k-1)/2.
Since there are n points in the space and the two classes have equal prior
probabilities, each point belongs to C1 with probability 1/2 and to C2 with
probability 1/2, giving $2^n$ equally likely assignments of the n points to the
two classes.
Therefore the average probability of error is
$$P_n(e) = \frac{1}{2^n}\sum_{j=0}^{(k-1)/2}\binom{n}{j}.$$
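As a sanity check on this formula (an illustrative sketch, not part of the required answer; the helper names `p_error` and `p_error_enum` are mine), the closed form can be compared against a direct count over all $2^n$ equally likely labelings:

```python
from math import comb

def p_error(n, k):
    """Closed form from part (a): P_n(e) = (1/2^n) * sum_{j=0}^{(k-1)/2} C(n, j)."""
    return sum(comb(n, j) for j in range((k - 1) // 2 + 1)) / 2 ** n

def p_error_enum(n, k):
    """Same probability by counting directly: each of the 2^n equally likely
    labelings contributes an error when at most (k-1)/2 of the n points
    carry the true class label."""
    limit = (k - 1) // 2
    bad = sum(1 for mask in range(2 ** n) if bin(mask).count('1') <= limit)
    return bad / 2 ** n

for n, k in [(5, 1), (5, 3), (8, 5)]:
    assert abs(p_error(n, k) - p_error_enum(n, k)) < 1e-12

print(p_error(5, 3))  # -> 0.1875, i.e. (C(5,0) + C(5,1)) / 2**5 = 6/32
```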
b)
From the result in a), we know that $P_n(e) = \frac{1}{2^n}$ when k = 1, since
the sum reduces to the single term $\binom{n}{0} = 1$.
When k > 1, following the same argument as in a), if k is even then
$$P_n(e) \ge \frac{1}{2^n}\sum_{j=0}^{(k-2)/2}\binom{n}{j}.$$
Both the odd and the even results are greater than or equal to $\frac{1}{2^n}$.
So the 1-nearest-neighbor rule has an error probability no greater than that of
the k-nearest-neighbor rule for any k > 1.
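The comparison in b) can also be checked numerically (again an illustrative sketch with hypothetical helper names, not part of the required answer): for a fixed n, every k gives an error probability at least $\frac{1}{2^n}$, the k = 1 value.

```python
from math import comb

def p_error_odd(n, k):
    """Part (a) result for odd k: (1/2^n) * sum_{j=0}^{(k-1)/2} C(n, j)."""
    return sum(comb(n, j) for j in range((k - 1) // 2 + 1)) / 2 ** n

def p_error_even_bound(n, k):
    """Part (b) lower bound for even k: (1/2^n) * sum_{j=0}^{(k-2)/2} C(n, j)."""
    return sum(comb(n, j) for j in range((k - 2) // 2 + 1)) / 2 ** n

n = 10
baseline = 1 / 2 ** n  # P_n(e) for k = 1
for k in range(1, 9):
    p = p_error_odd(n, k) if k % 2 else p_error_even_bound(n, k)
    assert p >= baseline  # every k gives error probability >= the 1-NN value
```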