CS 6375 Machine Learning
Homework 5
Due: 04/11/2008, 11:59pm
1.
Bias and Variance. (15 pts)
Alpaydin book, Chapter 4, problem 9.
Let us say, given the samples X_i = {x_i^t, r_i^t}, where t is the index for the instances in the training
set, we define g_i(x) = r_i^1, namely, our estimate for any x is the r value of the first instance in the
(unordered) data set X_i. What can you say about its bias and variance, as compared with
g_i(x) = 2 and g_i(x) = (Σ_t r_i^t) / N? What if the sample is ordered, so that g_i(x) = min_t r_i^t?
Note:
for this problem, you don't need to prove it mathematically; a conceptual explanation is
fine. Also, the index i used in this problem is not the index for a training instance; rather, it
indexes the entire training set, since you can have different data sets.
Sol:
Taking any single instance has lower bias than taking a constant, but higher variance. Compared
with the average, it has higher variance and may also have higher bias. If the sample is ordered
so that the instance we pick is the minimum, the variance decreases (minima of different samples
tend to be similar to each other), but the bias may also increase, since the minimum
systematically underestimates the typical r value.
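The comparison above can be checked empirically. The sketch below is a hypothetical setup, not part of the assignment: it assumes the labels are drawn as r = 3 + Gaussian noise, generates many independent training sets X_i, and measures the bias and variance of each constant estimator g_i(x) across those sets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data model (an assumption for illustration):
# each label is r = true_mean + standard normal noise.
true_mean = 3.0
N = 20          # instances per training set X_i
M = 10_000      # number of independent training sets (index i)

datasets = true_mean + rng.normal(0.0, 1.0, size=(M, N))

# Each entry holds, for every training set i, the value of g_i(x).
estimators = {
    "first instance r_i^1": datasets[:, 0],
    "constant 2":           np.full(M, 2.0),
    "average of r_i^t":     datasets.mean(axis=1),
    "minimum of r_i^t":     datasets.min(axis=1),
}

for name, g in estimators.items():
    bias = g.mean() - true_mean   # E_i[g_i(x)] - E[r]
    variance = g.var()            # Var_i[g_i(x)]
    print(f"{name:22s} bias={bias:+.3f}  variance={variance:.3f}")
```

Running this shows the pattern argued above: the constant has zero variance but large bias, the first-instance estimator is unbiased but has the full noise variance, the average shrinks the variance by a factor of N, and the minimum has low variance but a clear negative bias.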
2.
Boosting. (30 pts)
Three learning algorithms in a binary classification problem are applied independently to a set of
1000 training examples to train three classifiers.
• Algorithm A produces Classifier A that correctly classifies 800 examples and
incorrectly classifies 200 examples.
• Algorithm B produces Classifier B that correctly classifies 800 examples and incorrectly
classifies 200 examples. All the mistakes of Classifier B are on examples that were
correctly classified by Classifier A.
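The error structure described in the two bullets can be encoded and checked with boolean masks. In the sketch below, which examples are assigned to each error set is arbitrary (an assumption for illustration); only the counts (200 mistakes each) and the disjointness of the two error sets come from the problem statement.

```python
import numpy as np

n = 1000  # training examples, from the problem statement

# True where the classifier is correct on that example.
a_correct = np.ones(n, dtype=bool)
a_correct[:200] = False          # Classifier A's 200 mistakes (arbitrary indices)

b_correct = np.ones(n, dtype=bool)
b_correct[200:400] = False       # Classifier B's 200 mistakes, disjoint from A's

# Sanity checks matching the statement:
print(a_correct.sum(), b_correct.sum())   # prints: 800 800
print((~a_correct & ~b_correct).sum())    # prints: 0  (no example is missed by both)
print((a_correct | b_correct).all())      # prints: True (A or B is right on every example)
```

The key consequence of the disjoint error sets is the last line: on every one of the 1000 examples, at least one of the two classifiers is correct, which is what makes combining them attractive.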
Spring '09, yangliu