Problem Set 3
December 3, 2009
Due date: Wed, Dec 9, 2009 at 4pm; submit by email.
Exercise 1:
(30 points) You are asked to evaluate the performance of two classification models, $M_1$ and $M_2$.
The test set you have chosen contains 26 binary attributes, labeled $A$ through $Z$. Table 1 shows the
posterior probabilities obtained by applying the two models on the test set. (Only the posterior probabilities
for the positive class are shown.) As this is a two-class problem, $P(-) = 1 - P(+)$ and
$P(- \mid A, \ldots, Z) = 1 - P(+ \mid A, \ldots, Z)$. Assume that we are mostly interested in detecting
instances from the positive class.
1. Plot the ROC curve for both $M_1$ and $M_2$. You should plot both curves on the same graph. Explain
which model you think is better and why. (5 points)
2. For model $M_1$, suppose you choose the cutoff threshold to be $t = 0.5$. In other words, any test instance
with posterior probability greater than $t$ will be classified as a positive example. Compute the precision,
recall, and F-measure for the model at this threshold value. (5 points)
3. Repeat question 2 above for model $M_2$ using the same cutoff threshold. (5 points)
4. Compare the F-measure results for both models. Which model is better? Are the results consistent
with what you expect from the ROC curve? (5 points)
5. Repeat question 2 above for model $M_1$ using threshold $t = 0.1$. Which threshold do you prefer,
$t = 0.5$ or $t = 0.1$, and why? Are the results consistent with what you expect from the ROC curve? (5 points)
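
For reference, the quantities named in parts 1 to 5 are conventionally defined from the confusion-matrix counts at a given threshold: $TP$ (true positives), $FP$ (false positives), $FN$ (false negatives), and $TN$ (true negatives). The problem set does not say how the F-measure is weighted, so the block below assumes the balanced form $F_1$; the count symbols are notation introduced here, not taken from the handout.

```latex
\begin{align*}
\text{precision} &= \frac{TP}{TP + FP} \\
\text{recall} = \text{TPR} &= \frac{TP}{TP + FN} \\
\text{FPR} &= \frac{FP}{FP + TN} \\
F_1 &= \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
     = \frac{2\,TP}{2\,TP + FP + FN}
\end{align*}
```

An ROC curve plots TPR (vertical axis) against FPR (horizontal axis) as the threshold $t$ sweeps from high to low.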
Instance   True Class   $P(+ \mid A, \ldots, Z, M_1)$   $P(+ \mid A, \ldots, Z, M_2)$
1          +            0.73                            0.61
2          +            0.69                            0.03
3          -            0.44                            0.68
4          -            0.55                            0.31
5          +            0.67                            0.45
6          +            0.47                            0.09
7          -            0.08                            0.38
8          -            0.15                            0.05
9          +            0.45                            0.01
10         -            0.35                            0.04
Table 1: Table of posterior probabilities obtained by applying the two classification models on the test set.
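
One possible way to organize the computations behind parts 1 to 5 is sketched below in plain Python. It is a minimal sketch, not a reference solution. All names (`LABELS`, `M1`, `M2`, `confusion_counts`, `precision_recall_f1`, `roc_points`) are hypothetical, the F-measure is assumed to be the balanced $F_1$, and any true-class entry that appears blank in the extracted table is read as the negative class.

```python
# Table 1 data. 1 = positive class, 0 = negative class.
# Assumption: true-class entries that are blank in the extracted table
# are negative-class instances.
LABELS = [1, 1, 0, 0, 1, 1, 0, 0, 1, 0]
M1 = [0.73, 0.69, 0.44, 0.55, 0.67, 0.47, 0.08, 0.15, 0.45, 0.35]
M2 = [0.61, 0.03, 0.68, 0.31, 0.45, 0.09, 0.38, 0.05, 0.01, 0.04]


def confusion_counts(labels, scores, t):
    """Return (TP, FP, FN, TN), predicting positive when score > t."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s > t
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(labels, scores, t):
    """Return (precision, recall, F1) at threshold t; 0.0 when undefined."""
    tp, fp, fn, _ = confusion_counts(labels, scores, t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1


def roc_points(labels, scores):
    """Return (FPR, TPR) pairs from (0, 0) to (1, 1).

    Each distinct score is used as a cut point with score >= cut predicted
    positive, which is the same as placing the threshold just below it.
    Assumes both classes occur in `labels`.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= cut and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= cut and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


if __name__ == "__main__":
    for name, scores in (("M1", M1), ("M2", M2)):
        print(name, "at t=0.5:", precision_recall_f1(LABELS, scores, 0.5))
        print(name, "ROC points:", roc_points(LABELS, scores))
    print("M1 at t=0.1:", precision_recall_f1(LABELS, M1, 0.1))
```

The `(FPR, TPR)` pairs returned by `roc_points` can be passed to any plotting tool to draw both curves on one graph. Plotting is left out here so that the sketch needs only the standard library.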