Problem Set 3

December 3, 2009

Due date: Wed, Dec 9 2009 at 4pm; submit by email.

Exercise 1: (30 points)

You are asked to evaluate the performance of two classification models, M1 and M2. The test set you have chosen contains 26 binary attributes, labeled A through Z. Table 1 shows the posterior probabilities obtained by applying the two models to the test set. (Only the posterior probabilities for the positive class are shown.) As this is a two-class problem, P(-) = 1 - P(+) and P(- | A,...,Z) = 1 - P(+ | A,...,Z). Assume that we are mostly interested in detecting instances from the positive class.

1. Plot the ROC curve for both M1 and M2. You should plot both curves on the same graph. Explain which model you think is better and why. (5 points)
2. For model M1, suppose you choose the cutoff threshold to be t = 0.5. In other words, any test instance with posterior probability greater than t will be classified as a positive example. Compute the precision, recall and F-measure for the model at this threshold value. (5 points)
3. Repeat question 2 above for model M2 using the same cutoff threshold. (5 points)
4. Compare the F-measure results for both models. Which model is better? Are the results consistent with what you expect from the ROC curve? (5 points)
5. Repeat question 2 above for model M1 using threshold t = 0.1. Which threshold do you prefer, t = 0.5 or t = 0.1, and why? Are the results consistent with what you expect from the ROC curve? (5 points)

Table 1. Posterior probabilities of the positive class for models M1 and M2.

| Instance | True Class | P(+ \| A,...,Z, M1) | P(+ \| A,...,Z, M2) |
|---|---|---|---|
| 1 | + | 0.73 | 0.61 |
| 2 | + | 0.69 | 0.03 |
| 3 | - | 0.44 | 0.68 |
| 4 | - | 0.55 | 0.31 |
| 5 | + | 0.67 | 0.45 |
| 6 | + | 0.47 | 0.09 |
| 7 | - | 0.08 | 0.38 |
| 8 | - | 0.15 | 0.05 |
| 9 | + | 0.45 | 0.01 |
| 10 | - | 0.35 | 0.... |
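The quantities asked for in questions 1 to 5 can be sketched in a few lines of Python. This is a minimal illustration, not a reference solution: the helper names `prf` and `roc_points` are hypothetical, and the data lists hold only the nine rows that are fully visible in the truncated table, so any numbers they produce are illustrative and will differ from results on the full test set. The classification rule follows question 2 (a score strictly greater than t is predicted positive), and the F-measure is assumed to be the usual harmonic mean of precision and recall.

```python
from typing import List, Tuple

# ASSUMPTION: only the nine rows fully visible in the truncated table.
# The remaining rows are not reproduced, so results are illustrative only.
LABELS = ["+", "+", "-", "-", "+", "+", "-", "-", "+"]
M1 = [0.73, 0.69, 0.44, 0.55, 0.67, 0.47, 0.08, 0.15, 0.45]
M2 = [0.61, 0.03, 0.68, 0.31, 0.45, 0.09, 0.38, 0.05, 0.01]


def prf(labels: List[str], scores: List[float], t: float) -> Tuple[float, float, float]:
    """Precision, recall and F-measure when score > t is predicted positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if s > t and y == "+")   # true positives
    fp = sum(1 for y, s in pairs if s > t and y == "-")   # false positives
    fn = sum(1 for y, s in pairs if s <= t and y == "+")  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure taken here as the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def roc_points(labels: List[str], scores: List[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by lowering the threshold one score at a time.

    Assumes the labels contain at least one "+" and one "-".
    """
    pos = labels.count("+")
    neg = labels.count("-")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == "+")
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == "-")
        points.append((fp / neg, tp / pos))
    return points


if __name__ == "__main__":
    for name, scores in (("M1", M1), ("M2", M2)):
        print(name, "t=0.5:", prf(LABELS, scores, 0.5))
        print(name, "ROC:", roc_points(LABELS, scores))
    print("M1 t=0.1:", prf(LABELS, M1, 0.1))
```

The pairs returned by `roc_points` can be passed to any plotting tool, with the false positive rate on the horizontal axis and the true positive rate on the vertical axis, to draw both curves on the same graph.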
