Data Mining CS57300 Purdue University October 7, 2010

Comparing algorithms
Score functions
• Zero-one loss
• Accuracy
• Sensitivity/specificity
• Precision/Recall/F1
• Absolute loss
• Squared loss
• Root mean-squared error
• Likelihood/conditional likelihood
• Area under the ROC curve

Confusion matrix (rows: predicted class; columns: actual class):

              Actual -   Actual +
Predicted -   TN         FN
Predicted +   FP         TP
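
Several of the score functions listed above depend only on the four confusion-matrix counts, so they can be sketched in a few lines. The following is a minimal illustration, not part of the original slides: the function names, the "+" / "-" label encoding, the choice to return 0.0 when a denominator is empty, and the example labels are all assumptions made here. Absolute and squared loss are shown as means over the examples.

```python
import math


def confusion_counts(actual, predicted, positive="+"):
    """Count (TP, FP, TN, FN) for two equal-length lists of binary labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Divide, returning 0.0 for an empty denominator (a convention chosen here)."""
    return numerator / denominator if denominator else 0.0


def classification_scores(tp, fp, tn, fn):
    """Score functions that depend only on the confusion matrix."""
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # same quantity as sensitivity (TP rate)
    return {
        "accuracy": accuracy,
        "zero_one_loss": 1.0 - accuracy,  # mean zero-one loss
        "sensitivity": recall,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


def absolute_loss(y_true, y_pred):
    """Mean absolute difference between numeric targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def squared_loss(y_true, y_pred):
    """Mean squared difference between numeric targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean-squared error: square root of the mean squared loss."""
    return math.sqrt(squared_loss(y_true, y_pred))


# Hypothetical labels, invented for illustration only.
actual = ["+", "+", "-", "-", "+"]
predicted = ["+", "-", "-", "+", "+"]
scores = classification_scores(*confusion_counts(actual, predicted))
print(scores)
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Likelihood-based scores and the area under the ROC curve need predicted probabilities rather than hard class labels, so they are left out of this sketch.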

ROC curves
• Receiver Operating Characteristic curve
• Plots the true positive rate against the false positive rate for different classification thresholds
[Figure: ROC curves for Base and RPT. x-axis: FP rate (0.0 to 1.0); y-axis: TP rate (0.0 to 1.0).]

How to compute ROC curve

Score each example with the predicted probability P(Y) and record its true class:

P(Y)   True class
0.51   +
0.07   -
0.84   +
0.94   +
0.67   +
0.58   -
0.10   -
0.42   +
0.16   -

Sort the examples by decreasing P(Y):

P(Y)   True class
0.94   +
0.84   +
0.67   +
0.58   -
0.51   +
0.42   +
0.16   -
0.10   -
0.07   -

Lower the classification threshold step by step, predicting + for the top 1, top 2, then top 3 examples, and recompute TPR and FPR each time (5 positives, 4 negatives in total):

P(Y)   True class   Predicted (top 1)   Predicted (top 2)   Predicted (top 3)
0.94   +            +                   +                   +
0.84   +            -                   +                   +
0.67   +            -                   -                   +
0.58   -            -                   -                   -
0.51   +            -                   -                   -
0.42   +            -                   -                   -
0.16   -            -                   -                   -
0.10   -            -                   -                   -
0.07   -            -                   -                   -
TPR                 1/5                 2/5                 3/5
FPR                 0/4                 0/4                 0/4
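
The threshold sweep on this slide can be sketched as a short routine that sorts the examples by P(Y) and emits one (FPR, TPR) point per distinct score. This is a minimal illustration under stated assumptions: the function name `roc_points` and the "+" / "-" label encoding are choices made here, tied scores are grouped into a single step, and both classes are assumed to be present. The scores and labels are the nine examples from the slide.

```python
def roc_points(scores, labels, positive="+"):
    """Return (FPR, TPR) points for every distinct threshold, from (0, 0) to (1, 1).

    Assumes at least one positive and one negative example.
    """
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    # Highest score first: lowering the threshold admits examples in this order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted +
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


# The nine scored examples from the slide.
p_y = [0.51, 0.07, 0.84, 0.94, 0.67, 0.58, 0.10, 0.42, 0.16]
true_class = ["+", "-", "+", "+", "+", "-", "-", "+", "-"]

for fpr, tpr in roc_points(p_y, true_class):
    print(f"FPR = {fpr:.2f}  TPR = {tpr:.2f}")
```

The first three points after the origin correspond to predicting + for the top 1, top 2, and top 3 examples; the first negative (P(Y) = 0.58) only crosses the threshold at the fourth step.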

ROC curves
• Evaluates performance over varying costs and class distributions
• Can summarize with area under the curve (AUC)
• AUC of 0.5 is random
• AUC of 1.0 is perfect
[Figure: ROC curves for Base and RPT. x-axis: FP rate (0.0 to 1.0); y-axis: TP rate (0.0 to 1.0).]

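
One way to compute the AUC summary without drawing the curve uses a standard equivalence: the area under the ROC curve equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The sketch below is an illustration, not the method used in the slides; the function name `auc_by_pairs` and the "+" / "-" encoding are assumptions made here, and the data are the nine scored examples from the "How to compute ROC curve" slide.

```python
def auc_by_pairs(scores, labels, positive="+"):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half a correct pair. Assumes both classes are present.
    This is quadratic in the number of examples, which is fine for a demo.
    """
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# The nine scored examples from the ROC computation slide.
p_y = [0.51, 0.07, 0.84, 0.94, 0.67, 0.58, 0.10, 0.42, 0.16]
true_class = ["+", "-", "+", "+", "+", "-", "-", "+", "-"]
print(auc_by_pairs(p_y, true_class))

# A perfect ranking puts every positive above every negative.
print(auc_by_pairs([0.9, 0.8, 0.2, 0.1], ["+", "+", "-", "-"]))

# Scores that carry no information (all tied) behave like random guessing.
print(auc_by_pairs([0.5, 0.5, 0.5, 0.5], ["+", "-", "+", "-"]))
```

On the slide data, 18 of the 20 positive/negative pairs are ordered correctly; the two misses are the positives scored 0.51 and 0.42, which fall below the negative scored 0.58.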
Statistical questions in machine learning (Dietterich ’98)
• Single domain vs. multiple domains
• Analyze classifiers vs. analyze algorithms
• Predict algorithm accuracy vs. choose between algorithms

