ROC Curves
Data Mining
Prof. Dawn Woodard
School of ORIE, Cornell University

Outline
1 Announcements
2 Changing the Prediction Threshold

Questions?
Another reference text (it has a section on naive Bayes):
Shmueli, Patel, and Bruce (2007). Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner.

Step-through debugging in R is provided by the debug function. For more information, call help(debug).
Define your function (e.g., myFun)
Call debug(myFun)
Run your function: myFun(args)

Lab
What was your error rate from lab?
Why is this a terrible error rate?
What might be the cause of such a bad error rate?

Threshold
In naive Bayes, to predict Y we calculated Pr(Y = Yes | X_1, ..., X_K).
If this probability was greater than 0.5, we classified the observation as Y = Yes. Why? Because Pr(Y = Yes | X_1, ..., X_K) > 0.5 means we believe that Y = Yes has higher probability than Y = No.
If Pr(Y = Yes | X_1, ..., X_K) <= 0.5, we classified the observation as Y = No.

Using thresholds other than 0.5 can sometimes reduce the overall error rate.

Accidents Data
Using threshold 0.5 for the accidents data, our error rate was 0.405. Our classification table from the nb.predict function was:

             Prediction = 0   Prediction = 1
Actual = 0        399              116
Actual = 1        289              196

The nb.predict function does not label the columns "Prediction" or the rows "Actual", so you will need to remember which is which. Here 0 means No and 1 means Yes.

Here we are making two types of errors. How many times did we predict Y = 0 when actually...
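The slides do this in R with nb.predict, but the error-rate arithmetic behind the classification table can be sketched in Python. The counts below are the accidents-data table from the slides; the error rate is the sum of the off-diagonal cells divided by the total:

```python
# Confusion table from the accidents data at threshold 0.5.
# Keys are (actual, predicted); values are counts from the slides.
table = {
    (0, 0): 399, (0, 1): 116,   # actual = 0 (No)
    (1, 0): 289, (1, 1): 196,   # actual = 1 (Yes)
}

total = sum(table.values())             # 1000 observations in all
errors = table[(0, 1)] + table[(1, 0)]  # the two kinds of mistakes
error_rate = errors / total
print(error_rate)  # 0.405, matching the slide
```

This also makes the "two types of errors" explicit: 116 cases where we predicted Yes but the truth was No, and 289 cases where we predicted No but the truth was Yes.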