Handling Missing Data
Data Mining
Prof. Dawn Woodard
School of ORIE
Cornell University

Outline
1 Announcements
2 ROC Curves
3 Model-Based Classifiers
4 Handling Missing Data
5 Naive Bayes on the Heart Disease Data
6 Information for Lab 2

Announcements
Questions?

ROC Curves
Say we have a naive Bayes classifier with the following ROC curve on the test data:
[Figure: ROC curve, True Positive Rate vs. False Positive Rate (both axes 0.0 to 1.0), with points marked at thresh. = 0.6, thresh. = 0.5, and thresh. = 0.4]

ROC Curves
Say false positive errors are much more costly than false negative errors for this problem. What threshold might you choose to use for prediction? What are the FPR and FNR at that threshold?
[Figure: same ROC curve, with points marked at thresh. = 0.6, 0.5, and 0.4]

ROC Curves
What threshold might you use if false positive errors are much less costly than false negative errors?
[Figure: same ROC curve, with points marked at thresh. = 0.6, 0.5, and 0.4]

ROC Curves
Give an example of a problem where false positive errors are much less costly than false negative errors.
Give an example of a problem where false positive errors are much more costly than false negative errors.

Error Rates
For a test data set having 10 records with Y = 1 and 15 records with Y = 0, what is the chance of correctly classifying all of them, as a function of the true positive rate t, the false positive rate f, the true negative rate g, and the false negative rate h?
For a test data set having 10 records with Y = 1 and 15 records with Y = 0, what is the chance that we correctly classify all of the records with Y = 1 and none of the records with Y = 0?

Error Rates
Say the frequency of Y = 1 in the general population is 67%. Say we select a random person and predict the value of Y for that person, based on their covariates.
What is the probability that person has Y = 1 and we predict Y = 0, as a function of our true positive rate t, false positive rate f, true negative rate g, and false negative rate...
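
One way to approach the Error Rates questions is sketched below in Python. This is not from the slides: it assumes records are classified independently of one another, that the rates t, f, g, h measured on the test data also hold for the general population, and that t + h = 1 and f + g = 1. The function names and the example rates (t = 0.9, f = 0.2) are hypothetical and chosen only for illustration.

```python
def prob_all_correct(n_pos, n_neg, t, g):
    """Chance that every record is classified correctly.

    Assumes independent records: each of the n_pos records with Y = 1 is
    correct with probability t, each of the n_neg records with Y = 0 with
    probability g.
    """
    return t ** n_pos * g ** n_neg


def prob_pos_correct_neg_wrong(n_pos, n_neg, t, f):
    """Chance that all Y = 1 records are correct and all Y = 0 records are wrong.

    Under the same independence assumption, each Y = 0 record is
    misclassified with probability f.
    """
    return t ** n_pos * f ** n_neg


def prob_missed_positive(prevalence, h):
    """P(Y = 1 and predicted Y = 0) = P(Y = 1) * P(predict 0 | Y = 1)."""
    return prevalence * h


# Hypothetical rates, for illustration only.
t, f = 0.9, 0.2
h, g = 1 - t, 1 - f  # complements of t and f

print(prob_all_correct(10, 15, t, g))            # t^10 * g^15
print(prob_pos_correct_neg_wrong(10, 15, t, f))  # t^10 * f^15
print(prob_missed_positive(0.67, h))             # 0.67 * h
```

Under these assumptions the answers depend only on t and g for the first question, t and f for the second, and the prevalence and h for the third.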