naiveBayes4_4perPage - Handling Missing Data Data Mining...


Handling Missing Data
Data Mining
Prof. Dawn Woodard
School of ORIE, Cornell University

Outline
1 Announcements
2 ROC Curves
3 Model-Based Classifiers
4 Handling Missing Data
5 Naive Bayes on the Heart Disease Data
6 Information for Lab

Announcements
Questions?

ROC Curves
Say we have a naive Bayes classifier with the following ROC curve on the test data:
[Figure: ROC curve, True Positive Rate against False Positive Rate (both axes from 0.0 to 1.0), with points marked at thresh. = 0.6, thresh. = 0.5, and thresh. = 0.4]

ROC Curves
Say false positive errors are much more costly than false negative errors for this problem. What threshold might you choose to use for prediction? What are the FPR and FNR at that threshold?
[Figure: ROC curve, as above]

ROC Curves
What threshold might you use if false positive errors are much less costly than false negative errors?
[Figure: ROC curve, as above]

ROC Curves
Give an example of a problem where false positive errors are much less costly than false negative errors.
Give an example of a problem where false positive errors are much more costly than false negative errors.

Error Rates
For a test data set having 10 records with Y = 1 and 15 records with Y = 0, what is the chance of correctly classifying all of them, as a function of the true positive rate t, the false positive rate f, the true negative rate g, and the false negative rate h?
For a test data set having 10 records with Y = 1 and 15 records with Y = 0, what is the chance that we correctly classify all of the records with Y = 1 and none of the records with Y = 0?

Error Rates
Say the frequency of Y = 1 in the general population is 67%. Say we select a random person and predict the value of Y for that person, based on their covariates.
What is the probability that person has Y = 1 and we predict Y = 0, as a function of our true positive rate t, false positive rate f, true negative rate g, and false negative rate...
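
One way to reason about the Error Rates questions above is sketched below in Python. This is only an illustration, not the lecture's own solution: it assumes the records are classified independently of one another, and the function names and the numeric rates (t = 0.9, f = 0.2) are hypothetical values chosen for the example.

```python
# Sketch under stated assumptions: records are classified independently,
# and the rates below are made-up illustrative values, not from the lecture.

def prob_all_correct(t, g, n_pos, n_neg):
    """Chance that every Y = 1 record and every Y = 0 record is classified correctly."""
    return t ** n_pos * g ** n_neg


def prob_pos_right_neg_wrong(t, f, n_pos, n_neg):
    """Chance that every Y = 1 record is correct and every Y = 0 record is wrong."""
    return t ** n_pos * f ** n_neg


def prob_positive_and_missed(prevalence, h):
    """P(Y = 1 and predict Y = 0) = P(Y = 1) * P(predict 0 | Y = 1)."""
    return prevalence * h


# Hypothetical rates; g and h follow from t and f.
t, f = 0.9, 0.2
g, h = 1 - f, 1 - t

p_all = prob_all_correct(t, g, n_pos=10, n_neg=15)
p_flip = prob_pos_right_neg_wrong(t, f, n_pos=10, n_neg=15)
p_miss = prob_positive_and_missed(0.67, h)

print(p_all, p_flip, p_miss)
```

Because g = 1 - f and h = 1 - t, each answer can be written with only two of the four rates; the sketch keeps the four names so that it lines up with the wording of the questions.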