# Chap6_ThreeSimpleClassificationMethods-1 - Chapter 6 Three...

Chapter 6 – Three Simple Classification Methods

Data Mining for Business Intelligence (Shmueli, Patel & Bruce)

© Galit Shmueli and Peter Bruce 2008

The three methods:

- Naïve rule
- Naïve Bayes
- K-nearest-neighbor

Common characteristics:

- Data-driven, not model-driven
- Make no assumptions about the data

Naïve Rule

- Classify all records as the majority class
- Not a “real” method
- Introduced so it will serve as a benchmark against which to measure other results
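
As a minimal sketch of the naïve rule, the snippet below picks the majority class from a list of training labels and assigns it to every new record. The labels, the `naive_rule` helper, and the `new_records` list are made up for illustration and do not come from the slides.

```python
from collections import Counter

def naive_rule(train_labels):
    """Return the majority class among the training labels."""
    return Counter(train_labels).most_common(1)[0][0]

# Hypothetical training labels, invented for this illustration.
train_labels = ["no fraud", "no fraud", "fraud", "no fraud", "fraud"]
majority = naive_rule(train_labels)

# Every new record gets the same prediction, whatever its predictor values are.
new_records = [{"charges": "yes"}, {"charges": "no"}]
predictions = [majority for _ in new_records]

# Benchmark accuracy on the training labels: the share of the majority class.
benchmark_accuracy = train_labels.count(majority) / len(train_labels)
print(majority, predictions, benchmark_accuracy)
```

A classifier that cannot beat `benchmark_accuracy` is doing no better than ignoring the predictors altogether.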

Naïve Bayes

Naïve Bayes: The Basic Idea

- For a given new record to be classified, find other records like it (i.e., same values for the predictors)
- What is the prevalent class among those records?
- Assign that class to your new record

Usage

- Requires categorical variables
- Numerical variables must be binned and converted to categorical
- Can be used with very large data sets
- Example: spell check, where the computer attempts to assign your misspelled word to an established “class” (i.e., a correctly spelled word)
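
The binning step can be sketched as follows. The cut points, the bin labels, and the `bin_value` helper are hypothetical choices made for illustration; in practice the cut points would come from domain knowledge or from the data.

```python
import bisect

def bin_value(value, cut_points, labels):
    """Map a number to a category label using ascending cut points.

    A value equal to a cut point falls into the higher bin.
    """
    return labels[bisect.bisect_right(cut_points, value)]

# Hypothetical cut points on number of employees, invented for illustration.
cut_points = [100, 1000]
labels = ["small", "medium", "large"]

employees = [12, 100, 450, 5000]
binned = [bin_value(v, cut_points, labels) for v in employees]
print(binned)
```

There must be exactly one more label than there are cut points, since two cut points split the number line into three bins.
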

Exact Bayes Classifier

- Relies on finding other records that share the same predictor values as the record to be classified
- Want to find the “probability of belonging to class C, given specified values of predictors”
- Even with large data sets, it may be hard to find other records that exactly match your record in terms of predictor values
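
A rough sketch of the exact Bayes idea, assuming each record is a tuple of categorical predictor values. The data and the `exact_bayes` helper are invented for illustration. The second call shows the weakness noted above: when no training record matches exactly, there is nothing to estimate from.

```python
from collections import Counter

def exact_bayes(records, labels, new_record):
    """Estimate class probabilities from records that match new_record exactly.

    Returns (predicted_class, probabilities), or (None, {}) if nothing matches.
    """
    matches = [lab for rec, lab in zip(records, labels) if rec == new_record]
    if not matches:
        return None, {}
    counts = Counter(matches)
    probs = {c: n / len(matches) for c, n in counts.items()}
    return max(probs, key=probs.get), probs

# Hypothetical (charges, size) records, made up for this sketch.
records = [("yes", "small"), ("yes", "small"), ("yes", "small"),
           ("no", "small"), ("no", "large"), ("yes", "large")]
labels = ["fraud", "fraud", "no fraud", "no fraud", "no fraud", "fraud"]

predicted, probs = exact_bayes(records, labels, ("yes", "small"))
print(predicted, probs)

# No record has size "medium", so the exact approach has no answer.
print(exact_bayes(records, labels, ("no", "medium")))
```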

Solution – Naïve Bayes

- Assume independence of predictor variables (within each class)
- Use the multiplication rule
- Estimate the same probability (that a record belongs to class C, given its predictor values) without limiting the calculation to records that share all those same values
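
Written out, the multiplication rule under the independence assumption gives the estimate below. The notation is assumed here rather than taken from the slides: $C_1, \dots, C_m$ are the classes and $x_1, \dots, x_p$ are the predictor values of the record to be classified.

$$
P(C_i \mid x_1, \dots, x_p) \approx \frac{P(C_i)\,\prod_{j=1}^{p} P(x_j \mid C_i)}{\sum_{k=1}^{m} P(C_k)\,\prod_{j=1}^{p} P(x_j \mid C_k)}
$$

Each factor is estimated as a proportion in the training data: $P(C_i)$ is the share of records in class $C_i$, and $P(x_j \mid C_i)$ is the share of class $C_i$ records whose $j$-th predictor equals $x_j$. Every record in a class contributes to these proportions, not only the records that match on all predictors at once.
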

Example: Financial Fraud

- Target variable: audit finds fraud / no fraud
- Predictors:
  - Prior pending legal charges (yes/no)
  - Size of firm (small/large)
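
To make the calculation concrete, here is a sketch of naïve Bayes on this kind of data. The eight firms below are invented for illustration and are not the data from the chapter; the `naive_bayes_posteriors` helper is likewise hypothetical, and it applies no smoothing for zero counts.

```python
from collections import Counter, defaultdict

def naive_bayes_posteriors(records, labels, new_record):
    """Naive Bayes class probabilities for new_record, using plain proportions."""
    n = len(labels)
    class_counts = Counter(labels)

    # value_counts[(cls, j)][v]: records of class cls whose j-th predictor equals v.
    value_counts = defaultdict(Counter)
    for rec, lab in zip(records, labels):
        for j, value in enumerate(rec):
            value_counts[(lab, j)][value] += 1

    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / n  # class proportion P(C)
        for j, value in enumerate(new_record):
            score *= value_counts[(cls, j)][value] / n_cls  # P(x_j | C)
        scores[cls] = score

    total = sum(scores.values())
    if total == 0:
        raise ValueError("every class has a zero count for some predictor value")
    return {cls: s / total for cls, s in scores.items()}

# Hypothetical firms as (prior legal charges, size), made up for this sketch.
records = [("yes", "small"), ("yes", "large"), ("no", "large"),
           ("no", "small"), ("no", "small"), ("no", "large"),
           ("yes", "small"), ("no", "small")]
labels = ["fraud", "fraud", "fraud",
          "no fraud", "no fraud", "no fraud", "no fraud", "no fraud"]

posteriors = naive_bayes_posteriors(records, labels, ("yes", "small"))
print(posteriors)
```

For the record `("yes", "small")`, the fraud score is 3/8 × 2/3 × 1/3 = 1/12 and the no-fraud score is 5/8 × 1/5 × 4/5 = 1/10, so the normalized fraud probability is 5/11, about 0.45. Only two of the eight firms match that record exactly, yet all eight contribute to the estimate.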

Charges? Size

## This note was uploaded on 11/09/2011 for the course MAR 08 taught by Professor Staff during the Spring '08 term at Youngstown State University.
