Chapter 10 – Discriminant Analysis
Data Mining for Business Intelligence
Shmueli, Patel & Bruce
© Galit Shmueli and Peter Bruce 2008

Discriminant Analysis: Background
- A classical statistical technique, used for classification long before data mining:
  - classifying organisms into species
  - classifying skulls
  - fingerprint analysis
- Also used for business data mining (loans, customer types, etc.)
- Can also be used to highlight the aspects that distinguish classes (profiling)
Small Example: Riding Mowers
- Goal: classify the purchase behavior (buy/no-buy) of riding mowers based on income and lot size
- Outcome: owner or non-owner (0/1)
- Predictors: lot size, income

Can we manually draw a line that separates owners from non-owners?
Example: Loan Acceptance
- In the previous small example, the separation between the classes is clear
- In data mining applications, there will be more records, more predictors, and less clear separation
- Consider the Universal Bank example with only 2 predictors:
  - Outcome: accept/don’t accept loan
  - Predictors: annual income (Income), average monthly credit card spending (CCAvg)

Figure panels: a sample of 200 customers; all 5000 customers

Algorithm for Discriminant Analysis
The Idea
- To classify a new record, measure its distance from the center of each class
- Then assign the record to the closest class
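The two-step idea above can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: it assumes plain Euclidean distance, and the centroid values and the helper names `euclidean` and `classify` are hypothetical, made up for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (an assumed distance choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(record, centroids):
    """Return the label of the class whose centroid is closest to `record`.

    centroids: dict mapping class label -> centroid vector (hypothetical input format).
    """
    return min(centroids, key=lambda label: euclidean(record, centroids[label]))

# Hypothetical centroids for two predictors (income, lot size); values are made up
centroids = {"owner": (80.0, 20.0), "non-owner": (55.0, 17.0)}

print(classify((75.0, 19.5), centroids))  # owner
```

The new record (75.0, 19.5) lies about 5.0 units from the "owner" centroid and about 20.2 units from the "non-owner" centroid, so it is assigned to "owner".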

Step 1: Measuring Distance
- We need to measure each record’s distance from the center of each class
- The center of a class is called a centroid
- The centroid is simply a vector (list) of the means of each of the predictors; each mean is computed from all the records that belong to that class
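Computing the centroids described above amounts to grouping the records by class and averaging each predictor within each group. The sketch below uses only the Python standard library; the toy records (income, lot size) and the function name `class_centroids` are hypothetical, chosen for illustration rather than taken from the riding-mower data.

```python
from collections import defaultdict
from statistics import mean

def class_centroids(records, labels):
    """Return a dict mapping each class label to its centroid:
    the list of per-predictor means over the records in that class."""
    groups = defaultdict(list)
    for rec, lab in zip(records, labels):
        groups[lab].append(rec)
    # zip(*rows) transposes a class's records so each tuple holds one predictor's values
    return {lab: [mean(col) for col in zip(*rows)] for lab, rows in groups.items()}

# Toy records (income, lot size) with made-up values; 1 = owner, 0 = non-owner
records = [(60.0, 18.0), (85.0, 22.0), (50.0, 15.0), (40.0, 17.0)]
labels = [1, 1, 0, 0]

print(class_centroids(records, labels))  # {1: [72.5, 20.0], 0: [45.0, 16.0]}
```

Each centroid has one entry per predictor, so with two predictors every class center is a point in the same two-dimensional space as the records themselves.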
