Chapter 8 – Logistic Regression
Data Mining for Business Intelligence
Shmueli, Patel & Bruce
© Galit Shmueli and Peter Bruce 2008

Logistic Regression
Extends the idea of linear regression to situations where the outcome variable is categorical
Widely used, particularly where a structured model is useful to explain (= profiling) or to predict
We focus on binary classification, i.e. Y = 0 or Y = 1

Why Not Linear Regression?
Technically, you can run linear regression with a 0/1 response variable and obtain an output
But the resulting model will not make sense. For instance:
  Predictions will mostly not be 0 or 1
  Coefficient interpretation will not make sense
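
To make the first objection concrete, here is a minimal sketch that fits ordinary least squares to a 0/1 outcome. The eight data points are made up for illustration and are not from the textbook; the single-predictor formula slope = Sxy / Sxx is the standard one.

```python
# Toy data, made up for illustration: one predictor x and a 0/1 outcome y.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares for a single predictor: slope = Sxy / Sxx.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Fitted values of the linear model at each observed x.
predictions = [intercept + slope * xi for xi in x]
for xi, yi, pred in zip(x, y, predictions):
    print(f"x={xi}  y={yi}  predicted={pred:+.3f}")
```

On this toy data the fitted values at the two ends fall below 0 and above 1, and none of them is exactly 0 or 1, so they cannot be read directly as classes or as probabilities.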

The Logit
Goal: Find a function of the predictor variables that relates them to a 0/1 outcome
Instead of Y as the outcome variable (as in linear regression), we use a function of Y called the logit
The logit can be modeled as a linear function of the predictors
The logit can be mapped back to a probability, which, in turn, can be mapped to a class

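
The chain logit → probability → class can be sketched in a few lines. This is an illustrative sketch, not the textbook's code: the function names are hypothetical, the three logit values are arbitrary, and the 0.5 cutoff is an assumed default that the slides do not specify.

```python
import math

def logit_to_probability(logit):
    """Map a logit (any real number) to a probability between 0 and 1."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    # For negative logits, this algebraically equivalent form avoids overflow in exp().
    e = math.exp(logit)
    return e / (1.0 + e)

def probability_to_class(p, cutoff=0.5):
    """Assign class 1 when p reaches the cutoff (0.5 is an assumed default)."""
    return 1 if p >= cutoff else 0

# Arbitrary example logits, chosen only to show the mapping.
for logit in (-2.0, 0.0, 3.0):
    p = logit_to_probability(logit)
    print(f"logit={logit:+.1f}  p={p:.3f}  class={probability_to_class(p)}")
```

A different cutoff can be passed to `probability_to_class` when the two kinds of misclassification are not equally costly.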
Step 1: Logistic Response Function
p = probability of belonging to class 1
Need to relate p to the predictors with a function that guarantees $0 \le p \le 1$
Standard linear function (as shown below) does not:
$$p = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_q x_q$$
q = number of predictors

The Fix: use logistic response function
$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_q x_q)}}$$
Equation 8.2 in textbook
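
As a rough illustration of why the logistic response function fixes the range problem, the sketch below evaluates a linear function and its logistic counterpart on the same inputs. The coefficients (intercept -4.0, slope 1.0) and the predictor values are hypothetical, chosen only so the linear output leaves the 0 to 1 range.

```python
import math

# Hypothetical coefficients for a single predictor (not from the textbook).
BETA_0 = -4.0
BETA_1 = 1.0

def linear_response(x):
    """Plain linear function of the predictor; unbounded."""
    return BETA_0 + BETA_1 * x

def logistic_response(x):
    """Logistic response: 1 / (1 + e^-(linear function)); stays between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-linear_response(x)))

xs = [0.0, 2.0, 4.0, 6.0, 8.0]
linear_values = [linear_response(x) for x in xs]
logistic_values = [logistic_response(x) for x in xs]

for x, lin, log_p in zip(xs, linear_values, logistic_values):
    print(f"x={x:.1f}  linear={lin:+.1f}  logistic={log_p:.3f}")
```

The linear values range from -4 to 4 here, while every logistic value lies between 0 and 1 and can be read as a probability.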
Step 2: The Odds
The odds of an event are defined as:
$$\text{Odds} = \frac{p}{1 - p} \quad \text{(eq. 8.3)}$$
p = probability of event
Or, given the odds of an event, the probability of the event can be computed by:
$$p = \frac{\text{Odds}}{1 + \text{Odds}} \quad \text{(eq. 8.4)}$$
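
The two conversions between probability and odds are inverses of each other, which a short sketch can confirm. The function names and the sample probabilities are hypothetical; the input checks are an added assumption to keep the division well defined.

```python
def odds_from_probability(p):
    """Odds = p / (1 - p). Requires 0 <= p < 1 so the denominator is nonzero."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)

def probability_from_odds(odds):
    """p = Odds / (1 + Odds). Requires non-negative odds."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)

# Sample probabilities, chosen only for illustration.
for p in (0.2, 0.5, 0.8):
    odds = odds_from_probability(p)
    back = probability_from_odds(odds)
    print(f"p={p:.2f}  odds={odds:.3f}  back to p={back:.2f}")
```

A probability of 0.5 corresponds to odds of 1 (an even chance); probabilities above 0.5 give odds greater than 1.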

