Newsom
Data Analysis II
Fall 2008
Logistic Regression
Overview:
Logistic and OLS Regression Compared
Logistic regression is an approach to prediction, like Ordinary Least Squares (OLS) regression. However, with logistic regression, the researcher is predicting a dichotomous outcome. This situation poses problems for the OLS assumptions that the errors (residuals) are normally distributed and have constant variance. Instead, the errors are more likely to follow a logistic distribution. When using the logistic distribution, we need to make an algebraic conversion (the logit, or log-odds, transformation) to arrive at our usual linear regression equation, which we have written as $Y = B_0 + B_1X + e$.
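A minimal Python sketch of the algebraic conversion described above, assuming it is the standard logit transformation ln(p / (1 - p)) and its inverse, the logistic function. The helper names logit and inv_logit are hypothetical, chosen only for illustration. The point is that the log odds can take any real value, so they can be modeled with a linear equation like the OLS one, while the inverse maps back to a probability between 0 and 1.

```python
import math

def logit(p):
    """Convert a probability (0 < p < 1) to log odds: ln(p / (1 - p))."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Logistic function: convert log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# p = .5 means even odds (1:1), so its log odds are 0.
print(logit(0.5))  # 0.0
# Probabilities near 0 or 1 map to large negative or positive log odds,
# so the transformed outcome is no longer confined to the 0-1 range.
print(round(logit(0.96), 3))
# The two functions undo each other.
print(round(inv_logit(logit(0.17)), 10))  # 0.17
```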
With logistic regression, no standardized solution is printed. To make things more complicated, the unstandardized solution does not have the same straightforward interpretation as it does with OLS regression: a coefficient describes the change in the log odds that Y = 1 for a one-unit increase in X, not the change in Y itself.
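The interpretation problem can be made concrete with a small sketch, assuming hypothetical coefficients b0 = -2.0 and b1 = 0.25 (not estimates from any data). A common way to read a logistic slope, not stated in this handout, is through the odds ratio exp(b1): a one-unit increase in X multiplies the odds that Y = 1 by exp(b1), while the change in the probability itself depends on where X starts.

```python
import math

def odds(p):
    """Odds that Y = 1, given probability p."""
    return p / (1.0 - p)

def prob_from_model(b0, b1, x):
    """P(Y = 1 | X = x) under a logistic model with intercept b0 and slope b1."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical coefficients, for illustration only.
b0, b1 = -2.0, 0.25

for x in (0, 5, 10):
    p_now = prob_from_model(b0, b1, x)
    p_next = prob_from_model(b0, b1, x + 1)
    # The change in probability for a one-unit step depends on the starting X...
    delta_p = p_next - p_now
    # ...but the ratio of the odds is the same at every X: exp(b1).
    ratio = odds(p_next) / odds(p_now)
    print(f"x = {x:2d}  change in P = {delta_p:.4f}  odds ratio = {ratio:.4f}")

print(f"exp(b1) = {math.exp(b1):.4f}")
```

This is why the unstandardized slope cannot be read as "Y changes by B1 units": on the probability scale the effect is not constant.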
One other difference between OLS and logistic regression is that there is no $R^2$ to gauge the variance accounted for in the overall model (at least not one that has been agreed upon by statisticians). Instead, a chi-square test is used to indicate how well the logistic regression model fits the data.
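The handout does not say which chi-square test is meant; one common choice, assumed here, is the likelihood ratio (model) chi-square, which compares the log-likelihood of the fitted model with that of an intercept-only model: chi-square = -2(LL_null - LL_model), with degrees of freedom equal to the number of predictors. The sketch below fits a one-predictor model by Newton-Raphson maximum likelihood. The function names and the data are hypothetical and serve only as an illustration, not as the procedure of any particular statistics package.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under predicted probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))

def fit_logistic(x, y, iterations=25):
    """Fit P(Y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # Information matrix (negative Hessian), a symmetric 2 x 2 matrix.
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve (information) * step = gradient for the 2 x 2 case.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def likelihood_ratio_chi_square(x, y):
    """Model chi-square: -2 * (LL of intercept-only model - LL of fitted model)."""
    b0, b1 = fit_logistic(x, y)
    p_model = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    # The intercept-only model predicts the overall proportion of 1s for everyone.
    p_null = [sum(y) / len(y)] * len(y)
    return -2.0 * (log_likelihood(y, p_null) - log_likelihood(y, p_model))

# Hypothetical data (not from the handout): years of experience, success (1) / failure (0).
years = [0, 1, 2, 3, 5, 6, 8, 9, 11, 12, 14, 15, 17, 18, 20]
success = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]

print(f"chi-square (df = 1) = {likelihood_ratio_chi_square(years, success):.2f}")
```

With one predictor, the statistic would be compared with a chi-square distribution on 1 degree of freedom (critical value about 3.84 at alpha = .05). The sketch assumes the 0s and 1s overlap across X; with perfectly separated data the estimates would not converge.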
Probability that Y = 1
Because the dependent variable is not continuous, the goal of logistic regression is a bit different: we are predicting the probability that Y is equal to 1 (rather than 0) given certain values of X. That is, if X and Y have a positive relationship, the probability that a person will have a score of Y = 1 will increase as values of X increase. So we are predicting probabilities rather than scores on the dependent variable.
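A short sketch of what "predicting probabilities" looks like, assuming a logistic model with hypothetical coefficients b0 = -2.0 and b1 = 0.25 (a positive b1 stands in for a positive X-Y relationship). The function name predicted_probability is illustrative only. The predicted probability that Y = 1 rises as X rises but always stays between 0 and 1.

```python
import math

def predicted_probability(b0, b1, x):
    """P(Y = 1 | X = x) from the logistic model ln(p / (1 - p)) = b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical coefficients; not estimated from any data.
b0, b1 = -2.0, 0.25

xs = list(range(0, 21, 4))
probs = [predicted_probability(b0, b1, x) for x in xs]
for x, p in zip(xs, probs):
    print(f"X = {x:2d}  P(Y = 1) = {p:.3f}")
```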
For example, we might try to predict whether small businesses will succeed or fail based on the number of years of experience the owner had in the field before starting the business. We presume that people who have been selling widgets for many years before opening their own widget business will be more likely to succeed. That means that as X (the number of years of experience) increases, the probability that Y will be equal to 1 (success in the new widget business) will tend to increase.
If we take a hypothetical example in which 50 small businesses were studied and the owners' experience ranged from 0 to 20 years, we could represent this tendency for the probability that Y = 1 to increase with a graph. To illustrate this, it is convenient to break years of experience into categories (here, 0-4, 5-8, 9-12, 13-16, and 17-20), but logistic regression does not require this.
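Because Y is coded 0 or 1, the mean of Y within a category is the proportion of businesses in that category that succeeded, so it also estimates the probability that Y = 1. If we compute the mean score on Y (averaging the 0s and 1s) for each category of years of experience, we will get something like: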
Years of Experience    Average Y    Probability that Y = 1
0-4                    .17          .17
5-8                    .40          .40
9-12                   .50          .50
13-16                  .56          .56
17-20                  .96          .96
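A sketch of the averaging step, assuming whole-number years and the five categories used above. The data are a small hypothetical set, not the 50 businesses in the example, so the resulting proportions are not meant to match the table. The function names category_label and proportion_by_category are illustrative only.

```python
from collections import defaultdict

# Category boundaries from the illustration (whole years of experience).
CATEGORIES = [(0, 4), (5, 8), (9, 12), (13, 16), (17, 20)]

def category_label(years):
    """Return the label of the experience category that contains `years`."""
    for low, high in CATEGORIES:
        if low <= years <= high:
            return f"{low}-{high}"
    raise ValueError(f"years out of range: {years}")

def proportion_by_category(years, outcomes):
    """Mean of the 0/1 outcomes per category, i.e. the observed proportion with Y = 1."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for yrs, y in zip(years, outcomes):
        label = category_label(yrs)
        totals[label] += y
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}

# Hypothetical toy data: years of experience and success (1) / failure (0).
years = [1, 3, 4, 6, 7, 10, 11, 14, 15, 16, 18, 19]
outcomes = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

for label, p in proportion_by_category(years, outcomes).items():
    print(f"{label:>5}  {p:.2f}")
```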