Data Analysis II
Logistic and OLS Regression Compared
Logistic regression is an approach to prediction, like Ordinary Least Squares (OLS) regression.
However, with logistic regression, the researcher is predicting a dichotomous outcome.
This situation poses problems for the OLS assumption that the error variances (residuals) are normally distributed. Instead, they are more likely to follow a logistic distribution.
When using the logistic distribution, we need to make an algebraic conversion to arrive at our usual linear regression equation (which we have written as Y = B0 + B1X + e).
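That conversion runs through the logit, the natural log of the odds that Y = 1, which is what the linear equation actually predicts. Here is a minimal sketch in Python (the function names are our own, not from any particular package):

```python
import math

def logistic(z):
    """Map a linear predictor z = B0 + B1*X onto a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """The inverse conversion: map a probability back onto the linear
    (log-odds) scale, where the usual regression equation lives."""
    return math.log(p / (1.0 - p))

# The two functions undo each other; this is the algebraic conversion:
#   logit(P(Y = 1)) = B0 + B1*X
p = logistic(0.8)   # probability implied by a linear predictor of 0.8
z = logit(p)        # recovers 0.8 (up to floating-point error)
```

Because the logistic curve is bounded by 0 and 1, predicted probabilities never fall outside the legal range, which a straight OLS line cannot guarantee.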
With logistic regression, there is no standardized solution printed.
And to make things more complicated, the unstandardized solution does not have the same straightforward interpretation as it does with OLS regression.
One other difference between OLS and logistic regression is that there is no R² statistic to gauge the variance accounted for in the overall model (at least not one that has been agreed upon by statisticians). Instead, a chi-square test is used to indicate how well the logistic regression model fits the data.
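One common version of this test, the likelihood-ratio (model) chi-square, compares the fitted model's log likelihood to that of a null model with no predictors. A hand-rolled sketch with invented data (the outcomes and fitted probabilities below are hypothetical, chosen only to illustrate the computation):

```python
import math

def log_likelihood(ys, ps):
    """Log likelihood of observed 0/1 outcomes ys under predicted probabilities ps."""
    return sum(math.log(p) if y == 1 else math.log(1 - p)
               for y, p in zip(ys, ps))

# Hypothetical outcomes: did each business succeed (1) or fail (0)?
ys = [0, 0, 1, 0, 1, 1, 1, 1]

# Null model: every case gets the overall success rate as its prediction.
p_null = sum(ys) / len(ys)
ll_null = log_likelihood(ys, [p_null] * len(ys))

# Fitted model: suppose our logistic regression produced these probabilities.
fitted = [0.2, 0.3, 0.5, 0.4, 0.7, 0.8, 0.8, 0.9]
ll_model = log_likelihood(ys, fitted)

# The model chi-square is twice the log-likelihood improvement over the null
# model; it is compared to a chi-square distribution with df = number of
# predictors to judge overall fit.
chi_square = 2 * (ll_model - ll_null)
```

A larger chi-square (relative to its degrees of freedom) means the predictors improve the model's fit over chance.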
Probability that Y = 1
Because the dependent variable is not a continuous one, the goal of logistic regression is a bit different: we are predicting the likelihood that Y is equal to 1 (rather than 0) given certain values of X. That is, if X and Y have a positive linear relationship, the probability that a person will have a score of Y = 1 will increase as values of X increase. So, we are stuck with thinking about predicting probabilities rather than the scores of the dependent variable.
For example, we might try to predict whether small businesses will succeed or fail based
on the number of years of experience the owner has in the field prior to starting the business.
We presume that those people who have been selling widgets for many years who open their
own widget business will be more likely to succeed.
That means that as X (the number of years of experience) increases, the probability that Y will be equal to 1 (success in the new widget business) will tend to increase. If we take a hypothetical example, in which 50 small businesses were studied and the owners have a range of years of experience from 0 to 20 years, we could represent this tendency for the probability that Y = 1 to increase with a graph. To do this, it is convenient to break years of experience up into categories (i.e., 0-4, 5-8, 9-12, 13-16, 17-20), but logistic regression does not require this.
If we compute the mean score on Y (averaging the 0s and 1s) for each category of years of
experience, we will get something like:
[Figure: bar graph of the probability that Y = 1 within each category of years of experience.]
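Those category means are just the proportion of 1s within each band of experience. A small sketch with invented data (the 12 businesses below are hypothetical, not the 50 from the example):

```python
# Hypothetical (x, y) pairs: years of experience, and success (1) or failure (0).
data = [(1, 0), (3, 0), (4, 1), (6, 0), (7, 1), (10, 1),
        (11, 0), (13, 1), (15, 1), (17, 1), (19, 1), (20, 1)]

categories = [(0, 4), (5, 8), (9, 12), (13, 16), (17, 20)]

# The mean of Y within each category (averaging the 0s and 1s) estimates
# the probability that Y = 1 for that range of experience.
prob_y1 = {}
for low, high in categories:
    ys = [y for x, y in data if low <= x <= high]
    prob_y1[(low, high)] = sum(ys) / len(ys)

for (low, high), p in prob_y1.items():
    print(f"{low}-{high} years: P(Y = 1) = {p:.2f}")
```

With data like these, the proportions tend to rise across the categories, which is the pattern the graph is meant to show.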