Statistical Techniques II

Enter Logistic Regression.
The big, nonlinear Logistic model can be much simplified if we know that b1=1 and that b2=0.
This is basically what logistic regression does: it fits a logistic curve that goes from 0 to 1
asymptotically (so it never really reaches either one). The result of this simplification is a
simple linear regression. The dependent variable is a transformation of the original
probabilities called the “log odds”.
What exactly are “odds”?
Odds are an expression of the likelihood that some event happens compared to the likelihood that
it does not happen. If the odds on a horse in a race are 30 to 1, is that horse likely to win? Or
lose? If the chances of an event happening are 50:50, what does that mean? What if the
likelihood of two events is 1:1? How is that different?
We will work with a number that is in some ways a little simpler. We will define odds as the
ratio of the probability that an event occurs to the probability that the same event does not
occur. Since the probabilities add to 1, if the probability of an event is “p”, then the probability
of it not occurring is “1–p”.
So “50:50” corresponds to odds of 1.0, and “1:1” also corresponds to odds of 1.0. They are the same odds. If
the odds are 1, then the likelihood of something happening is exactly equal to the
likelihood that it will not happen. That is p = 1 – p.
To simplify our concepts we will think of the odds as the ratio of two probabilities. The
probability that some event happens (success) will be equal to p. The probability of
failure will be 1–p. The odds are calculated as p/(1–p).
What if the odds are 2? This means that p is twice as large as (1–p), so success is twice
as likely as failure. If the odds are 10, success is 10 times as likely as failure.
If the odds are 0.5 or 0.1, then failure is twice as likely or ten times
as likely as success.
So we start with probabilities of a binary event. We calculate the odds of one of the events by
dividing the probability of that event by the probability of the alternative event. Finally, the
natural logarithm of this ratio is calculated to produce the dependent variable of a linear
model (which may be simple linear or multiple).
Predicted values from this model are “log odds”. An antilog is taken (e^(log odds)) to get the
predicted odds. The odds can be transformed back into probabilities of the event by
calculating “probability of event = predicted odds / (1 + predicted odds)”.
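The chain of transformations just described (probability, to odds, to log odds, and back again) can be sketched in a few lines of Python. This is only an illustration; the function names are made up for this sketch and are not part of the SAS analysis used in these notes.

```python
import math


def odds_from_probability(p):
    """Odds = p / (1 - p); requires 0 <= p < 1."""
    return p / (1.0 - p)


def log_odds_from_probability(p):
    """The logit: natural log of the odds; requires 0 < p < 1."""
    return math.log(odds_from_probability(p))


def probability_from_log_odds(log_odds):
    """Antilog to get the odds, then odds / (1 + odds) to get the probability."""
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


# Illustrative probabilities only (not data from the course).
for p in (0.1, 0.5, 2.0 / 3.0):
    logit = log_odds_from_probability(p)
    print(f"p={p:.4f}  odds={odds_from_probability(p):.4f}  "
          f"log odds={logit:.4f}  back to p={probability_from_log_odds(logit):.4f}")
```

A probability of 0.5 gives odds of 1 and log odds of 0, and the round trip returns the probability that was started with.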
Let’s look at an example.
What are the odds of getting an A in this course given a particular score on the first exam?
Disclaimer 1: The use of this example calculated from past grades in no way implies any
promise or commitment about the distribution of future grades.
Disclaimer 2: Although I will discuss only the number of A's, the non-A's are not all B's; there
have been C's and D's. Sorry.
Now, what are the odds of getting an A in this course?
I have had 423 students take the course since I have been giving two exams in this course
(previously 3). Of those 423, there have been 212 A's; we will call this “success”. The
chances of getting an A then are almost 50:50, and the odds (212/211) are just about 1.
James P. Geaghan - Copyright 2011, Statistical Techniques II

Interesting, but we can carry this one step further. What are the odds of getting an A if you
had a 70 on the first exam? Or an 80? Or a 90?
We will do Logistic regression to determine this. We will use SAS PROC Logistic. The
structure is very similar to PROC REG.
See the PROC LOGISTIC output (Appendix 10).
The output is rather different because the analysis is not a least squares regression. You will be
responsible only for interpreting the values specified below.
Note that SAS states that the link is a LOGIT.
A logit is the natural logarithm of the odds. That is “log_e(p / (1–p))”.
SAS gives the number of each of the binary cases; in our example there are 212 “TRUE” and 211 in the other category.
The output contains a section called the Model Fit Statistics (AIC, SC and –2LogL).
These statistics are used to compare 2 models. We will not cover them here.
Other output of interest is the “Testing Global Null Hypothesis: BETA=0”. This is like the test of
the model in ordinary regression.
Of special interest is the “Likelihood Ratio Test”, not so much the “Score” and “Wald” tests.
The Parameter Estimates and their tests are given in the section called “Analysis of Maximum
Likelihood Estimates”.
So how do we interpret this output?
We are interested primarily in the slope and intercept, and the test of the slope.
Intercept = –16.91 and Slope = 0.1952
Likelihood ratio test P-value <0.0001
So we have a highly significant slope. The interpretation here is the same as for regression,
though the test statistic is different.
Now, we have a slope and an intercept. Could we get predicted values as with regression?
Absolutely! However, we entered log-odds into the model, so the predicted values are log-odds.
We can get these predicted values and convert to probabilities.
Calculate the probability that a person with an 83 will get an A in the course.
Yhat_i = –16.91 + 0.1952 X_i = –16.91 + 0.1952(83) = –0.7086
This is a log odds, so first take the antilog: exp(–0.7086) = 0.4923.
So p/(1–p) = 0.4923; solving for p we get p = 0.4923/(1 + 0.4923) = 0.3299.
The probability of getting an A with a grade of 83 on the first exam is 0.33, or we could figure that
about 33% of students with a grade of 83 on the first exam will get an A. The predicted values,
detransformed to probabilities, will form a logistic regression line going from 0 to 1.
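The same calculation can be sketched in Python for any first-exam grade. The intercept and slope are the rounded values reported above; with those rounded values the log odds at 83 come out near –0.7084 rather than –0.7086, a difference at the level of rounding in the reported coefficients. The function name and the list of grades are made up for this illustration.

```python
import math

# Rounded estimates quoted in the notes.
INTERCEPT = -16.91
SLOPE = 0.1952


def predicted_probability(grade, intercept=INTERCEPT, slope=SLOPE):
    """Predicted log odds -> odds (antilog) -> probability."""
    log_odds = intercept + slope * grade
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


# Grades chosen for illustration; 83 is the worked example in the text.
for grade in (70, 80, 83, 90, 100):
    print(f"grade {grade}: predicted P(A) = {predicted_probability(grade):.4f}")
```

Evaluating the function over a range of grades traces out the S-shaped curve that stays between 0 and 1.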
[Figure: fitted logistic curve, predicted probability of an A versus grade on first exam]

Interpretation of the slope.
The analysis is log transformed, so to get the slope back on the original scale we need to take an
antilog (e.g. exp(0.1952) = 1.2156).
As with other slopes, this is the change in Y per unit X; here it is the multiplicative change in the odds for an increase
of one point in the grade.
So the odds go up by about 21.6 percent for each additional point in grade, but don't confuse the increase
in odds with the increase in probability. Also, don't confuse the change in the odds (odds ratio)
with the odds. The value 1.2156 is an odds ratio, and gives the proportional increase in the odds (per unit
change in Xi), not the ratio of the probabilities (see the calculations below).
Ratio of values of P    Odds ratio
1.2058                  1.21556
1.1617                  1.21556
1.0644                  1.21556

So the slope is an odds ratio, the proportional change in the odds (per unit change in Xi).
SAS provides tests and confidence intervals for this value.
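The distinction between the odds ratio and the ratio of probabilities can be checked numerically. The sketch below uses the rounded intercept and slope from the notes; the grades are arbitrary illustrations and are not the ones behind the table above.

```python
import math

# Rounded estimates quoted in the notes.
INTERCEPT = -16.91
SLOPE = 0.1952


def probability(grade):
    """Predicted probability of an A at a given first-exam grade."""
    odds = math.exp(INTERCEPT + SLOPE * grade)
    return odds / (1.0 + odds)


def odds(grade):
    """Predicted odds, recovered from the probability as p / (1 - p)."""
    p = probability(grade)
    return p / (1.0 - p)


# Illustrative grades only.
for grade in (70, 83, 95):
    odds_ratio = odds(grade + 1) / odds(grade)
    probability_ratio = probability(grade + 1) / probability(grade)
    print(f"grade {grade} -> {grade + 1}: "
          f"odds ratio = {odds_ratio:.4f}, ratio of P = {probability_ratio:.4f}")
```

The odds ratio is the same at every grade (it is exp(slope)), while the ratio of probabilities is smaller and shrinks toward 1 as the probability approaches 1.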
The analysis, as mentioned, is NOT a least squares regression; it is actually a weighted
maximum likelihood estimation carried out on the logit values.
Multiple logistic regression is perfectly feasible, and SAS also has the stepwise analyses and “best
models” selection you are familiar with. The best model selection is not called RSquare; it is the
SCORE option (based on chi square, not RSquare).
CLASS variables can be included in the analysis. We will discuss these in ANOVA.
Summary
Regression on an indicator variable is similar in concept to ordinary least squares regression, but
differs considerably in the execution. The analysis is called Logistic Regression.
The slope and intercept are used in regression in a way that is similar to ordinary least squares, but
the value predicted is a log of the odds. This can be converted to probabilities.
SAS provides statistics to evaluate the significance of the fit, and a confidence interval for the
estimate of the slope.
SAS also provides other options for various selection techniques and the addition of class
variables.

Statistics quote: If at first it doesn't fit, fit, fit again. -- John McPhee

Analysis of Covariance
Linear regression is usually done as a least squares technique applied to QUANTITATIVE
VARIABLES. ANOVA is the analysis of categorical (class, indicator, group) variables; there
are no quantitative “X” variables as in regression, but this is still a least squares technique.
It stands to reason that if Regression uses the least squares technique to fit quantitative variables,
and ANOVA uses the same approach to fit qualitative variables, that we should be able to put both
together into a single analysis.
We will call this Analysis of Covariance. There are actually two conceptual approaches:
Multisource regression: adding class variables to a regression
Analysis of Covariance: adding quantitative variables to an ANOVA
For now, we will be primarily concerned with Multisource regression.
With multisource regression we start with a regression (one slope and one intercept) and ask,
would the addition of an indicator or class variable improve the model?
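Adding a class variable to a regression gives each group its own intercept, fitting a separate
intercept to each group.
[Figure: Y versus X, separate intercept for each group]
Adding an interaction fits a separate slope to each group.
[Figure: Y versus X, separate slope for each group]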
How do they do that?
For a simple linear regression we start with $Y_i = b_0 + b_1 X_{1i} + e_i$.
Now add an indicator variable. An indicator variable, or dummy variable, is a variable that
uses values of “0” and “1” to distinguish between the members of a group. For example, if the
category, or groups, were MALE and FEMALE, we could give the females a “1” and the males
a “0” and distinguish between the groups. If the groups were FRESHMAN, SOPHOMORE,
JUNIOR and SENIOR we would need 3 variables to distinguish between them, the first would
have a “1” for freshmen and a “0” otherwise, the second and third would have “1” for
SOPHOMORE and JUNIOR, respectively, and a zero otherwise. We don’t need a fourth
variable for SENIOR because if we have an observation with values of 0, 0, 0 for the three
variables we know it has to be a SENIOR. So, we always need one less dummy variable than
there are categories in the group.

Fitting separate intercepts
In our example we will add just one indicator variable, but it could be several. We will call our
indicator variable X2i, but it is actually just a variable with values of 0 or 1 that distinguishes
between the two categories of the indicator variable.
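The 0/1 coding described above (one fewer indicator variable than there are categories) can be sketched as follows. The helper `dummy_code` is a hypothetical name for this illustration; in SAS the equivalent coding would normally be produced by a CLASS statement or a DATA step rather than written by hand.

```python
def dummy_code(levels, reference):
    """Return an encoder that maps a category to k-1 indicator (0/1) values.

    The reference level gets all zeros, so it needs no indicator of its own.
    """
    coded_levels = [level for level in levels if level != reference]

    def encode(value):
        if value not in levels:
            raise ValueError(f"unknown category: {value!r}")
        return [1 if value == level else 0 for level in coded_levels]

    return encode


# Four class years need three indicators; SENIOR is the all-zero case.
encode_year = dummy_code(
    ["FRESHMAN", "SOPHOMORE", "JUNIOR", "SENIOR"], reference="SENIOR"
)
for year in ("FRESHMAN", "SOPHOMORE", "JUNIOR", "SENIOR"):
    print(year, encode_year(year))

# Two groups need a single indicator: females 1, males 0.
encode_sex = dummy_code(["FEMALE", "MALE"], reference="MALE")
print("FEMALE", encode_sex("FEMALE"), "MALE", encode_sex("MALE"))
```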
$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$
When X2i = 0 we get $Y_i = b_0 + b_1 X_{1i} + b_2(0) + e_i$,
which reduces to $Y_i = b_0 + b_1 X_{1i} + e_i$, a simple
linear model for the “0” group.
And when X2i = 1 we have $Y_i = b_0 + b_1 X_{1i} + b_2(1) + e_i$,
which simplifies to $Y_i = (b_0 + b_2) + b_1 X_{1i} + e_i$,
a simple linear model with an intercept equal to (b0+b2) for the “1” group.
[Figure: Y versus X, with the intercept b0 marked]
So there are two lines with intercepts of b0 and (b0+b2). Note that b2 is the difference between
the two lines, or an adjustment to the first line that gives the second line.
This can be positive or negative, so the second line may be above or below the first.
This term can be tested against zero to determine if there is a difference in the intercepts.
Fitting separate slopes
Adding an interaction (crossproduct term) between the quantitative variable (X1i) and the
indicator variable (X2i) will fit separate slopes. Still using just one indicator variable for two
classes, the model is $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{1i} X_{2i} + e_i$.
When X2i = 0 we get $Y_i = b_0 + b_1 X_{1i} + b_2(0) + b_3 X_{1i}(0) + e_i$, which reduces to $Y_i = b_0 + b_1 X_{1i} + e_i$, a simple
linear model for the “0” group.
When X2i = 1 then $Y_i = b_0 + b_1 X_{1i} + b_2(1) + b_3 X_{1i}(1) + e_i$,
simplifying to $Y_i = (b_0 + b_2) + (b_1 + b_3) X_{1i} + e_i$,
which is a simple linear model with an intercept equal to (b0+b2) and a slope equal to
(b1+b3) for the “1” group.
[Figure: Y versus X, separate line for each group]
Note that b0 and b1 are the intercept and slope for one of the lines (whichever was assigned the
0). The values of b2 and b3 are the intercept and slope adjustments (+ or – differences) for the
other line.
If these adjustments are not different from zero then the intercepts and/or slopes are not different
from each other.
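To see the adjustments at work, here is a small sketch that fits the model with columns for the intercept, X1, X2 and X1*X2 by ordinary least squares. The data are made up and noise-free: the “0” group lies exactly on Y = 2 + 0.5X and the “1” group on Y = 5 + 1.5X, so the fit should return b0 = 2, b1 = 0.5, b2 = 3 and b3 = 1. The solver is a plain normal-equations routine written for this illustration, not what SAS does internally.

```python
def fit_least_squares(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(k):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Made-up, noise-free data: (intercept, slope) for group 0 and group 1.
true_lines = [(2.0, 0.5), (5.0, 1.5)]
rows, y = [], []
for group, (intercept, slope) in enumerate(true_lines):
    for x in (1.0, 2.0, 3.0, 4.0, 5.0):
        rows.append([1.0, x, float(group), x * group])  # 1, X1, X2, X1*X2
        y.append(intercept + slope * x)

b0, b1, b2, b3 = fit_least_squares(rows, y)
print(f"group 0 line: intercept {b0:.3f}, slope {b1:.3f}")
print(f"group 1 line: intercept {b0 + b2:.3f}, slope {b1 + b3:.3f}")
print(f"adjustments:  b2 = {b2:.3f}, b3 = {b3:.3f}")
```

Dropping the last column from each row gives the separate-intercepts model with a common slope.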
A third or fourth line could be fitted by adding additional indicator variables and interactions
with coefficients b4 and b5, etc.

Statistics quote: There are two kinds of statistics, the kind you look up and the kind you make up. -- Rex Stout

Conceptual application of Analysis of Covariance
Suppose we have a regression, and we would like to know if perhaps the regression is different for
some groups, perhaps male and female or treated and untreated.
Of course, regression starts with a corrected sum of squares (a flat line through the mean) and we
fit a slope. Viewing this as extra sums of squares we are comparing a model with no slope to a
model with a slope.
Comparison a: A test of the difference between these two models tests the
hypothesis of adding a slope ($H_0: \beta_1 = 0$). The extra SS are SSX1|X0, or just SSX1.
From the simple linear regression we want to test the hypotheses that the model is improved by
adding separate intercepts or slopes. The concepts are pretty simple. The additional variables can
be tested with extra SS or the General Linear Hypothesis Test (GLHT with full model and reduced
model).
For a single indicator variable the extra SS are easy enough. For several indicator variables it may
be easier to do the GLHT where the single line is the reduced model and the two lines are the full
model.
Comparison b: A test of the difference between these two models tests the
hypothesis of adding an intercept adjustment ($H_0: \beta_2 = 0$), that is, it tests for the difference between the
intercepts. The extra SS are SSX2 | X1.
The next test compares a model with two intercepts and one slope with a model containing two
slopes and two intercepts.
Comparison c: Here we are testing the addition of a slope adjustment, or
second slope ($H_0: \beta_3 = 0$). This test determines if a model with 2 intercepts and 2 slopes is better
than a model with 2 intercepts and one slope. The extra SS are SSX1X2 | X1 X2.
This series a–b–c is the usual “Multisource regression” series. Note that the extra SS needed are:
SSX1; SSX2 | X1; SSX1X2 | X1, X2. These are
the Type I SS for the SAS model Y=X1 X2
X1*X2;. So this is another of those rare cases
where we may wish to use TYPE I tests.
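The sequential (Type I) extra sums of squares in the a–b–c series can be sketched by fitting the nested models in order and differencing their residual sums of squares. The data below are made up for illustration, and the least squares solver is a plain normal-equations routine written for this sketch; it is not SAS output.

```python
def fit_least_squares(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(k):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def residual_ss(rows, y):
    """Residual (error) sum of squares for the least squares fit."""
    b = fit_least_squares(rows, y)
    return sum(
        (yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
        for row, yi in zip(rows, y)
    )


# Made-up data: six observations in each of two groups.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0] * 2
x2 = [0.0] * 6 + [1.0] * 6
y = [2.1, 2.9, 3.2, 4.1, 4.4, 5.2, 5.3, 6.8, 8.1, 9.9, 11.2, 12.8]
terms = [
    ("X1", x1),
    ("X2 | X1", x2),
    ("X1*X2 | X1, X2", [a * b for a, b in zip(x1, x2)]),
]

design = [[1.0] for _ in y]              # intercept only: fits the mean
sse_previous = residual_ss(design, y)    # corrected total SS
extra_ss = {}
for name, column in terms:
    design = [row + [value] for row, value in zip(design, column)]
    sse = residual_ss(design, y)
    extra_ss[name] = sse_previous - sse  # reduction in SSE from adding the term
    sse_previous = sse

for name, ss in extra_ss.items():
    print(f"SS {name}: {ss:.4f}")
print(f"Residual SS of the full model: {sse_previous:.4f}")
```

Because each term is adjusted only for the terms entered before it, the three extra SS and the final residual SS add up to the corrected total SS.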
The hypotheses to be tested can be simplified to some extent if we know that for our base model
we want a regression. This is multisource regression.
Start with a simple linear regression.
Test to see if separate intercepts improve the model ($H_0: \beta_2 = 0$).
Test to see if separate slopes improve the model ($H_0: \beta_3 = 0$).
[Figure: sequence of models. Correction factor, 1 level = b0 (fit mean); 1 level (b0), 1 slope (b1); 2 levels (b0, b2), 1 slope (b1); 2 levels (b0, b2) and 2 slopes (b1 and b3)]
As extra SS we could test extra SS for b2 and then the extra SS for b3.
In practice we usually start with the fullest model (2 slopes and 2 intercepts)
and then reduce to 1 slope and 2 intercepts if the extra SS for b3 is not significant
and then to 1 slope and 1 intercept if extra SS for b2 is not significant.
So, interpretation usually proceeds from the bottom up; do we need 2 slopes and 2 intercepts, if
not do we need 2 intercepts, if not is the SLR significant?
If TYPE I sums of squares are used there would be little if any change from deleting the higher
order components of the model, so in this case we would not have to refit the model after each
The solution (i.e. regression coefficients) provided with ANCOVA is the correct model for
prediction, even though it is fully adjusted (Type III) and our tests of hypothesis are not (Type
I).
The progression discussed above corresponds to the likely series in multisource regression (a-b-c
below). In Analysis of Variance (our next topic), the researcher's interest is in differences among
the means of categorical variables. For this analysis the categorical variable can be put first and
the progression is the usual series for Analysis of Variance (d-e-c below). Which series of tests you
need depends on whether you are starting with a regression or an analysis of variance.
[Figure: diagram of model comparisons labeled a, b, c, d, e and f]
Examine the possible full and reduced models on the previous page. What is tested in each case,
what extra SS is needed for each test, and what are the initial and final models for each case?
One other test of occasional interest is the test denoted by the comparison “g”. If you fit a model
with two slopes and two intercepts, and cannot reduce to a model with two intercepts and one
slope, you may wonder if a model with two slopes and one intercept is appropriate.
Full model: $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{1i} X_{2i} + e_i$
Reduced model: $Y_i = b_0 + b_1 X_{1i} + b_3 X_{1i} X_{2i} + e_i$
The extra SS for this test is SSX2 | X1, X1*X2. This is actually a test of “intercepts fitted last”
and would be available as the Type III SS for X2 with the model Y=X1 X2 X1*X2;.
This note was uploaded on 12/29/2011 for the course EXST 7015 taught by Professor Wang,j during the Fall '08 term at LSU.