Statistical Techniques II Page 65
We fit the Reduced model and the Full model (as two separate models).
Full model results: dfError = 524 + 960 = 1484, SSE = 304949 + 799234 = 1104183
Reduced model results: dfError=1487, SSE=1206036.845
Then we set up the table below to test the difference.
Source          d.f.    SSE           MSE
Reduced model   1487    1206036.85
Full model      1484    1104183.01
Difference         3     101853.84    33951.2791
Full model      1484    1104183.01      744.0586

F = 45.6298,  P > F = 3.320764E–28

In this case we would decide that there was clearly a difference between genders. We don’t know
which one or more of the 3 parameters is different (different curvature or different intercepts) but
some difference exists. Later, with analysis of covariance, we could determine which parameters
differ. It actually turns out to be the intercept; the curvatures are the same for both genders.
So, given the curvature, there is an intermediate age that runs the 10 K race fastest, and younger and
older individuals take longer. What is that age?
The fitted model for females is Time = 270.94 – 1.7668·Age + 0.02906·Age²
If we take the first derivative, set it equal to zero, and solve for Age, we get:
Age at minimum time = 1.7668 / (2 × 0.02906) = 30.4
Using the equation to solve for the average time at age = 30.4 we get a mean of 244 minutes for
women, the best average time for any age.
The fitted model for males is Time = 265.60 – 2.3003·Age + 0.03392·Age²
Men had a minimum at 33.9 and had a mean time of 226.6 minutes at that age.
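The minimum of each fitted quadratic can be checked directly; a sketch (coefficients taken from the two fitted models above):

```python
# For a fitted quadratic Time = b0 + b1*Age + b2*Age^2 with b2 > 0, the
# parabola opens upward, so setting the derivative b1 + 2*b2*Age to zero
# gives the age at minimum time: Age = -b1 / (2*b2).

def vertex(b0, b1, b2):
    age = -b1 / (2 * b2)
    time = b0 + b1 * age + b2 * age**2
    return age, time

age_f, time_f = vertex(270.94, -1.7668, 0.02906)   # females
age_m, time_m = vertex(265.60, -2.3003, 0.03392)   # males
print(round(age_f, 1), round(time_f))      # 30.4 244
print(round(age_m, 1), round(time_m, 1))   # 33.9 226.6
```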
Do the results seem worthwhile? Are they meaningful? Are they interpretable? Do they have
value? Note that the R² for females was only 0.026822 and for males only 0.038138. However,
the linear and quadratic terms were significant, indicating that there is a significant fit to the means.
Polynomial Regression Summary
Polynomial regressions are treated like any other multiple regression, except that we use Type I
SS for testing hypotheses.
Note that the FULLY ADJUSTED regression coefficients are still used to fit the model.
The ability to determine a minimum or maximum point is a useful application of polynomials
(optimum performance @ age, optimum yield @ fertilizer level, etc).
We have some new capabilities as far as what we can do with regression.
Test for a curvilinear relationship between Y and X.
Test whether the curvature is quadratic, cubic, quartic, ...
We can now obtain a curvilinear predictive equation for Y on X.
Response surfaces
Polynomials are of interest as an extremely flexible method of curve fitting, even though there are
some severe restrictions on the interpretation and predictive ability (outside range) of the model.
The ones we have looked at employed only one independent variable. Can you have several?
YES, this is a “response surface”.
For example, take 2 independent variables. We would include not only quadratic terms (and
maybe cubic, etc), but also INTERACTIONS.
Yi = b0 + b1X1i + b2X1i² + b3X2i + b4X2i² + b5X1iX2i + ei
Shape varies with the size and sign of the coefficients (bi):
both concave produces a “bowl” shape
both convex yields a “hill” shape
one of each gives a “saddle” shape
The interaction term allows for “twisting” of the surface.
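Whether the quadratic surface is a bowl, a hill, or a saddle can be read directly off the coefficients with the usual second-derivative test; a sketch (the classification rule is standard calculus, not from the notes):

```python
# For Y = b0 + b1*X1 + b2*X1^2 + b3*X2 + b4*X2^2 + b5*X1*X2, the matrix of
# second derivatives (Hessian) is [[2*b2, b5], [b5, 2*b4]]; its determinant
# D = 4*b2*b4 - b5**2 decides the shape of the surface.

def surface_shape(b2, b4, b5):
    d = 4 * b2 * b4 - b5**2
    if d < 0:
        return "saddle"        # one concave and one convex direction
    if d > 0:
        return "bowl" if b2 > 0 else "hill"
    return "degenerate"        # D = 0: a ridge or a flat direction

print(surface_shape(1.0, 1.0, 0.0))    # bowl
print(surface_shape(-1.0, -1.0, 0.5))  # hill
print(surface_shape(1.0, -1.0, 0.0))   # saddle
```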
Examples
[Figure: three example response surfaces plotted against X1 and X2]
A = both dimensions concave and symmetric (no interaction)
B = both dimensions concave and asymmetric (interaction present); the surface appears “twisted”
C = one dimension concave, one convex, and asymmetric (interaction present)
Larger models can be fitted but are not common.
Yi = b0 + b1X1i + b2X1i² + b3X2i + b4X2i² + b5X1iX2i + b6X1i³ + b7X2i³ + b8X1iX2i² + b9X1i²X2i + ei
Statistics quote: I always find that statistics are hard to swallow and impossible to digest. The only one I can
remember is that if all the people who go to sleep in church were laid end to end they would be
a lot more comfortable.  Mrs. Robert A. Taft
Regression on an indicator variable
What is an indicator variable? It is a variable with either the value 0 or 1. When we get to
ANOVA we will see that class variables (categorical, group) are usually reparameterized in
parametric analyses as the value 0 or 1.
With only two levels (True/False, Up/Down, Male/Female, Marked/Unmarked, Heads/Tails)
the values are easily recoded.
If there are more levels (say t levels) then we need t–1 indicator variables.
But indicator variables are usually independent variables. ANOVA is all about indicator variables
as independent variables.
Regression on an indicator variable is different
Basically it is a simple linear regression where the dependent variable has a value of either 0 or
1. This is called a binary response.
[Figure: binary (0/1) data plotted against X with a fitted straight line]
This is a “primitive” version of regression on an indicator variable.
The predicted value (Yhat) is interpreted as the probability of getting a 1.
However, this line will go below zero and will go above 1. This makes the properties
somewhat undesirable.
The Logistic Model is a rather complicated model; it is not linear and cannot be fitted with PROC
REG or PROC GLM. The Logistic Growth Equation is
Yi = b1 / (1 + b2·e^(b3·Xi)) + ei
This equation has many applications and is often used as a “growth” model.
[Figure: the logistic curve, an S-shaped curve of Y (0 to 100) against X (0 to 50)]
Wouldn't it be nice if we could fit this model to our indicator variable?
Enter Logistic Regression.
The big, nonlinear Logistic model can be much simplified if we know that b1=1 and that b2=0.
This is basically what logistic regression does, it fits a logistic curve that goes from 0 to 1
asymptotically (so it never really reaches either one). The result of this simplification is a
simple linear regression. The dependent variable is a transformation of the original
probabilities called the “log odds”.
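The point of the log-odds transform can be checked numerically: applying the logit to a logistic curve with asymptote 1 recovers the straight line inside it. A sketch (the intercept and slope values are arbitrary):

```python
import math

# Logistic curve with asymptote 1, written around a linear predictor a + b*x.
def logistic(a, b, x):
    return 1 / (1 + math.exp(-(a + b * x)))

# The logit ("log odds") of a probability p.
def logit(p):
    return math.log(p / (1 - p))

# Taking the logit of the logistic curve gives back exactly a + b*x.
a, b = -2.0, 0.5
for x in (0, 4, 10):
    p = logistic(a, b, x)
    print(round(logit(p), 6), a + b * x)  # the two columns agree
```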
What exactly are “odds”?
Odds are an expression of the likelihood that some event happens compared to the likelihood that
it does not happen. If the odds on a horse in a race are 30 to 1, is that horse likely to win? Or
lose? If the chance of an event happening are 50:50, what does that mean? What if the
likelihood of two events are 1:1? How is that different?
We will work with a number that is in some ways a little simpler. We will define odds as the
ratio of the probability that an event occurs and the probability that the same event does not
occur. Since the probabilities add to 1, if the probability of an event is “p”, then the probability
that it does not occur is “1–p”.
So “50:50” has odds of 1.0, and 1:1 odds are also 1.0. They have the same odds. If
the odds are 1, then the likelihood of something happening is exactly equal to the
likelihood that it will not happen. That is p = 1 – p.
To simplify our concepts we will think of the odds as the ratio of two probabilities. The
probability that some event happens (success) will be equal to p. The probability of
failure will be 1–p. The odds are calculated as p/(1–p).
What if the odds are 2? This means that p is twice as large as (1–p), so success is twice
as likely as failure. If the odds are 10 the probability of success is 10 times more likely
than failure.
If the odds are 0.5 or 0.1, then failure is, respectively, twice or ten times as likely
as success.
So we start with probabilities of a binary event. We calculate the odds of one of the events by
dividing the probability of that event by the probability of the alternative event. Finally, the
natural logarithm of this ratio is calculated to produce the dependent variable of a linear
model (which may be simple linear or multiple).
Predicted values from this model are “log odds”. An antilog is taken (e^(log odds)) to get the
predicted odds. The odds can be transformed back into probabilities of the event by
calculating “probability of event = predicted odds / (1 + predicted odds)”.
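The back-transformation just described can be sketched as:

```python
import math

# Convert a predicted log odds back to a probability:
# odds = exp(log_odds); probability = odds / (1 + odds).
def prob_from_log_odds(log_odds):
    odds = math.exp(log_odds)
    return odds / (1 + odds)

# Round trip: probability -> odds -> log odds -> probability.
p = 0.25
log_odds = math.log(p / (1 - p))
print(round(prob_from_log_odds(log_odds), 6))  # 0.25
```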
Let’s look at an example.
What are the odds of getting an A in this course given a particular score on the first exam?
Disclaimer 1: The use of this example calculated from past grades in no way implies any
promise or commitment about the distribution of future grades.
Disclaimer 2: Although I will discuss only the number of A's, the non-A's are not all B's; there
have been C's and D's. Sorry.
Now, what are the odds of getting an A in this course?
I have had 423 students take the course since I have been giving two exams in this course
(previously 3). Of those 423, there have been 212 A's; we will call this “success”. The
chance of getting an A is then almost 50:50, and the odds (212/211) are just about 1.
Interesting, but we can carry this one step further. What are the odds of getting an A if you
had a 70 on the first exam? Or an 80? Or a 90?
We will do Logistic regression to determine this. We will use SAS PROC Logistic. The
structure is very similar to PROC REG.
See the PROC LOGISTIC output (Appendix 10).
The output is rather different because the analysis is not a least squares regression. You will be
responsible only for interpreting the values specified below.
Note that SAS states that the link is a LOGIT.
A logit is the natural logarithm of the odds. That is “loge(p / (1–p))”.
SAS gives the number of each of the binary cases, in our example there are 212 “TRUE” and 211
“FALSE”.
The output contains a section called the Model Fit Statistics (AIC, SC and –2LogL).
These statistics are used to compare 2 models. We will not cover them here.
Other output of interest is the “Testing Global Null Hypothesis: BETA=0”. This is like the test of
the model in ordinary regression.
Of special interest is the “Likelihood Ratio Test”, not so much the “Score” and “Wald” tests.
The Parameter Estimates and their test are given in the section called “Analysis of Maximum
Likelihood Estimates”
So how do we interpret this output?
We are interested primarily in the slope and intercept, and the test of the slope.
Intercept = –16.91 and Slope = 0.1952
Likelihood ratio test P-value < 0.0001
So we have a highly significant slope. The interpretation here is the same as for regression,
though the test statistic is different.
Now, we have a slope and an intercept. Could we get predicted values as with regression?
Absolutely! However, we entered log odds into the model, so the predicted values are log odds.
We can get these predicted values and convert to probabilities.
Calculate the probability that a person with an 83 will get an A in the course.
Yhati = –16.91 + 0.1952Xi = –16.91+ 0.1952(83) = –0.7086
This is a log odds, so first take the antilog: exp(–0.7086) = 0.4923.
So p/(1–p) = 0.4923; solving for p we get p = 0.4923/(1 + 0.4923) = 0.3299.
The probability of getting an A with a grade of 83
on the first exam is 0.33, or we could figure that
about 33% of students with a grade of 83 on the
first exam will get an A. The predicted values,
detransformed to probabilities, will form a
logistic regression line going from 0 to 1.
[Figure: fitted logistic curve of P(A), from 0 to 1, against grade on the first exam (50 to 100)]
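The worked example above can be reproduced end to end (a sketch using the fitted intercept and slope from the output):

```python
import math

# P(A) for a given first-exam grade, from the fitted logistic regression.
def prob_A(grade, intercept=-16.91, slope=0.1952):
    log_odds = intercept + slope * grade   # predicted log odds
    odds = math.exp(log_odds)              # antilog -> predicted odds
    return odds / (1 + odds)               # back-transform to a probability

print(round(prob_A(83), 2))  # 0.33
```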
This note was uploaded on 12/29/2011 for the course EXST 7015 taught by Professor Wang,j during the Fall '08 term at LSU.