Statistical Techniques II, Page 32

Multiple regression
Multiple regression involves two or more independent variables (Xi), but still only a single dependent
variable (Yi). There is an analysis for multiple dependent variables; it is called multivariate regression.
The sample equation in multiple regression is of the form
Yi = b0 + b1*X1i + b2*X2i + b3*X3i + ei
The objectives in multiple regression are generally the same as SLR.
Testing hypotheses about potential relationships (using correlations),
fitting and documenting relationships, and
estimating parameters with confidence intervals.
The good news is that most of what we know about simple linear regression applies to multiple regression.
The regression equation is similar.
The assumptions for the regression are the same as for Simple Linear Regression
The interpretation of the parameter estimates is the same (units are Y units per X units, and
they measure the change in Y for a 1-unit change in X).
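As a small numerical illustration of this interpretation (the coefficients below are made up for the sketch, not taken from any example in these notes), each slope measures the change in the predicted Y for a one-unit change in that X, with the other X's held fixed:

```python
# Hypothetical fitted equation: Yhat = 5 + 2*X1 - 0.5*X2 (illustrative values only)
def yhat(x1, x2):
    return 5 + 2 * x1 - 0.5 * x2

# Increase X1 by one unit while holding X2 fixed: Yhat changes by b1 = 2
change = yhat(4, 10) - yhat(3, 10)
print(change)  # 2.0
```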
The diagnostics used in simple linear regression are mostly the same for multiple regression.
Residuals can still be examined for outliers, homogeneity, normality, curvature, influence, etc.,
as with SLR. The only difference is that, since we have several X's, we would usually plot the
residuals on Yhat instead of a single X variable.
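A sketch of this diagnostic in Python (numpy only; the six observations are made up for illustration). The point is that with several X's the residuals are plotted against the fitted values Yhat rather than against any single X:

```python
import numpy as np

# Illustrative data (not from these notes): 6 observations, two predictors
X1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
X2 = np.array([3, 1, 4, 2, 6, 5], dtype=float)
Y  = np.array([4, 5, 9, 8, 14, 13], dtype=float)

# Design matrix with an intercept column, then an ordinary least-squares fit
X = np.column_stack([np.ones_like(X1), X1, X2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)

yhat = X @ b
resid = Y - yhat

# With several X's, the residual plot would use Yhat on the horizontal axis, e.g.:
#   plt.scatter(yhat, resid)
print(resid.sum())  # residuals from a least-squares fit with an intercept sum to ~0
```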
So what is different?
Obviously, the calculations are more complicated. Simple algebraic formulas are no longer
practical; matrix algebra must be used.
Also, we now have several independent variables, X1, X2, X3, etc. We will need some
mechanism to evaluate these individually. To this end, we will discuss a new type of Sum of
Squares not needed for simple linear regression.
The use of several independent variables also creates some new problems. If the independent
variables are highly correlated we have a problem called multicollinearity. We will need some
diagnostics to evaluate this problem.
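One common diagnostic for this problem (not developed in this section of the notes, so treat this as a forward-looking sketch) is the variance inflation factor, VIF = 1/(1 - R^2), where R^2 comes from regressing one Xi on the remaining X's. A minimal numpy version, with made-up data:

```python
import numpy as np

# Made-up predictors: X2 is nearly 2*X1, so the two are highly collinear
X1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
X2 = np.array([2.1, 3.9, 6.2, 8.0, 9.9, 12.1], dtype=float)

def r_squared(y, X):
    """R^2 from a least-squares fit of y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    sst = ((y - y.mean()) ** 2).sum()
    return 1.0 - (resid ** 2).sum() / sst

# VIF for X1: regress X1 on the other predictors (here just X2), with an intercept
X = np.column_stack([np.ones(len(X2)), X2])
vif_x1 = 1.0 / (1.0 - r_squared(X1, X))
print(vif_x1)  # far above the common rule-of-thumb cutoff of 10
```

A VIF near 1 means a predictor is nearly uncorrelated with the others; values above roughly 10 are often taken as a warning sign.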
Outside of the new diagnostics needed to deal with several independent variables, SLR and
Multiple regression are very similar.
To do multiple regression in SAS we specify a model with the variables of interest.
For example, a regression on Y with 3 variables X1, X2 and X3 would be specified as
PROC REG;
MODEL Y = X1 X2 X3;
To get the Type I and Type II SS we add the options “/ ss1 ss2;” to the MODEL statement.

Extra Sum of Squares

When a variable is added to a model, it usually accounts for some variation; only rarely in
practice does it account for zero sum of squares. Most often, when a variable is added to a model
it reduces the error sum of squares (SSE) and increases the model sum of squares (SSReg). The
sum of squares that a variable causes to be removed from the error SS and added to the model SS
is called its “Extra SS”.
If each variable had its unique ExtraSS, the concept would be simple. However, two variables in
a model are rarely wholly independent. Two variables in a model may “compete” for SS, so that if
one enters the model first it gets SS that a second variable might have taken had it entered first.
Or, one variable may actually enhance another, so that the second variable may actually account
for more SS after the first variable is entered than if it had entered first.
As a result we cannot talk simply about the ExtraSS for a variable. We have to consider the
ExtraSS in the context of what variables were already in the model.
Extra SS notation. The ExtraSS will simply be denoted SSXi for each variable Xi.
Since we must consider which variables are already in the model we will add to this notation a
designation of what is already in the model. For example, the extra SS for X2, given that X1 and
X3 are already in the model, will be SSX2 | X1, X3.
We could also add a designation for the intercept (usually X0). So the extra SS mentioned earlier
could be SSX2 | X0, X1, X3. However, since ALL models we will discuss are adjusted for the
intercept first, we will usually leave off the X0.
Using this notation, a simple linear regression Extra SS for X would be just SSX. Adjustment for
X0 is assumed.
Extra SS Example 1

The first example is a two-factor multiple regression. The data is given below.
We will fit the simple linear regression for each variable, and then the two-factor multiple
regression to calculate the Extra SS.
In SAS we simply list the variables in the PROC REG model statement, MODEL Y = X1 X2;
Obs    Y   X1   X2
  1   18    3    4
  2   22    1    8
  3   34    2   11
  4   36    6    5
  5   42    8    1
  6   54    6    5
  7   68    7   10
  8   77    4   10
  9   87    9   11
 10   92    6    8

Simple linear regression for Y on X1.
Analysis of Variance Table (Y on X1)

Source     DF    Sum of Squares    Mean Square    F Value    Prob>F
Model       1        2381.31494     2381.31494      4.698    0.0621
Error       8        4054.68506      506.83563
C Total     9        6436.00000

Note the slope for the variable X1 is not significantly different from zero.
Analysis of Variance Table (Y on X2)

Source     DF    Sum of Squares    Mean Square    F Value    Prob>F
Model       1        1446.14793     1446.14793      2.319    0.1663
Error       8        4989.85207      623.73151
C Total     9        6436.00000

The slope for X2 is not significantly different from zero either.
Analysis of Variance Table (Y on both X1 and X2)

Source     DF    Sum of Squares    Mean Square    F Value    Prob>F
Model       2        4523.43473     2261.71736      8.278    0.0143
Error       7        1912.56527      273.22361
C Total     9        6436.00000

The two variables together are significant, but not individually.
Now compare the results of the three fits, first calculating the Extra SS. Since the SSReg
increases by the same amount that the SSError decreases for each variable, either SS can be used.
I will use the error SS.

Error SS for X1 alone       4054.68506
Error SS for X2 alone       4989.85207
Error SS for X1 and X2      1912.56527
SSTotal (all models)        6436.00000

The first variable alone (X1) had an error SS equal to 4054.68506 out of the corrected total of
6436. The difference is 2381.31494, and this is the ExtraSS for X1, the amount of the total
accounted for by the variable.
SSX1 = 6436 – 4054.68506 = 2381.31494
Likewise, the ExtraSS for X2 alone is
SSX2 = 6436 – 4989.85207 = 1446.14793
Now, how did they do together in the multiple regression?
The two variables together had an error SS of 1912.56527. Since the first variable alone had
an error SS of 4054.68506, adding the second must have reduced the error SS by the difference.
This additional amount is SSX2|X1 = 4054.68506 - 1912.56527 = 2142.11979. And likewise, the
Extra SS for X1 after X2 is SSX1|X2 = 4989.85207 - 1912.56527 = 3077.28680.

In summary:
SSX1 = 2381.31494
SSX2 = 1446.14793
SSX1|X2 = 3077.28680
SSX2|X1 = 2142.11979

In this case the variables actually enhanced each other, performing better together than alone.
Although not the rule, this is not unusual.
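The arithmetic above can be checked numerically. A sketch in Python (numpy least squares rather than SAS PROC REG; the data are the ten observations listed earlier):

```python
import numpy as np

# The ten observations from the two-variable example
Y  = np.array([18, 22, 34, 36, 42, 54, 68, 77, 87, 92], dtype=float)
X1 = np.array([3, 1, 2, 6, 8, 6, 7, 4, 9, 6], dtype=float)
X2 = np.array([4, 8, 11, 5, 1, 5, 10, 10, 11, 8], dtype=float)

def sse(y, *xs):
    """Error SS from a least-squares fit with an intercept and the given predictors."""
    X = np.column_stack([np.ones(len(y))] + [np.asarray(x, float) for x in xs])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(((y - X @ b) ** 2).sum())

sstotal = float(((Y - Y.mean()) ** 2).sum())     # corrected total: 6436.0
ssx1 = sstotal - sse(Y, X1)                      # SSX1    = 2381.31494
ssx2 = sstotal - sse(Y, X2)                      # SSX2    = 1446.14793
ssx2_given_x1 = sse(Y, X1) - sse(Y, X1, X2)      # SSX2|X1 = 2142.11979
ssx1_given_x2 = sse(Y, X2) - sse(Y, X1, X2)      # SSX1|X2 = 3077.28680
```

Both conditional Extra SS come out larger than the corresponding marginal ones, which is exactly the enhancement described in the text.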
3-factor model

The calculation of Extra SS is exactly the same for larger models. The following example is a
3-factor multiple regression.
In SAS this model would be
PROC REG; MODEL Y = X1 X2 X3;
The raw data is given below.
Obs    Y   X1   X2   X3
  1    1    2    9    2
  2    3    4    6    5
  3    5    7    7    9
  4    3    3    5    5
  5    6    5    8    9
  6    4    3    4    2
  7    2    2    3    6
  8    8    6    2    1
  9    9    7    5    3
 10    3    8    2    4
 11    5    7    3    7
 12    6    9    1    4

The results of the regressions are:
SSTotal is 62.91667 for all models.
For the 1-factor models the results are:
Regression of Y on X1: SSError = 38.939, SSModel = 23.978
Regression of Y on X2: SSError = 58.801, SSModel = 4.115
Regression of Y on X3: SSError = 62.680, SSModel = 0.237
The Extra SS are equal to the model SS for 1-factor models:

SSX1 = 23.978   (or SSX1 | X0)
SSX2 = 4.115    (or SSX2 | X0)
SSX3 = 0.237    (or SSX3 | X0)

These SS are adjusted for the intercept (correction factor). This will always be the case for our
examples, so the X0 is often omitted.
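These one-factor model SS can also be verified numerically. A sketch in Python (numpy rather than SAS; the data are the twelve observations listed above):

```python
import numpy as np

# The twelve observations from the 3-factor example
Y  = np.array([1, 3, 5, 3, 6, 4, 2, 8, 9, 3, 5, 6], dtype=float)
X1 = np.array([2, 4, 7, 3, 5, 3, 2, 6, 7, 8, 7, 9], dtype=float)
X2 = np.array([9, 6, 7, 5, 8, 4, 3, 2, 5, 2, 3, 1], dtype=float)
X3 = np.array([2, 5, 9, 5, 9, 2, 6, 1, 3, 4, 7, 4], dtype=float)

def ss_model(y, x):
    """Model SS (adjusted for the intercept) from a one-predictor least-squares fit."""
    X = np.column_stack([np.ones(len(y)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = ((y - X @ b) ** 2).sum()
    sst = ((y - y.mean()) ** 2).sum()
    return float(sst - sse)

print(round(ss_model(Y, X1), 3))  # SSX1 = 23.978
print(round(ss_model(Y, X2), 3))  # SSX2 = 4.115
print(round(ss_model(Y, X3), 3))  # SSX3 = 0.237
```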
Fitting X1 and X2 and X3 together TWO AT A TIME we get the following results.
James P. Geaghan, Copyright 2011

This note was uploaded on 12/29/2011 for the course EXST 7015, taught by Professor Wang, J. during the Fall '08 term at LSU.