EXST 7015 - Statistical Inference II, Fall 2011
Lab 5: Multiple Linear Regression
OBJECTIVES
In SLR, only a single dependent variable can be regressed on a single independent variable. In
multiple regression, however, a single dependent variable is regressed on several independent
variables (model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$). The overall test of
hypothesis in multiple linear regression is $H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$ vs. $H_1$: at least
one $\beta_j \neq 0$. Rejection of $H_0$ implies that at least one of the regressors, $X_1, X_2, \ldots, X_p$,
contributes significantly to the model.
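For reference, the same model can be written in matrix form, which is the form behind the X’X and $(X'X)^{-1}$ matrices used later in this lab. The notation below (bold $\mathbf{Y}$, $\mathbf{X}$, $\boldsymbol{\beta}$, $\boldsymbol{\varepsilon}$, and $\mathbf{b}$ for the least-squares estimate, with $X_{ij}$ the $i$-th observation of regressor $X_j$) is standard textbook notation and is not taken from this handout:

```latex
\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
\qquad
\mathbf{X} =
\begin{bmatrix}
1 & X_{11} & \cdots & X_{1p} \\
\vdots & \vdots & \ddots & \vdots \\
1 & X_{n1} & \cdots & X_{np}
\end{bmatrix},
\qquad
\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y},
\qquad
\mathrm{SSE} = \mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y}
```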
As in SLR, the F-test is used to test this hypothesis. The assumptions for multiple regression
are the same as for SLR, so the same analyses, such as residual plots, normality tests, and
diagnostic statistics, are used to evaluate them.
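The overall F statistic compares the mean square for regression with the mean square error. A sketch in standard notation, assuming $n$ observations, $p$ regressors, and an intercept in the model:

```latex
F = \frac{\mathrm{MSR}}{\mathrm{MSE}}
  = \frac{\mathrm{SSR}/p}{\mathrm{SSE}/(n-p-1)},
\qquad
\mathrm{SSR} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2,
\quad
\mathrm{SSE} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2
```

Under $H_0$ and the usual assumptions (independent, normally distributed errors with constant variance), $F$ follows an F distribution with $p$ and $n-p-1$ degrees of freedom, so $H_0$ is rejected at level $\alpha$ when $F > F_{\alpha;\,p,\,n-p-1}$.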
In this lab, we will use PROC GLM and PROC REG to perform multiple linear regression. You
are required to identify the various types of sums of squares (Type I, Type II, Type III, and Type IV)
using PROC GLM; to identify the components of the X’X matrix (cross products X’X, X’Y, and Y’Y)
and of the $(X'X)^{-1}$ matrix (X’X inverse, parameter estimates, and SSE) using PROC REG; and to
understand that the F-test and the t-test give the same results when testing hypotheses about
individual parameter estimates (for a single parameter, $F = t^2$).
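The quantities listed above can also be computed by hand to see where they come from. The sketch below is plain Python with hypothetical toy data (`x1`, `x2`, and `y` are made up, and the helpers `transpose`, `matmul`, `inverse`, and `fit` are illustrative names, not SAS or library functions). It builds X’X, X’Y, and Y’Y, inverts X’X, and computes $b = (X'X)^{-1}X'Y$ and $SSE = Y'Y - b'X'Y$, the quantities PROC REG prints with its XPX and I model options. It then computes Type I (sequential) and partial sums of squares and checks that $t^2$ equals the 1-df partial F statistic for each coefficient. It is a teaching sketch, not a substitute for the SAS output.

```python
# A minimal sketch of multiple linear regression via the normal equations.
# Data are hypothetical toy values, not from the lab; pure Python, no SAS.

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    # Each entry is (row of a) dot (column of b).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse(a):
    """Gauss-Jordan inversion with partial pivoting; fine for small matrices."""
    n = len(a)
    m = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("X'X is singular (perfect multicollinearity)")
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def fit(x_cols, y):
    """Regress y on an intercept plus x_cols; return b, SSE, and (X'X)^-1."""
    X = [[1.0] + [col[i] for col in x_cols] for i in range(len(y))]
    Xt = transpose(X)
    xtx = matmul(Xt, X)                    # X'X
    xty = matmul(Xt, [[v] for v in y])     # X'Y (column vector)
    yty = sum(v * v for v in y)            # Y'Y
    xtx_inv = inverse(xtx)                 # (X'X)^-1
    b = [row[0] for row in matmul(xtx_inv, xty)]          # b = (X'X)^-1 X'Y
    sse = yty - sum(bj * r[0] for bj, r in zip(b, xty))  # SSE = Y'Y - b'X'Y
    return b, sse, xtx_inv

# Hypothetical toy data with two correlated regressors.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 9]
y = [3.1, 3.9, 6.2, 6.8, 9.1, 9.7, 12.2, 13.5]
cols = [x1, x2]
n, p = len(y), len(cols)

b, sse, xtx_inv = fit(cols, y)
mse = sse / (n - p - 1)
sst = fit([], y)[1]                        # intercept-only SSE = corrected total SS
f_overall = ((sst - sse) / p) / mse        # overall F for H0: beta_1 = ... = beta_p = 0

# Type I (sequential) SS: drop in SSE as each regressor enters, in order.
type1, prev = [], sst
for k in range(1, p + 1):
    cur = fit(cols[:k], y)[1]
    type1.append(prev - cur)
    prev = cur

# Partial SS (Type III here): drop in SSE when X_j is added last.
type3 = [fit(cols[:j] + cols[j + 1:], y)[1] - sse for j in range(p)]

# For each beta_j, the squared t statistic should equal the 1-df partial F.
t2_vs_f = []
for j in range(p):
    se = (mse * xtx_inv[j + 1][j + 1]) ** 0.5
    t = b[j + 1] / se
    t2_vs_f.append((t * t, type3[j] / mse))

print("b =", [round(v, 4) for v in b], " overall F =", round(f_overall, 3))
print("Type I SS:", [round(v, 4) for v in type1])
print("Type III SS:", [round(v, 4) for v in type3])
for j, (t2, f) in enumerate(t2_vs_f, start=1):
    print(f"X{j}: t^2 = {t2:.4f}, partial F = {f:.4f}")
```

In a model like this one, with only continuous main-effect regressors, the Type II and Type III sums of squares both reduce to the partial SS computed here. Type I SS depends on the order in which the regressors enter, and its entries add up to SSR. The Gauss-Jordan inverse is used only to keep the sketch dependency-free.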
In multiple regression, when two or more independent variables are highly correlated, the X’X
matrix is nearly singular, so its inverse is numerically unstable (and when the correlation is
perfect, X’X cannot be inverted at all). This problem is called multicollinearity; it can cause large
fluctuations in the regression coefficient estimates and inflated variances of those estimates. As a
result, the individual coefficient estimates may not be reliable. In this lab, you will also become
familiar with the statistics (sequential parameter estimates, variance inflation factor (VIF), and condition index),