5.6
Hypothesis testing in MLR
The null hypothesis for testing if a covariate is related to the response is

H_0: β_j = 0

where we have the test statistic

t = (β̂_j − 0) / SE(β̂_j) ∼ t_{n−(p+1)}

under H_0.
Example 5.2.
For β_1 (size), we have

t = 31.26 / 21.47 = 1.46
From the t-distribution table with n − (p + 1) = 18 degrees of freedom, we see that this corresponds to a p-value between 0.1 and 0.2 (0.1625 to be exact).
We therefore do not reject H_0 (p-value > 0.05), i.e. there is no significant relationship between overhead and size, after accounting for the other variates.
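The computation in Example 5.2 can be sketched in Python. This is a minimal sketch assuming the estimate (31.26), standard error (21.47), and 18 residual degrees of freedom stated above; the underlying data set is not reproduced here.

```python
# Two-sided t-test for H0: beta_1 = 0, using the numbers from Example 5.2.
from scipy import stats

beta_hat = 31.26   # estimated coefficient for beta_1 (size)
se = 21.47         # standard error of beta_1
df = 18            # residual degrees of freedom, n - (p + 1)

t = beta_hat / se                       # test statistic under H0
p_value = 2 * stats.t.sf(abs(t), df)    # two-sided p-value

print(round(t, 2), round(p_value, 4))   # t about 1.46, p-value between 0.1 and 0.2
```

Since the p-value exceeds 0.05, the code reaches the same conclusion as the table lookup: do not reject H_0.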
Winter 2018
STAT 331 Course Notes
6
January 25, 2018
6.1
Scatter plot matrix
For a given set of explanatory variates and a response variate, we can plot a matrix of 2D scatter plots of each
variate against all the other variates.
Figure 6.1:
Size, employees, and clients are all correlated with overhead. Note however that size, employees, and
clients are all correlated with each other therefore it would probably suffice to only include one of these explanatory
variates without losing much information in our model.
From this matrix, we can visually see which explanatory variates are correlated with the response variate, but also which explanatory variates are correlated with each other.
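A numeric counterpart to the scatter plot matrix is the pairwise correlation matrix, which summarizes the same relationships. The data below are synthetic stand-ins (the overhead data set itself is not reproduced in these notes); the variable names mirror the example.

```python
# Pairwise correlations among response and explanatory variates.
# Synthetic data: size, employees, and clients are built to be correlated,
# mimicking the pattern described for the overhead example.
import numpy as np

rng = np.random.default_rng(0)
n = 18
size = rng.normal(50, 10, n)
employees = 0.8 * size + rng.normal(0, 2, n)   # strongly tied to size
clients = 0.5 * size + rng.normal(0, 3, n)     # also tied to size
overhead = 2 * size + rng.normal(0, 15, n)

X = np.column_stack([overhead, size, employees, clients])
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))   # off-diagonal entries show which pairs co-move
```

Large off-diagonal entries among the explanatory columns are exactly the pattern the scatter plot matrix reveals visually.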
6.2
Multicollinearity
When strong (linear) relationships are present among two or more explanatory variates, we say the variates exhibit multicollinearity.
Intuitively, multicollinearity means some explanatory variates are (nearly) linearly dependent, so it is not necessary to include all of them in the model: the extraneous dependent variates do not contribute much additional explained variance/information.
In fact, multicollinearity is detrimental: it leads to inflated variances of the associated parameter estimates ((X^T X)^{-1} has inflated diagonal entries, thus SE(β̂_j) = σ̂ √((X^T X)^{-1}_{jj}) is inflated), resulting in inaccurate conclusions from hypothesis tests and confidence intervals (which depend on SE(β̂_j)). Intuitively, our estimate β̂_j of the impact of a one-unit change in x_j, while controlling for the other variates, tends to be less precise: it is hard to isolate a "change" in x_j when x_j moves together with a correlated x_k.
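The inflation of the diagonal of (X^T X)^{-1} can be demonstrated directly. The sketch below uses synthetic data (not the course data set) with two explanatory variates whose correlation is dialed up:

```python
# As the correlation rho between two columns of X grows, the diagonal of
# (X^T X)^{-1} — and hence SE(beta_hat_j) — blows up.
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)

diags = {}
for rho in (0.0, 0.9, 0.99):
    # construct x2 with correlation roughly rho with x1
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    diags[rho] = np.diag(np.linalg.inv(X.T @ X))
    print(rho, np.round(diags[rho], 3))   # entries for x1, x2 grow with rho
```

The entries corresponding to x1 and x2 grow sharply as rho approaches 1, which is exactly the variance inflation described above.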
6.3
Variance inflation factor (VIF)
To assess whether a variate x_j is a problem in terms of multicollinearity, we can regress x_j onto all other explanatory variates. We can then calculate the variance inflation factor for x_j

VIF_j = 1 / (1 − R_j²)

where R_j² is the coefficient of determination from this regression.
The VIF_j can be interpreted as the factor by which the variance of β̂_j is increased relative to the ideal case in which all explanatory variates are uncorrelated (i.e. columns of X are orthogonal).
Example 6.1.
Suppose we do this for x_j = x_3: we regress the number of employees on all other explanatory variates (see scatter plot matrix above).
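The steps of Example 6.1 can be sketched by least squares. The data are synthetic stand-ins for the overhead example (employees built to depend on size and clients), since the actual data set is not reproduced in these notes:

```python
# Compute VIF_3 for x_3 = employees: regress employees on size and clients,
# take the R^2 of that regression, and form VIF = 1 / (1 - R^2).
import numpy as np

rng = np.random.default_rng(2)
n = 18
size = rng.normal(50, 10, n)
clients = rng.normal(30, 5, n)
employees = 0.8 * size + 0.3 * clients + rng.normal(0, 2, n)

# regress x_3 on the other explanatory variates (with an intercept)
X = np.column_stack([np.ones(n), size, clients])
beta, *_ = np.linalg.lstsq(X, employees, rcond=None)
fitted = X @ beta

ss_res = np.sum((employees - fitted) ** 2)
ss_tot = np.sum((employees - employees.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

vif = 1 / (1 - r2)
print(round(r2, 3), round(vif, 2))   # large R^2_j implies a large VIF_j
```

Because employees is nearly a linear function of size and clients here, R_3² is close to 1 and VIF_3 is large, flagging x_3 as a multicollinearity concern.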