© HYONJUNG KIM, 2017
2 Multiple Regression
While the straight-line model serves as an adequate description for many situations, more often than not, researchers engaged in model building consider more than just one predictor variable $X$. In fact, it is often the case that the researcher has a set of $p-1$ candidate predictor variables, say, $X_1, X_2, \ldots, X_{p-1}$, and desires to model $Y$ as a function of one or more of these $p-1$ variables. To accommodate this situation, we must extend our linear regression model to handle more than one predictor variable.
MULTIPLE REGRESSION SETTING: Consider an experiment in which $n$ observations are collected on the response variable $Y$ and $p-1$ predictor variables $X_1, X_2, \ldots, X_{p-1}$.
Individual      $Y$       $X_1$       $X_2$       $\cdots$   $X_{p-1}$
1               $Y_1$     $X_{11}$    $X_{12}$    $\cdots$   $X_{1,p-1}$
2               $Y_2$     $X_{21}$    $X_{22}$    $\cdots$   $X_{2,p-1}$
$\vdots$        $\vdots$  $\vdots$    $\vdots$               $\vdots$
$n$             $Y_n$     $X_{n1}$    $X_{n2}$    $\cdots$   $X_{n,p-1}$
To describe $Y$ as a function of the $p-1$ independent variables $X_1, X_2, \ldots, X_{p-1}$, we posit the multiple linear regression model
$$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1} + \epsilon_i$$

for $i = 1, 2, \ldots, n$, where $n > p$ and $\epsilon_i \sim N(0, \sigma^2)$. The values $\beta_0, \beta_1, \ldots, \beta_{p-1}$ are regression coefficients as before, and we assume that $X_1, X_2, \ldots, X_{p-1}$ are all fixed. The random errors $\epsilon_i$ are still assumed to be independent and to have a normal distribution with mean zero and a common variance $\sigma^2$. Then,

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon},$$

where $\boldsymbol{\epsilon} \sim \mathrm{MVN}(\mathbf{0}, \sigma^2\mathbf{I})$.
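Written out explicitly, the matrix form collects the $n$ individual model equations into vectors and a design matrix; note the leading column of ones in $\mathbf{X}$, which multiplies the intercept $\beta_0$:

```latex
\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad
\mathbf{X} = \begin{pmatrix}
1 & X_{11} & X_{12} & \cdots & X_{1,p-1} \\
1 & X_{21} & X_{22} & \cdots & X_{2,p-1} \\
\vdots & \vdots & \vdots &        & \vdots \\
1 & X_{n1} & X_{n2} & \cdots & X_{n,p-1}
\end{pmatrix}, \quad
\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{pmatrix}, \quad
\boldsymbol{\epsilon} = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}
```

Here $\mathbf{Y}$ and $\boldsymbol{\epsilon}$ are $n \times 1$, $\mathbf{X}$ is $n \times p$, and $\boldsymbol{\beta}$ is $p \times 1$, which is why the condition $n > p$ is required.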
In the multiple regression setting, because of the potentially large number of predictors, it
is more efficient to use matrices to define the regression model and the subsequent analyses.
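To make the setting concrete, the model $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$ can be simulated directly with matrix operations. The following sketch is illustrative only: the values of $n$, $p$, $\boldsymbol{\beta}$, $\sigma$, and the uniform design used to generate the fixed predictors are arbitrary choices, not from the text.

```python
import numpy as np

# Simulate n observations from Y = X beta + eps, with eps ~ MVN(0, sigma^2 I).
# All numeric values below are illustrative assumptions.
rng = np.random.default_rng(0)
n, p = 50, 3                          # n observations, p - 1 = 2 predictors

# Design matrix X: a leading column of ones (for beta_0) plus p - 1
# fixed predictor columns.
X = np.column_stack([np.ones(n),
                     rng.uniform(0, 10, size=(n, p - 1))])

beta = np.array([1.0, 2.0, -0.5])     # (beta_0, beta_1, beta_2)
sigma = 1.5                           # common error standard deviation
eps = rng.normal(0.0, sigma, size=n)  # independent N(0, sigma^2) errors

Y = X @ beta + eps                    # the model, written as one matrix product

print(X.shape, Y.shape)               # (50, 3) (50,)
```

Writing the model this way mirrors the point of the section: once the predictors are stacked into $\mathbf{X}$, the entire set of $n$ equations is a single matrix product, regardless of how many predictors there are.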