Multiple Regression Model
• General Multiple Regression Model:

  Y_i = β_0 + β_1 X_1i + β_2 X_2i + …… + β_k X_ki + u_i

• k regressor variables: X_1i, ……, X_ki
• k “slope” coefficients (parameters): β_1, ……, β_k
• Each “slope” coefficient β_j measures the effect of a one-unit change in the corresponding regressor X_ji, holding all else (e.g. the other regressors) constant.
• u_i – still omitted variables (but hopefully there are fewer of them in here, since we are including more regressors!)
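A minimal numerical sketch of this model (Python/NumPy; the coefficient values and simulated data are made up for illustration, since the slides do not specify any):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500                                    # sample size (illustrative)
beta = np.array([2.0, 1.0, -0.5, 3.0])     # beta_0 plus k = 3 slope coefficients (made up)

# k = 3 regressors X_1i, X_2i, X_3i and the error term u_i
X = rng.normal(size=(n, 3))
u = rng.normal(size=n)

# Y_i = beta_0 + beta_1*X_1i + beta_2*X_2i + beta_3*X_3i + u_i
Xmat = np.column_stack([np.ones(n), X])    # the column of ones carries beta_0
Y = Xmat @ beta + u

# OLS recovers the coefficients (up to sampling error)
beta_hat, *_ = np.linalg.lstsq(Xmat, Y, rcond=None)
```

With u_i uncorrelated with each regressor (as in Assumption 1 below), each estimated slope should land close to the corresponding true β_j.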
Multiple Regression Assumptions  I
• As in the simple regression model, we need to make some assumptions in order to estimate the coefficients β_0, β_1, ……, β_k. The first 3 are very similar to our previous set of assumptions.
• A1) Cov(u_i, X_ji) = 0 for every j (i.e. u_i is uncorrelated with each of the k regressors)
  or
  A1b) E[u_i | X_1i = c_1, X_2i = c_2, ….., X_ki = c_k] = E[u_i | X_1i, X_2i, …., X_ki] = 0
  (i.e. the expectation of u_i is zero regardless of the values of the k regressors.)
  (Note the minor notational change in Assumption 1b).)
Multiple Regression Assumptions II
• A2) (X_1i, X_2i, …., X_ki, Y_i) are i.i.d. (again, this is true with random sampling)
• A3) (X_1i, X_2i, …., X_ki, Y_i) have finite fourth moments (again, this is generally true in economic data).
• We also need a fourth assumption in the multiple regression model. This fourth assumption addresses how the various X_ji’s are related to each other.
Multiple Regression Assumptions III
• A4) The regressors (X_1i, X_2i, …., X_ki) are not perfectly multicollinear. This means that none of the regressors can be written as a perfect linear function of only the other regressors. For example:
  – If X_2i = 11 + 7X_5i + X_4i − 3X_9i + 5.5X_3i, then A4) is violated
  – If X_2i = 11 + 7X_5i + X_4i − 3X_9i + 5.5X_3i + W_i (where W_i is some other variable that is not one of the X_ji’s), then A4) is not violated
• Assumption 4) is rarely violated in practice, and when it is, it is typically by accident. We will discuss A4) in more detail momentarily.
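A quick check of A4) in code (a Python/NumPy sketch with simulated regressors; variable names mirror the violated example above):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x3 = rng.normal(size=n)
x5 = rng.normal(size=n)
# x2 is a perfect linear function of only the other regressors -> A4) violated
x2 = 11 + 7 * x5 + 5.5 * x3

X = np.column_stack([np.ones(n), x2, x3, x5])
rank = np.linalg.matrix_rank(X)
# rank is 3 even though X has 4 columns, so X'X is singular and the
# OLS coefficients on these regressors are not separately identified
```

When this happens, Stata's `regress` drops one of the offending regressors automatically and flags it as omitted.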
Estimation  I
• Under A1) – A4), the OLS estimators (β̂_0, β̂_1, ……, β̂_k), which minimize:

  Σ_{i=1}^{n} [ Y_i − (β̂_0 + β̂_1 X_1i + …… + β̂_k X_ki) ]²

are unbiased and consistent estimators of the parameters (β_0, β_1, ……, β_k). Moreover, the CLT implies that for each j,

  β̂_j ∼ N( β_j , Var(β̂_j) )  and  (β̂_j − β_j) / SE(β̂_j) ∼ N(0, 1)

• The formulas for the OLS estimators (and their standard errors) are too complicated to write down (unless one uses matrix notation ☺), but in STATA the estimates (and their SEs) can be computed with the command (e.g. with 3 regressors):
  “regress y x1 x2 x3” or “regress y x1 x2 x3, robust”
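In matrix notation the formulas are compact: β̂ = (X′X)⁻¹X′Y. Below is a hedged Python/NumPy sketch (simulated data, made-up coefficients) of what `regress` and `regress ..., robust` compute, using homoskedasticity-only and heteroskedasticity-robust (HC1-style) standard errors respectively:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 2.0, -1.0, 0.5])        # made-up true coefficients
y = X @ beta + rng.normal(size=n)

# OLS in matrix notation: beta_hat = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

# Homoskedasticity-only SEs ("regress y x1 x2 x3")
s2 = resid @ resid / (n - k - 1)
se_classic = np.sqrt(s2 * np.diag(XtX_inv))

# Heteroskedasticity-robust SEs ("regress ..., robust"; HC1-style correction)
meat = X.T @ (X * (resid ** 2)[:, None])
V_rob = XtX_inv @ meat @ XtX_inv * n / (n - k - 1)
se_robust = np.sqrt(np.diag(V_rob))
```

Each (β̂_j − β_j)/SE(β̂_j) is then approximately standard normal, which is what the CLT result on this slide states.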
Estimation  II
• Predicted (expected) values and residuals are the same
as before, i.e.