Dummy Variables in MR - VIII
• We also may be interested in the effects of multiple sets of categorical variables, e.g. the effects of both the sex of an individual and their ethnicity. In this case, there needs to be an “excluded” group for each set of categories, e.g.
– Ethnicity: let’s choose “Other” as the excluded group
• $\text{Hispanic}_i = 1$ if observation $i$ is Hispanic, $\text{Hispanic}_i = 0$ otherwise
• $\text{Black}_i = 1$ if observation $i$ is Black, $\text{Black}_i = 0$ otherwise
– Sex: let’s choose “Male” as the excluded group
• $\text{Female}_i = 1$ if observation $i$ is Female, $\text{Female}_i = 0$ otherwise
• What do these regression results say about average wages for Hispanic Females, Black Males, Other Males, etc.?
$$\widehat{\text{Wage}}_i = \underset{(1.02)}{17.40} - \underset{(1.10)}{6.08}\,\text{Hispanic}_i - \underset{(1.04)}{4.59}\,\text{Black}_i - \underset{(0.97)}{2.90}\,\text{Female}_i$$
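One way to read the results is to add up the coefficients that are switched on for each combination of dummies. The sketch below does that arithmetic in Python, assuming the three dummy coefficients enter with negative signs and that the model has no interaction terms, so the estimated sex gap is the same for every ethnicity. The names `INTERCEPT`, `COEF`, and `predicted_wage` are illustrative only.

```python
# Coefficients read off the wage regression (signs assumed negative on all dummies).
INTERCEPT = 17.40  # predicted average wage for the excluded groups: Other, Male
COEF = {"Hispanic": -6.08, "Black": -4.59, "Female": -2.90}


def predicted_wage(ethnicity, sex):
    """Predicted average wage for one ethnicity/sex combination."""
    wage = INTERCEPT
    if ethnicity in ("Hispanic", "Black"):  # "Other" is the excluded ethnicity
        wage += COEF[ethnicity]
    if sex == "Female":                     # "Male" is the excluded sex
        wage += COEF["Female"]
    return round(wage, 2)


predicted = {
    (ethnicity, sex): predicted_wage(ethnicity, sex)
    for ethnicity in ("Other", "Hispanic", "Black")
    for sex in ("Male", "Female")
}

for group, wage in predicted.items():
    print(group, wage)
```

For example, the predicted average wage for Hispanic Females is 17.40 - 6.08 - 2.90 = 8.42, while Other Males get the intercept alone, 17.40.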
Hypothesis Testing in MR
• There are two types of hypothesis tests we can do in multiple regression.
– 1) Hypothesis tests involving a single coefficient, e.g. a test whether $\beta_5 = 1$, or a test whether $\beta_3 = 0$.
– 2) Hypothesis tests involving multiple coefficients, e.g. a test whether $\beta_5 = 2\beta_3$, or a test whether $\beta_1 = \beta_2 = \beta_4 = 0$. These are sometimes called joint hypothesis tests, because they test conditions on multiple coefficients jointly, or together.
• Tests of the form 1) can be done using $t_{STAT}$’s, just like in basic regression analysis.
• Tests of the form 2) will be done with a new statistic, the F-statistic, or $F_{STAT}$.
Hypothesis Tests for a Single Coefficient - I
• Recall that under A1) – A4), each estimated parameter $\hat{\beta}_j$ satisfies:
$$\hat{\beta}_j \sim N\left(\beta_j, \mathrm{Var}(\hat{\beta}_j)\right) \quad \text{and} \quad \frac{\hat{\beta}_j - \beta_j}{SE(\hat{\beta}_j)} \sim N(0,1)$$
• Note: STATA reports $SE(\hat{\beta}_j)$ for each coefficient.
• Hence, we can test hypotheses regarding a single $\beta_j$ in the same way as before, i.e.
• 1) State hypotheses and choose significance level, e.g. significance level = 0.05
$H_0: \beta_j = c$ vs. $H_A: \beta_j \neq c$
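The sampling result recalled on this slide can be checked informally by simulation. The sketch below is only an illustration under assumptions that are not taken from the slides: a single regressor, normally distributed homoskedastic errors, and hypothetical helper names (`ols_slope_and_se`, `simulate_standardized_slopes`). It repeatedly draws data, estimates the slope, and standardizes the estimate by its standard error. If the standardized estimate is roughly N(0,1), about 5% of draws should fall outside ±1.96.

```python
import math
import random


def ols_slope_and_se(x, y):
    """OLS slope and its standard error for a regression of y on a constant and x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = ssr / (n - 2)  # residual variance estimate
    return slope, math.sqrt(sigma2_hat / sxx)


def simulate_standardized_slopes(true_slope=2.0, n=100, reps=1000, seed=0):
    """Return (beta_hat - beta) / SE(beta_hat) across simulated samples."""
    rng = random.Random(seed)
    z_values = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [1.0 + true_slope * xi + rng.gauss(0, 1) for xi in x]
        slope, se = ols_slope_and_se(x, y)
        z_values.append((slope - true_slope) / se)
    return z_values


z_values = simulate_standardized_slopes()
mean_z = sum(z_values) / len(z_values)
share_outside = sum(abs(z) > 1.96 for z in z_values) / len(z_values)
print("mean of standardized estimates:", round(mean_z, 3))
print("share with |z| > 1.96:", round(share_outside, 3))
```

The mean should be close to zero and the share close to 0.05; small departures are expected because the number of simulated samples is finite.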
Hypothesis Tests for a Single Coefficient - II
• 2) Compute $t_{STAT}$:
$$t_{STAT} = \frac{\hat{\beta}_j - c}{SE(\hat{\beta}_j)}$$
• 3) Compare $t_{STAT}$ to the appropriate critical value. Reject $H_0$ if the absolute value of the $t_{STAT}$ is greater than the critical value.
• Just as before:
– 1) We could alternatively do this test using the p-value associated with the $t_{STAT}$
– 2) STATA reports $t_{STAT}$’s and associated p-values for the tests that each coefficient equals zero.
– 3) Confidence intervals for $\beta_j$ can be formed as before, e.g. a 99% confidence interval is given by:
$$\left(\hat{\beta}_j - 2.58\,SE(\hat{\beta}_j),\ \hat{\beta}_j + 2.58\,SE(\hat{\beta}_j)\right)$$
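The three steps, together with the p-value and confidence-interval alternatives, can be sketched in a few lines of Python. This is an illustration rather than a replacement for the STATA output: it uses the standard normal distribution for critical values, and the function name `single_coefficient_test` is hypothetical. The example numbers are the Female coefficient (-2.90) from the earlier wage regression, assuming its standard error is the last one listed there (0.97).

```python
from statistics import NormalDist


def single_coefficient_test(beta_hat, se, c=0.0, alpha=0.05):
    """Two-sided test of H0: beta_j = c using the normal approximation."""
    z = NormalDist()                         # standard normal N(0, 1)
    t_stat = (beta_hat - c) / se             # step 2: compute t_STAT
    critical = z.inv_cdf(1 - alpha / 2)      # two-sided critical value
    p_value = 2 * (1 - z.cdf(abs(t_stat)))   # two-sided p-value
    ci = (beta_hat - critical * se, beta_hat + critical * se)
    return {
        "t_stat": t_stat,
        "critical": critical,
        "reject": abs(t_stat) > critical,    # step 3: compare to critical value
        "p_value": p_value,
        "ci": ci,                            # (1 - alpha) confidence interval
    }


# H0: beta_Female = 0 vs. HA: beta_Female != 0, at the 1% significance level
result = single_coefficient_test(beta_hat=-2.90, se=0.97, c=0.0, alpha=0.01)
for name, value in result.items():
    print(name, value)
```

With `alpha=0.01` the critical value is approximately 2.58, matching the 99% confidence interval formula above. The three approaches agree by construction: the test rejects exactly when the p-value is below `alpha`, which is exactly when the confidence interval excludes `c`.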
Hypothesis Tests for a Single Coefficient - III
• Last note: When using dummy variables, remember that we had to choose an excluded group. Again, with PS2 data we had:
• Regression A (with “Other” as the excluded group)
• Regression B (with “Hispanic” as the excluded group)
• Again, these regressions are really exactly the same (they both say the average wage for Others is 16.08, the average wage for Blacks is 11.57, and the average wage for Hispanics is 10.19)
• But because the variables are coded differently, the coefficients measure different aspects of the relationship, so the $t_{STAT}$’s on the coefficients test different things.
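To see why the two codings carry the same information, the sketch below rebuilds the coefficients each regression would have from the three group averages quoted above. The coefficients are derived here, not quoted from the regression output: a regression on a constant and a full set of group dummies reproduces the group means, so the intercept is the excluded group's mean and each dummy coefficient is a difference from it. The function names are hypothetical.

```python
GROUP_MEANS = {"Other": 16.08, "Black": 11.57, "Hispanic": 10.19}


def dummy_coefficients(group_means, excluded):
    """Intercept and dummy coefficients implied by group means for a given excluded group."""
    intercept = group_means[excluded]
    coefs = {
        group: round(mean - intercept, 2)
        for group, mean in group_means.items()
        if group != excluded
    }
    return intercept, coefs


def implied_means(intercept, coefs, excluded):
    """Recover each group's average wage from the intercept and dummy coefficients."""
    means = {excluded: intercept}
    means.update({group: round(intercept + b, 2) for group, b in coefs.items()})
    return means


intercept_a, coefs_a = dummy_coefficients(GROUP_MEANS, excluded="Other")
intercept_b, coefs_b = dummy_coefficients(GROUP_MEANS, excluded="Hispanic")

print("Regression A:", intercept_a, coefs_a)
print("Regression B:", intercept_b, coefs_b)
print("Same means:", implied_means(intercept_a, coefs_a, "Other")
      == implied_means(intercept_b, coefs_b, "Hispanic"))
```

In coding A the Black coefficient is 11.57 - 16.08 = -4.51, so its $t_{STAT}$ tests whether Blacks and Others have the same average wage. In coding B it is 11.57 - 10.19 = 1.38, so its $t_{STAT}$ tests whether Blacks and Hispanics have the same average wage.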