Econ 3210: Assignment 3
Fall 2015 Due: Monday, November 30th.
Late assignments will not be accepted, as I will post the answers once the due date expires. You
may work and turn in assignments in groups. Names of group members and student IDs must be
on the submitted work (I will not allow you to add names after the assignment is submitted).
Groups can be no larger than four (4).
If you hand in your assignment electronically (which I recommend), please note the following:
1. I will accept ONE (1) .pdf file for the entire assignment.
2. Please don’t name your assignment ‘Assignment 3.pdf.’
Instead, name it ‘Assignment 3 LastName.pdf’ or use first and last name if your last name
3. Use the subject ‘Econ 3210’ in the subject line.
1. Download and work through the R template for Assignment 3. Once this is done, do the
following: Based on the template that I’ve provided, change the code to download the dataset
crime1.dta. This dataset is introduced on page 83 of the text in example 3.5.
(a) Using R, estimate the first specification (without avgsen) in example 3.5 and interpret
your regression results. How does employment in 1986 affect arrests? Is it statistically
significant at the 5% level on a two-sided test?
lm.1 <- lm(narr86 ~ pcnv + ptime86 + qemp86)
(b) The data contains information on demographic characteristics. Add to the model the
variables black, hispan, born60, which are dummy variables for Black and Hispanic
ethnicity, and being born after 1960. Describe the results for each of these variables. Are
they individually significant?
(c) The R2 increases from the first to the second specification, which is not surprising. Does
this indicate that the demographic variables are jointly statistically significant? Perform
an F-test to investigate this. What are the restricted and unrestricted models in this case?
(d) Add avgsen to each specification from (a) and (b) (i.e., estimate two more regression
models with this variable added). If longer sentences have a deterrent effect, what should the
sign of the coefficient on avgsen be? How do you interpret this coefficient? Do you think
that there is a deterrent effect?
(e) Install the stargazer package by entering the following code at the top of your script.
Output the results of your four specifications with the following code:
stargazer(lm.1, lm.2, lm.3, lm.4, type = "text")
where lm.1 is the name of your first model, and so on.

1. R output

pcnv       -0.150***  -0.127***  -0.151***  -0.128***
ptime86    -0.034***  -0.040***  -0.037***  -0.041***
           (0.009)
qemp86     -0.104***  -0.095***  -0.103***  -0.095***
           (0.010)
black       0.342***
           (0.045)
hispan      0.203***
           (0.040)
born60     -0.038
           (0.033)
avgsen      0.007
Constant
2. Higher employment is associated with fewer arrests. A one-unit increase in qemp86, that
is, being employed for one more quarter in 1986, is associated with 0.104 fewer arrests,
holding other things constant. The coefficient is highly significant.
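As a quick check on the significance claim above, the t statistic can be formed from the coefficient and its standard error. This is only a sketch: it assumes the (0.010) printed beneath qemp86 in the R output is the standard error from the first specification, and it uses the large-sample normal critical value rather than the exact t distribution. The variable names are illustrative.

```r
# Coefficient on qemp86 in the first specification, as reported in the R output.
b_qemp86 <- -0.104
# Assumption: 0.010 is the matching standard error from that output.
se_qemp86 <- 0.010

# t statistic for H0: coefficient = 0.
t_stat <- b_qemp86 / se_qemp86

# Two-sided 5% critical value under the normal approximation.
crit_5pct <- qnorm(0.975)
reject_5pct <- abs(t_stat) > crit_5pct

print(t_stat)
print(reject_5pct)
```

A t statistic this far from zero is consistent with the three stars on the coefficient in the output.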
3. Being black or hispanic is associated with more arrests. In particular, being black is
associated with 0.34 more arrests, compared to the omitted group, holding other things
constant. Being hispanic is associated with 0.2 more arrests. Both are statistically
significant. Being born after 1960 is associated with 0.038 fewer arrests, but this is not
significantly different from zero.
4. You can use the R2 to answer this. Or you can use R with the linearHypothesis command
as follows. The p-value is very low, indicating that we can reject this hypothesis.
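The by-hand route mentioned here can be sketched with residual sums of squares, which is equivalent to the R2 form of the F statistic. The inputs are the unrestricted RSS (1878.6), its residual degrees of freedom (2718), and the sum-of-squares difference (48.676) as they appear in the linearHypothesis output for this test. The variable names are illustrative, not part of the template.

```r
# Values read from the linearHypothesis output for black = hispan = born60 = 0.
rss_unrestricted <- 1878.6   # RSS of the model with the demographic dummies
df_unrestricted  <- 2718     # residual degrees of freedom of that model
ss_difference    <- 48.676   # RSS(restricted) - RSS(unrestricted)
q                <- 3        # number of restrictions being tested

# F = [(RSS_r - RSS_ur) / q] / [RSS_ur / df_ur]
f_stat  <- (ss_difference / q) / (rss_unrestricted / df_unrestricted)
p_value <- pf(f_stat, q, df_unrestricted, lower.tail = FALSE)

print(round(f_stat, 3))
print(p_value)
```

Up to rounding of the inputs, the result should agree with the F statistic and p-value that linearHypothesis reports.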
black = 0
hispan = 0
born60 = 0

Model 1: restricted model
Model 2: narr86 ~ pcnv + ptime86 + qemp86 + black + hispan + born60

Res.Df    RSS  Df  Sum of Sq       F    Pr(>F)
  2718 1878.6   3     48.676  23.475  5.35e-15 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1

5. avgsen should have a negative coefficient if longer sentences have a deterrent effect. We
do not find that this is the case in either of the specifications.

(iii) The intercept is not very interesting as it gives the predicted nettfa for inc = 0 and
age = 0. Clearly, there is no one with even close to these values in the relevant population.
(iv) The t statistic is (.843 − 1)/.092 ≈ −1.71. Against the one-sided alternative H1: β2 < 1,
the p-value is about .044. Therefore, we can reject H0: β2 = 1 at the 5% significance level
(against the one-sided alternative).
(v) The slope coefficient on inc in the simple regression is about .821, which is not very
different from the .799 obtained in part (ii). As it turns out, the correlation between inc and age
in the sample of single people is only about .039, which helps explain why the simple and
multiple regression estimates are not very different; refer back to page 84 of the text.
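The one-sided test in part (iv) above can be reproduced roughly as follows. The text does not state the degrees of freedom, so this sketch uses the standard normal distribution as a large-sample approximation; that choice and the variable names are assumptions.

```r
# Estimate and standard error quoted in part (iv).
b_hat      <- 0.843
se_b       <- 0.092
null_value <- 1

# t statistic for H0: beta2 = 1.
t_stat <- (b_hat - null_value) / se_b

# One-sided p-value for H1: beta2 < 1 (normal approximation, an assumption).
p_one_sided <- pnorm(t_stat)

print(round(t_stat, 2))
print(round(p_one_sided, 3))
```

With a large sample the t and normal distributions are nearly identical, so the approximation should land close to the p-value quoted in the text.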
2. Computer Exercise C9 on page 166 from Wooldridge.

C4.9 (i) The results from the OLS regression, with standard errors in parentheses, are

log(psoda) = −1.46 + .073 prpblck + .137 log(income) + .380 prppov
n = 401, R2 = .087

The p-value for testing H0: β1 = 0 against the two-sided alternative is about .018, so that we
reject H0 at the 5% level but not at the 1% level.
(ii) The correlation is about −.84, indicating a strong degree of multicollinearity. Yet each
coefficient is very statistically significant: the t statistic for the coefficient on log(income)
is about 5.1 and that for the coefficient on prppov is about 2.86 (two-sided p-value = .004).

(iii) The OLS regression results when log(hseval) is added are

log(psoda) = −.84 + .098 prpblck − .053 log(income) + .052 prppov + .121 log(hseval)
n = 401, R2 = .184
The coefficient on log(hseval) is an elasticity: a one percent increase in housing value, holding
the other variables fixed, increases the predicted price by about .12 percent. The two-sided
p-value is zero to three decimal places.
(iv) Adding log(hseval) makes log(income) and prppov individually insignificant (at even the
15% significance level against a two-sided alternative for log(income), and prppov does not
have a t statistic even close to one in absolute value). Nevertheless, they are jointly significant
at the 5% level because the outcome of the F2,396 statistic is about 3.52 with p-value = .030.
All of the control variables, log(income), prppov, and log(hseval), are highly correlated, so it
is not surprising that some are individually insignificant.
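The joint-significance result in (iv) can be checked from the reported F statistic and its degrees of freedom alone. This is a minimal sketch: the inputs are the rounded values quoted above, so the p-value may differ slightly in the third decimal place, and the variable names are illustrative.

```r
# F statistic and degrees of freedom quoted in part (iv).
f_stat <- 3.52
df_num <- 2     # two restrictions: log(income) and prppov
df_den <- 396   # residual degrees of freedom of the unrestricted model

# Upper-tail p-value and the 5% critical value of F(2, 396).
p_value   <- pf(f_stat, df_num, df_den, lower.tail = FALSE)
crit_5pct <- qf(0.95, df_num, df_den)

print(round(p_value, 3))
print(f_stat > crit_5pct)
```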
(v) Because the regression in (iii) contains the most controls, log(hseval) is individually
significant, and log(income) and prppov are jointly significant, (iii) seems the most reliable. It
holds fixed three measures of income and affluence. Therefore, a reasonable estimate is that if the
proportion of blacks increases by .10, psoda is estimated to increase by 1%, other factors held
fixed.
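The "about 1%" figure follows from the log-level form of the regression in (iii), where prpblck has a coefficient of .098. The sketch below shows both the usual approximation (100 times the coefficient times the change) and the exact percentage change implied by the log specification. The variable names are illustrative.

```r
# Coefficient on prpblck from the regression in part (iii).
b_prpblck <- 0.098
# Change in the proportion of blacks being considered.
delta_prpblck <- 0.10

# Approximate percentage change in psoda: 100 * beta * delta.
approx_pct <- 100 * b_prpblck * delta_prpblck

# Exact percentage change implied by the log specification.
exact_pct <- 100 * (exp(b_prpblck * delta_prpblck) - 1)

print(approx_pct)
print(round(exact_pct, 3))
```

For a change this small the two figures are nearly identical, which is why rounding to 1% is reasonable.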
C4.10 (i) Using the 1,848 observations, the simple regression estimate of βbs is about −.795.
The 95% confidence interval runs from −1.088 to −.502, which includes −1. Therefore, at the
5% level, we cannot reject H0: βbs = −1 against the two-sided alternative.
(ii) When lenrol and lstaff are added to the regression, the coefficient on bs becomes about
−.605; it is now statistically different from −1, as the 95% CI is from about −.818 to −.392. The
situation is very similar to that in Table 4.1, where the simple regression estimate is −.825 and
the multiple regression estimate (with the logs of enrollment and staff included) is −.605. (It is a
coincidence that the two multiple regression estimates are the same, as the data set in Table 4.1 is
for an earlier year at the high school level.)
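The two confidence intervals in (i) and (ii) can be approximately rebuilt from the point estimates and the standard errors, which part (iii) gives as about .150 and .109. This sketch uses the normal critical value as a large-sample approximation, and the helper contains_value is a hypothetical name introduced for illustration.

```r
# Large-sample 95% critical value (an approximation to the t critical value).
crit <- qnorm(0.975)

# Point estimates from (i) and (ii); standard errors as reported in (iii).
ci_simple   <- -0.795 + c(-1, 1) * crit * 0.150
ci_multiple <- -0.605 + c(-1, 1) * crit * 0.109

# Hypothetical helper: does the interval contain the hypothesized value?
contains_value <- function(ci, value) ci[1] <= value && value <= ci[2]

print(round(ci_simple, 3))
print(round(ci_multiple, 3))
print(contains_value(ci_simple, -1))    # is -1 inside the simple-regression CI?
print(contains_value(ci_multiple, -1))  # is -1 inside the multiple-regression CI?
```

Small differences from the endpoints quoted in the text come from rounding of the estimates and standard errors.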
(iii) The standard error of the simple regression estimate is about .150, and that for the
multiple regression estimate is about .109. When we add extra explanatory variables, two factors
work in opposite directions on the standard errors. Multicollinearity (in this case, correlation
between bs and the two variables lenrol and lstaff) works to increase the multiple regression
standard error. Working to reduce the standard error of βbs is the smaller error variance when
lenrol and lstaff are included in the regression; in effect, they are taken out of the simple
regression error term. In this particular example, the multicollinearity is modest compared with
the reduction in the error variance. In fact, the standard error of the regression goes from .231 for
simple regression to .168 in the multiple regression. (Another way to summarize the drop in the
error variance is to note that the R-squared goes from a very small .0151 for the simple
regression to .4882 for multiple regression.) Of course, ahead of time we cannot know which
effect will dominate, but we can certainly compare the standard errors after running both
regressions.
(iv) The variable lstaff is the log of the number of staff per 1,000 students. As lstaff
increases, there are more teachers per student. We can associate this with smaller class sizes,
which are generally desirable from a teacher’s perspective. It appears that, all else equal, teachers
are willing to take less in salary to have smaller class sizes. The elasticity of salary with respect
to staff is about −.714, which seems quite large: a ten percent increase in staff size (holding
enrollment fixed) is associated with a 7.14 percent lower salary.
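The 7.14 percent figure in (iv) is the usual elasticity approximation (elasticity times the percentage change). The sketch below compares it with the exact change implied by a log-log relationship for a ten percent increase in staff. The variable names are illustrative.

```r
# Elasticity of salary with respect to staff, from part (iv).
elasticity <- -0.714
# Percentage increase in staff size being considered.
pct_change_staff <- 10

# Approximation: percentage change in salary = elasticity * percentage change.
approx_pct <- elasticity * pct_change_staff

# Exact change implied by the log-log form: (1 + 0.10)^elasticity - 1.
exact_pct <- 100 * ((1 + pct_change_staff / 100)^elasticity - 1)

print(approx_pct)
print(round(exact_pct, 2))
```

The approximation overstates the exact effect somewhat for a change as large as ten percent, but the qualitative conclusion is unchanged.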
(v) When lunch is added to the regression, its coefficient is about −.00076, with t = −4.69.
Therefore, other factors fixed (bs, lenrol, and lstaff), a higher poverty rate is associated with lower
teacher salaries. In this data set, the average value of lunch is about 36.3 with standard deviation
of 25.4. Therefore, a one standard deviation increase in lunch is associated with a change in