Assignment 3 solutions - Econ 3210 Assignment 3 Fall 2015


Econ 3210: Assignment 3
Fall 2015

Due: Monday, November 30th. Late assignments will not be accepted, as I will post the answers once the due date expires.

You may work and turn in assignments in groups. Names of group members and student IDs must be on the submitted work (I will not allow you to add names after the assignment is submitted). Groups can be no larger than four (4).

If you hand in your assignment electronically (which I recommend), please note the following:

1. I will accept ONE (1) .pdf file for the entire assignment.
2. Please don’t name your assignment ‘Assignment 3.pdf’. Instead, name it ‘Assignment 3 LastName.pdf’, or use your first and last name if your last name is common.
3. Use ‘Econ 3210’ in the subject line.

1. Download and work through the R template for Assignment 3. Once this is done, do the following: based on the template that I’ve provided, change the code to download the dataset crime1.dta. This dataset is introduced on page 83 of the text in Example 3.5.

(a) Using R, estimate the first specification (without avgsen) in Example 3.5 and interpret your regression results. How does employment in 1986 affect arrests? Is it statistically significant at the 5% level on a two-sided test?

    lm.1 <- lm(narr86 ~ pcnv + ptime86 + qemp86)

(b) The data contains information on demographic characteristics. Add to the model the variables black, hispan, and born60, which are dummy variables for Black and Hispanic ethnicity, and for being born after 1960. Describe the results for each of these variables. Are they individually significant?

(c) The R2 increases from the first to the second specification, which is not surprising. Does this indicate that the demographic variables are jointly statistically significant? Perform an F-test to investigate this. What are the restricted and unrestricted models in this case?

(d) Add avgsen to each specification from (a) and (b) (i.e., estimate two more regression models with this variable added).
If longer sentences have a deterrent effect, what should the sign of the coefficient on avgsen be? How do you interpret this coefficient? Do you think that there is a deterrent effect?

(e) Install the stargazer package by inputting the following code at the top of your script.

    install.packages("stargazer")
    require("stargazer")

Output the results of your four specifications with the following code:

    stargazer(lm.1, lm.2, lm.3, lm.4, type = "text")

where lm.1 is the name of your first model, and so on.

Solutions:

1. R output

=====================================================
                      Dependent variable:
             ----------------------------------------
                            narr86
                (1)       (2)       (3)       (4)
-----------------------------------------------------
pcnv         -0.150*** -0.127*** -0.151*** -0.128***
              (0.041)   (0.041)   (0.041)   (0.041)
ptime86      -0.034*** -0.040*** -0.037*** -0.041***
              (0.009)   (0.009)   (0.009)   (0.009)
qemp86       -0.104*** -0.095*** -0.103*** -0.095***
              (0.010)   (0.010)   (0.010)   (0.010)
black                   0.342***            0.338***
                        (0.045)             (0.045)
hispan                  0.203***            0.203***
                        (0.040)             (0.040)
born60                 -0.038              -0.039
                        (0.033)             (0.033)
avgsen                            0.007     0.004
                                  (0.005)   (0.005)
Constant      0.712***  0.599***  0.707***  0.598***
              (0.033)   (0.038)   (0.033)   (0.038)
-----------------------------------------------------
Observations   2,725     2,725     2,725     2,725
R2             0.041     0.066     0.042     0.066
=====================================================

2. Higher employment is associated with fewer arrests. A one-unit increase in qemp86, that is, being employed for one more quarter in 1986, is associated with 0.104 fewer arrests, holding other things constant. The coefficient is highly significant (t ≈ −0.104/0.010 ≈ −10.4), so it is significant at the 5% level on a two-sided test.

3. Being Black or Hispanic is associated with more arrests. In particular, being Black is associated with 0.34 more arrests, compared to the omitted group, holding other things constant. Being Hispanic is associated with 0.2 more arrests. Both are statistically significant.
Being born after 1960 is associated with 0.038 fewer arrests, but this is not significantly different from zero.

4. You can use the R2 form of the F statistic to answer this. Or you can use R with the linearHypothesis command as follows. The p-value is very low, indicating that we can reject the null hypothesis that the three coefficients are jointly zero.

    Hypothesis:
    black = 0
    hispan = 0
    born60 = 0

    Model 1: restricted model
    Model 2: narr86 ~ pcnv + ptime86 + qemp86 + black + hispan + born60

      Res.Df    RSS Df Sum of Sq      F    Pr(>F)
    1   2721 1927.3
    2   2718 1878.6  3    48.676 23.475 5.35e-15 ***
    ---
    Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

5. avgsen should have a negative coefficient if longer sentences have a deterrent effect. We do not find that this is the case in either of the specifications.

2. Computer Exercise C9 on page 166 from Wooldridge.

(iii) The intercept is not very interesting as it gives the predicted nettfa for inc = 0 and age = 0. Clearly, there is no one with even close to these values in the relevant population.

(iv) The t statistic is (.843 − 1)/.092 ≈ −1.71. Against the one-sided alternative H1: β2 < 1, the p-value is about .044. Therefore, we can reject H0: β2 = 1 at the 5% significance level (against the one-sided alternative).

(v) The slope coefficient on inc in the simple regression is about .821, which is not very different from the .799 obtained in part (ii). As it turns out, the correlation between inc and age in the sample of single people is only about .039, which helps explain why the simple and the multiple regression estimates are not very different; refer back to page 84 of the text.

C4.9 (i) The results from the OLS regression, with standard errors in parentheses, are

    log(psoda) = −1.46 + .073 prpblck + .137 log(income) + .380 prppov
                 (0.29)  (.031)         (.027)             (.133)

    n = 401, R2 = .087

The p-value for testing H0: β1 = 0 against the two-sided alternative is about .018, so that we reject H0 at the 5% level but not at the 1% level.

(ii) The correlation between log(income) and prppov is about −.84, indicating a strong degree of multicollinearity. Yet each coefficient is very statistically significant: the t statistic for the coefficient on log(income) is about 5.1 and that for the coefficient on prppov is about 2.86 (two-sided p-value = .004).

(iii) The OLS regression results when log(hseval) is added are

    log(psoda) = −.84 + .098 prpblck − .053 log(income) + .052 prppov + .121 log(hseval)
                 (.29)  (.029)         (.038)             (.134)        (.018)

    n = 401, R2 = .184

The coefficient on log(hseval) is an elasticity: a one percent increase in housing value, holding the other variables fixed, increases the predicted price by about .12 percent. The two-sided p-value is zero to three decimal places.

(iv) Adding log(hseval) makes log(income) and prppov individually insignificant (at even the 15% significance level against a two-sided alternative for log(income), and prppov does not have a t statistic even close to one in absolute value). Nevertheless, they are jointly significant at the 5% level because the outcome of the F(2, 396) statistic is about 3.52 with p-value = .030. All of the control variables – log(income), prppov, and log(hseval) – are highly correlated, so it is not surprising that some are individually insignificant.

(v) Because the regression in (iii) contains the most controls, log(hseval) is individually significant, and log(income) and prppov are jointly significant, (iii) seems the most reliable.
It holds fixed three measures of income and affluence. Therefore, a reasonable estimate is that if the proportion of blacks increases by .10, psoda is estimated to increase by 1%, other factors held fixed.

C4.10 (i) Using the 1,848 observations, the simple regression estimate of βbs is about −.795. The 95% confidence interval runs from −1.088 to −.502, which includes −1. Therefore, at the 5% level, we cannot reject H0: βbs = −1 against the two-sided alternative.

(ii) When lenrol and lstaff are added to the regression, the coefficient on bs becomes about −.605; it is now statistically different from −1, as the 95% CI is from about −.818 to −.392 and so excludes −1. The situation is very similar to that in Table 4.1, where the simple regression estimate is −.825 and the multiple regression estimate (with the logs of enrollment and staff included) is −.605. (It is a coincidence that the two multiple regression estimates are the same, as the data set in Table 4.1 is for an earlier year at the high school level.)

(iii) The standard error of the simple regression estimate is about .150, and that for the multiple regression estimate is about .109. When we add extra explanatory variables, two factors work in opposite directions on the standard errors. Multicollinearity – in this case, correlation between bs and the two variables lenrol and lstaff – works to increase the multiple regression standard error. Working to reduce the standard error of the estimate of βbs is the smaller error variance when lenrol and lstaff are included in the regression; in effect, they are taken out of the simple regression error term. In this particular example, the multicollinearity is modest compared with the reduction in the error variance. In fact, the standard error of the regression goes from .231 for simple regression to .168 in the multiple regression.
(Another way to summarize the drop in the error variance is to note that the R-squared goes from a very small .0151 for the simple regression to .4882 for multiple regression.) Of course, ahead of time we cannot know which effect will dominate, but we can certainly compare the standard errors after running both regressions.

(iv) The variable lstaff is the log of the number of staff per 1,000 students. As lstaff increases, there are more teachers per student. We can associate this with smaller class sizes, which are generally desirable from a teacher’s perspective. It appears that, all else equal, teachers are willing to take less in salary to have smaller class sizes. The elasticity of salary with respect to staff is about −.714, which seems quite large: a ten percent increase in staff size (holding enrollment fixed) is associated with a 7.14 percent lower salary.

(v) When lunch is added to the regression, its coefficient is about −.00076, with t = −4.69. Therefore, other factors fixed (bs, lenrol, and lstaff), a higher poverty rate is associated with lower teacher salaries. In this data set, the average value of lunch is about 36.3 with standard deviation of 25.4. Therefore, a one standard deviation increase in lunch is associated with a change in ...
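
Returning to answer 4 of question 1: the F statistic printed by linearHypothesis can be reproduced by hand from the two residual sums of squares in that output. The following is a minimal sketch in R, assuming only the values printed there; the variable names (rss_r, rss_ur, q, df_ur) are illustrative and are not part of the assignment template.

```r
# Values copied from the linearHypothesis output for the joint test that
# the coefficients on black, hispan, and born60 are all zero.
rss_r  <- 1927.3  # RSS of the restricted model (demographic dummies excluded)
rss_ur <- 1878.6  # RSS of the unrestricted model (demographic dummies included)
q      <- 3       # number of restrictions being tested
df_ur  <- 2718    # residual degrees of freedom in the unrestricted model

# F = [(RSS_r - RSS_ur) / q] / [RSS_ur / df_ur]
f_stat <- ((rss_r - rss_ur) / q) / (rss_ur / df_ur)

# Upper-tail p-value from the F(q, df_ur) distribution
p_val <- pf(f_stat, df1 = q, df2 = df_ur, lower.tail = FALSE)

print(f_stat)
print(p_val)
```

Because the RSS values are rounded to one decimal place in the printed output, f_stat should come out close to, but not exactly equal to, the reported 23.475, and the p-value should be similarly tiny.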