After your report from Problem Set 2, the Minister of Agriculture from Nicaragua was convinced that the data you previously had access to were insufficient to generate an unbiased estimate of the causal impact of the Rural Business Development (RBD) Program. Recall that the RBD provided training and credit to farmers in order to raise farm productivity and household income. The government has extended your contract for one more week in order to generate a new and improved estimate of the ATT before deciding whether to scale up the RBD.

To assist you, the Minister sent out a team to re-survey the same 1,684 households for which you had data on income and program participation in Problem Set 2. The Minister instructed the team to ask each household about the 2014 values of 5 key characteristics that the Minister felt (based on your previous report!) might have influenced the household's decision to participate in the program. Recall that the program was implemented in 2015, so the values of these 5 additional variables are pre-program values. This data set is available on the course website under the name "ps3_Nicaragua.dta". The first two variables, *treat* and *income*, are the same variables you used in Problem Set 2. The full data set contains these two variables plus 5 additional pre-program variables as follows:

| **Variable Name** | **Description** |
| --- | --- |
| *treat* | 1 = HH participated in RBD program in 2015; 0 = HH did NOT participate in RBD in 2015 |
| *income* | 2015 per capita income from main program activity (US$) |
| *job* | 1 = HH main activity was cattle in 2014; 2 = HH main activity was grain in 2014; 3 = HH main activity was yuca (cassava root) in 2014 |
| *age* | Age of head of household in 2014 |
| *education* | Number of years of education obtained by head of household in 2014 |
| *capital* | Total value of mobile capital (tools, tractors, equipment, etc.) used on the farm in 2014 |
| *land* | Farm size (in manzanas; 1 manzana = 1.7 acres) in 2014 |

Please format all answers nicely in a word processing program. Attach your do-files (including the names of all who participated in writing the do-file) to the end of your document.

1. In the last problem set, we tried to estimate the ATT of the RBD program using a simple bivariate regression. However, we know that omitting variables can bias our estimates. Let's start by exploring this potential bias in more detail. Specifically, let's look at the implication of omitting the household's farm size (the variable "land" in your data set).

a. Why might we be concerned about omitting *land*? In your answer, please refer to the two conditions we discussed in lecture that need to hold in order for an omitted variable to be a problem (i.e., lead to Omitted Variable Bias).

b. Let's begin by reminding ourselves of the results from the bivariate regression:

$$\mathit{income}_i = \alpha_0 + \alpha_1 \cdot \mathit{treat}_i + \varepsilon_i$$

What is $\hat{\alpha}_1$, your regression coefficient from this bivariate regression? How do we interpret this coefficient?

c. Now estimate the "long" regression with both *treat* and the households' pre-program farm size (*land*) on the right-hand side:

$$\mathit{income}_i = \beta_0 + \beta_1 \cdot \mathit{treat}_i + \beta_2 \cdot \mathit{land}_i + u_i$$

How do you interpret $\hat{\beta}_1$ and $\hat{\beta}_2$? Discuss both economic and statistical significance.

d. How strongly is baseline farm size related to whether farmers participate in the program? To answer this question, estimate the following regression:

$$\mathit{land}_i = \delta_0 + \delta_1 \cdot \mathit{treat}_i + v_i$$

How do you interpret $\hat{\delta}_1$?

e. Use your results from parts (c) and (d) along with the OVB formula from class to calculate the magnitude of the omitted variable bias.

f. Interpret (in a short paragraph) the OVB. Should we be concerned if we fail to control for farm size? Why or why not?
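The logic of parts (b) through (e) can be checked numerically before touching the real data. The sketch below is written in Python rather than Stata so that it runs without the data set. It simulates a small sample and confirms that the short-regression coefficient on *treat* equals the long-regression coefficient on *treat* plus the long-regression coefficient on *land* times the slope from regressing *land* on *treat*, i.e. $\hat{\alpha}_1 = \hat{\beta}_1 + \hat{\beta}_2 \hat{\delta}_1$. Everything here is an assumption made for illustration. The data are simulated and are not the contents of "ps3_Nicaragua.dta". The helper names `ols_simple` and `ols_two` are hypothetical. The labels `alpha1`, `beta1`, `beta2`, and `delta1` are just names for the four estimates. The identity is written in a standard textbook form that may differ in notation from the formula given in class.

```python
import random

def ols_simple(x, y):
    """Return (intercept, slope) from regressing y on one regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def ols_two(x1, x2, y):
    """Return the two slopes from regressing y on x1, x2, and a constant."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Simulated (made-up) world: larger farms are more likely to participate,
# and farm size also raises income directly.
random.seed(0)
n = 1000
land = [random.uniform(1, 50) for _ in range(n)]
treat = [1 if random.random() < 0.2 + 0.012 * a else 0 for a in land]
income = [500 + 100 * t + 8 * a + random.gauss(0, 50) for t, a in zip(treat, land)]

_, alpha1 = ols_simple(treat, income)        # short regression: income on treat
beta1, beta2 = ols_two(treat, land, income)  # long regression: income on treat, land
_, delta1 = ols_simple(treat, land)          # auxiliary regression: land on treat

bias = beta2 * delta1                        # omitted variable bias term
print(f"short: {alpha1:.3f}  long + bias: {beta1 + bias:.3f}  bias: {bias:.3f}")
```

In a sample, this identity holds exactly (up to floating-point rounding), which makes it a useful sanity check on the three regressions you run in Stata.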
2. Farm size (*land*) is not the only potential omitted variable. Discuss, using economic theory, whether or not you would be concerned if we fail to control for each of the other 4 baseline variables included in the data set. In your answer, make sure you refer to the two conditions required for an omitted variable to cause bias in our estimate of program impact.

3. A common way to examine whether the treatment group differs systematically from the control group is to construct a balance table.

a. Calculate the mean of each variable for both treatment and control groups. (Note that *job* is a categorical variable with 3 possible values, so calculate the proportion of farmers with each value, i.e., in each of the three primary activities, for both groups.)

b. Conduct a t-test for the difference in means between the two groups for each variable. Are any of the differences statistically significant?

c. Should any of these variables be included in a regression? Discuss why or why not, with reference to your findings in the table.

d. Looking back to question 2, did you find what you expected for the systematic differences across groups?
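The arithmetic behind one row of a balance table can be sketched as follows, again in Python on simulated data so that it runs without the data set. The function name `diff_in_means` is hypothetical, and the numbers are invented rather than drawn from the survey. The sketch uses the unequal-variance (Welch) form of the t statistic. Note that Stata's `ttest` with `by()` assumes equal variances unless the `unequal` option is given, so the statistic it reports may differ slightly. For a categorical variable such as *job*, the same function applies to a 0/1 indicator for each category, whose mean is the group proportion.

```python
import math
import random
import statistics

def diff_in_means(treated, control):
    """Return (difference in means, t statistic), allowing unequal variances."""
    n1, n0 = len(treated), len(control)
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treated) / n1
                   + statistics.variance(control) / n0)
    return diff, diff / se

# Simulated (made-up) covariates: farm size differs across groups,
# while the share of cattle farmers is drawn from the same distribution.
random.seed(1)
treated_land = [random.gauss(30, 10) for _ in range(400)]
control_land = [random.gauss(20, 10) for _ in range(600)]
treated_cattle = [1 if random.random() < 0.5 else 0 for _ in range(400)]
control_cattle = [1 if random.random() < 0.5 else 0 for _ in range(600)]

land_diff, land_t = diff_in_means(treated_land, control_land)
cattle_diff, cattle_t = diff_in_means(treated_cattle, control_cattle)
print(f"land:   diff = {land_diff:.2f}, t = {land_t:.2f}")
print(f"cattle: diff = {cattle_diff:.3f}, t = {cattle_t:.2f}")
```

With samples this large, an absolute t statistic above roughly 1.96 corresponds to significance at the 5% level.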

4. Let us now examine how controlling for these covariates affects our estimates of the treatment effect. In a single table (using the "outreg" commands you learned in section), report the parameter estimates from two models: a bivariate regression including only the treatment variable on the RHS and a multivariate regression including all the baseline controls in addition to the treatment variable.

a. Assuming that we have controlled for all the relevant variables, what is your estimate of the ATT using multivariate regression? Is it economically significant? Is it statistically significant?

b. Compare your estimates of the ATT from the two regressions. How important was controlling for these variables?
5. Rather than controlling for covariates in multivariate regression, we can also match observations across the treatment and control groups based on their covariates.

a. Using a probit model, calculate the probability that each household participates in the RBD program. Which variables seem important in predicting participation? Do the signs of the coefficients make sense? (Make sure you save the predicted propensity scores from this estimation!)

b. On a single graph, plot the kernel densities of the propensity scores for both the treatment and control groups. Do we have a common support? Do you have any concerns?

c. Using Stata's command *teffects psmatch*, estimate the ATT using propensity score matching (be sure to specify a probit model and the option *atet* to get ATT estimates). Compare your estimate of the ATT to that obtained with multiple regression.

6. Interpreting the estimates from both multiple regression and propensity score matching as the causal impact of the RBD program requires the same key assumption.

a. What is the name of this assumption? What does it mean, "in English"?

b. How confident are you that this assumption holds with this augmented data set provided to you by the Minister of Agriculture? How confident are you that this new data set has allowed you to generate an unbiased estimate of the ATT?
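To build intuition for the matching step in question 5, the sketch below shows what an ATT from matching on a score looks like mechanically. It is a Python illustration on simulated data, not a substitute for *teffects psmatch*. The function name `att_nearest_neighbor` is hypothetical. The scores are the true participation probabilities of the simulation rather than probit predictions. The one-nearest-neighbor rule with replacement is one common choice, and it may differ from Stata's implementation in details such as tie handling and standard errors.

```python
import random

def att_nearest_neighbor(scores, treat, outcome):
    """ATT from one-nearest-neighbor matching on the score, with replacement."""
    controls = [(s, y) for s, t, y in zip(scores, treat, outcome) if t == 0]
    gaps = []
    for s, t, y in zip(scores, treat, outcome):
        if t == 1:
            # Closest control unit in terms of the score.
            _, y_match = min(controls, key=lambda c: abs(c[0] - s))
            gaps.append(y - y_match)
    return sum(gaps) / len(gaps)

# Simulated (made-up) world: the score drives both participation and the
# outcome, and the true effect of treatment is 50 for every unit.
random.seed(2)
n = 2000
scores = [random.uniform(0.1, 0.9) for _ in range(n)]
treat = [1 if random.random() < p else 0 for p in scores]
outcome = [200 * p + 50 * t + random.gauss(0, 10) for p, t in zip(scores, treat)]

att = att_nearest_neighbor(scores, treat, outcome)
treated_y = [y for y, t in zip(outcome, treat) if t == 1]
control_y = [y for y, t in zip(outcome, treat) if t == 0]
naive = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)
print(f"naive difference: {naive:.1f}, matched ATT: {att:.1f}")
```

In this simulation the naive difference in means overstates the effect because high-score units are both more likely to participate and better off regardless. Matching removes that gap only because the score captures everything that drives selection here, which is the situation question 6 asks you to assess for the real data.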
