Professor Tanya Molodtsova
Office: Rich 319
E-mail:
ECON 420 Econometrics
Fall 2011

Problem Set 1 - SOLUTION

1. Answer:
a) cross-sectional data: data on 10 countries’ annual unemployment rates in 2007 only
b) panel data: data on 10 countries’ annual unemployment rates from 1970 to 2007
c) time series data: data on one country’s unemployment rate and GDP from 1970 to 2007

2. Answer:
a) E(X) = 0*0.4 + 1*0.2 + 2*0.2 + 3*0.1 + 4*0.1 = 1.3
b) var(X) = (0-1.3)^2*0.4 + (1-1.3)^2*0.2 + (2-1.3)^2*0.2 + (3-1.3)^2*0.1 + (4-1.3)^2*0.1 = 1.81
c) sd(X) = √var(X) = √1.81 = 1.345

3. Answer: A correlation coefficient of 1 indicates that the two variables are perfectly positively correlated.

4. Question C1.2 in Wooldridge
(i) There are 1,388 observations in the sample. Sorting by the variable cigs in Excel shows that 212 women have cigs > 0.
(ii) The average of cigs is about 2.09, but this includes the 1,176 women who did not smoke. Reporting just the average masks the fact that almost 85 percent of the women did not smoke. It makes more sense to say that the “typical” woman does not smoke during pregnancy; indeed, the median number of cigarettes smoked is zero.
(iii) The average of cigs over the women with cigs > 0 is about 13.7. Of course, this is much higher than the average over the entire sample because we are excluding 1,176 zeros.
(iv) The average of fatheduc is about 13.2. There are 196 observations with a missing value for fatheduc, and those observations are necessarily excluded in computing the average.
(v) The average and standard deviation of faminc are about 29.027 and 18.739, respectively, but faminc is measured in thousands of dollars. So, in dollars, the average and standard deviation are $29,027 and $18,739....
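
The figures reported in parts (i) through (iii) of question 4 can be cross-checked against each other with simple arithmetic. The sketch below uses only the numbers quoted in those answers (1,388 observations, 212 smokers, and a smokers-only average of about 13.7). It does not load the Wooldridge data set, and the variable names are illustrative rather than taken from the original solution.

```python
# Figures quoted in the answers to question 4 (not recomputed from the data).
n_total = 1388             # observations in the sample
n_smokers = 212            # women with cigs > 0
mean_cigs_smokers = 13.7   # average of cigs among smokers (rounded)

# Women who did not smoke, and their share of the sample.
n_nonsmokers = n_total - n_smokers
share_nonsmokers = n_nonsmokers / n_total

# Every non-smoker contributes a zero, so the full-sample average is the
# smokers' average scaled by the fraction of smokers in the sample.
mean_cigs_all = mean_cigs_smokers * n_smokers / n_total

print(n_nonsmokers)                  # count of zeros in cigs
print(round(share_nonsmokers, 3))    # share of women who did not smoke
print(round(mean_cigs_all, 2))       # implied full-sample average of cigs
```

The implied count of non-smokers (1,176), their share (just under 85 percent), and the implied full-sample average (about 2.09) all agree with the values stated in parts (ii) and (iii).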

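The calculations in question 2 follow directly from the definitions of the mean and variance of a discrete random variable. As a minimal sketch, assuming the distribution implied by the arithmetic in that answer (values 0 through 4 with probabilities 0.4, 0.2, 0.2, 0.1, 0.1), the same results can be reproduced as follows. The names `pmf`, `mean`, `variance`, and `std_dev` are illustrative.

```python
import math

# Distribution used in question 2: value of X -> probability.
pmf = {0: 0.4, 1: 0.2, 2: 0.2, 3: 0.1, 4: 0.1}

# Sanity check: the probabilities should sum to one.
assert math.isclose(sum(pmf.values()), 1.0)

# E(X) is the probability-weighted sum of the values.
mean = sum(x * p for x, p in pmf.items())

# var(X) is the probability-weighted sum of squared deviations from E(X).
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

# sd(X) is the square root of the variance.
std_dev = math.sqrt(variance)

print(round(mean, 2), round(variance, 2), round(std_dev, 3))
```

Rounded as in the answer, this gives 1.3 for the mean, 1.81 for the variance, and 1.345 for the standard deviation.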