PADP 8120 Fertig
Final Practice Problems Solution
Spring 2011, UGA

1. Read the attached excerpt from a paper of mine. Provide a sentence or two interpreting every coefficient in column 6 of Table II. You may disregard the joint tests at the bottom of the table. You do not need to interpret the magnitudes; focus on sign and significance.

The coefficient on Smoked is -0.053. This indicates that smoking during pregnancy significantly reduces birth weight for births in the omitted year (those not in 1970 or 2000), holding the controls constant.

The coefficient on Smoked*1970 is 0.007 but is not significant. This indicates that the effect of smoking is no different for those born in 1970 than for those born in the omitted year, holding the controls constant.

The coefficient on Smoked*2000 is -0.019. This indicates that the effect of smoking on birth weight is worse (more negative) for those born in 2000 than for those born in the omitted year, holding the controls constant.

The coefficient on 1970 is -0.008. This indicates that birth weights were significantly lower in 1970 than in the omitted year, holding the controls constant.

The coefficient on 2000 is 0.014. This indicates that birth weights were significantly higher in 2000 than in the omitted year, holding the controls constant.

The coefficient on Male is 0.037. This indicates that birth weights are significantly higher for boys than for girls, holding the controls constant.

The coefficient on Mother married at birth is 0.008. This indicates that birth weights are significantly higher for married mothers than for unmarried mothers, holding the controls constant.

Finally, the constant term is 8.044, which is the average log birth weight for female births in the omitted year to unmarried mothers who did not smoke during pregnancy (and who are in the reference category of each of the other controls). A log birth weight of 8.044 is about 3115 grams (e^8.044 = 3115).

2.
Using bivariate OLS, estimate the effect of family income on the amount of debt families have. Your sample should include only those families who have any debt.

. reg debtval fincome

      Source |       SS       df       MS              Number of obs =    3451
-------------+------------------------------           F(  1,  3449) =   57.78
       Model |  1.7972e+10     1  1.7972e+10           Prob > F      =  0.0000
    Residual |  1.0727e+12  3449   311029411           R-squared     =  0.0165
-------------+------------------------------           Adj R-squared =  0.0162
       Total |  1.0907e+12  3450   316148563           Root MSE      =   17636

------------------------------------------------------------------------------
     debtval |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     fincome |   .0587223   .0077251     7.60   0.000      .043576    .0738685
       _cons |   10855.63    569.548    19.06   0.000     9738.947    11972.32
------------------------------------------------------------------------------

a. Interpret the coefficient on family income in terms of sign, significance, and magnitude.

The coefficient on family income is 0.0587223, which implies that a higher family income is significantly associated with a higher amount of debt, among families with any debt. If family income is $10,000 higher, debt is $587.22 higher on average.

b. Display a graph of the data points (y-axis is debt value and x-axis is family income) and the linear relationship.

[Figure: scatter plot of debt value, VDEBT07 (2007$), on the y-axis against TOTAL FAMILY INCOME2006 on the x-axis, with the fitted values line.]

3. Using OLS, estimate the effect of family income on the amount of savings with and without a control for retired head. Your sample should include only those families who have any savings.

. reg savingsval fincome

      Source |       SS       df       MS              Number of obs =    4685
-------------+------------------------------           F(  1,  4683) =  104.15
       Model |  7.6480e+10     1  7.6480e+10           Prob > F      =  0.0000
    Residual |  3.4388e+12  4683   734322500           R-squared     =  0.0218
-------------+------------------------------           Adj R-squared =  0.0215
       Total |  3.5153e+12  4684   750493573           Root MSE      =   27098

------------------------------------------------------------------------------
  savingsval |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     fincome |   .1021531   .0100097    10.21   0.000     .0825293    .1217768
       _cons |   6274.907   747.4363     8.40   0.000      4809.58    7740.233
------------------------------------------------------------------------------
. reg savingsval fincome retiredh

      Source |       SS       df       MS              Number of obs =    4681
-------------+------------------------------           F(  2,  4678) =  196.08
       Model |  2.7183e+11     2  1.3592e+11           Prob > F      =  0.0000
    Residual |  3.2426e+12  4678   693161799           R-squared     =  0.0773
-------------+------------------------------           Adj R-squared =  0.0770
       Total |  3.5144e+12  4680   750948986           Root MSE      =   26328

------------------------------------------------------------------------------
  savingsval |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     fincome |   .1429776   .0100285    14.26   0.000     .1233171    .1626382
    retiredh |   19868.99   1183.233    16.79   0.000     17549.29    22188.68
       _cons |   1123.738   788.8345     1.42   0.154    -422.7497    2670.225
------------------------------------------------------------------------------

a. Interpret the coefficient on family income from both models in terms of sign, significance, and magnitude.

In both models, a higher family income is significantly associated with a higher savings amount among savers. When retirement status is not accounted for, if family income is $10,000 higher, savings is $1,022 higher on average. When retirement status is included as a control, if family income is $10,000 higher, savings is $1,430 higher on average.

b. Discuss how the coefficient changes with the addition of a control and offer an explanation.

The effect of family income on savings is higher when retirement status is included as a control. The coefficient on retired is significant and positive, indicating that retirees have almost $20,000 more in savings on average than working families with the same amount of family income. Retirees tend to have low incomes because they are not working, but high savings from a lifetime of saving for retirement. When the two groups are combined, the effect of income on savings is therefore flattened out. When the two groups are allowed to have separate intercept terms, we see that the effect of income on savings is much higher.

c. Display a graph comparing the relationship between savings amount and family income by retirement status. Clearly mark on the graph which line goes with the retirees and which goes with the not retired.
[Figure: fitted values of savings amount against TOTAL FAMILY INCOME2006, with separate lines labeled Retirees and Working families.]

4. Examine the relationship between family income and age of the head graphically. Does the relationship appear to be linear? If yes, specify the regression equation and estimate the parameters. If no, determine how best to model the nonlinearity and demonstrate that this model has a better fit than one that assumes a linear relationship.

[Figure: lowess fincome ageh, with AGE OF HEAD on the x-axis.]

The relationship is clearly nonlinear. The options are to use a polynomial of age or a log transformation of age. Here I show both options, and it is very clear from the R-squareds (0.1279 versus 0.0078) that the polynomial specification is a much better fit.

Polynomial specification (TOTAL FAMILY INCOME2006 on age and age squared):

      Source |       SS       df       MS              Number of obs =    5947
-------------+------------------------------           F(  2,  5944) =  435.93
       Model |  1.1594e+12     2  5.7972e+11           Prob > F      =  0.0000
    Residual |  7.9047e+12  5944  1.3299e+09           R-squared     =  0.1279
-------------+------------------------------           Adj R-squared =  0.1276
       Total |  9.0641e+12  5946  1.5244e+09           Root MSE      =   36467

------------------------------------------------------------------------------
     fincome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        ageh |   4435.016   151.8944    29.20   0.000     4137.248    4732.784
       agesq |  -44.15439   1.496738   -29.50   0.000    -47.08854   -41.22023
       _cons |  -40964.13   3540.738   -11.57   0.000    -47905.26      -34023
------------------------------------------------------------------------------

Log specification (TOTAL FAMILY INCOME2006 on log age):

      Source |       SS       df       MS              Number of obs =    5947
-------------+------------------------------           F(  1,  5945) =   46.85
       Model |  7.0875e+10     1  7.0875e+10           Prob > F      =  0.0000
    Residual |  8.9933e+12  5945  1.5127e+09           R-squared     =  0.0078
-------------+------------------------------           Adj R-squared =  0.0077
       Total |  9.0641e+12  5946  1.5244e+09           Root MSE      =   38894

------------------------------------------------------------------------------
     fincome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lageh |     9396.3   1372.755     6.84   0.000     6705.201     12087.4
       _cons |   22073.02   5131.959     4.30   0.000     12012.52    32133.52
------------------------------------------------------------------------------

5. Make a table comparing the means of retirees to workers including the following variables: head's age, wife's age, family income, savings amount, debt amount, probability of owning any stocks, probability of owning a home, head's education, and probability of being in fair or poor health.
Indicate which means are significantly different across groups at the .1% level, 1% level, 5% level and 10% level.

                                  Retirees (n=751)   Workers (n=5194)
Age of head                              72.45            40.05***
Age of wife                              65.16            39.60***
Family Income                         35278.66         60199.87***
Savings Amount                        26487.86         10705.77***
Debt Amount                            8073.70         15006.31***
Any stock holdings                      21.57%           13.95%***
Own home                                77.84%           58.93%***
Years of education of head               11.85            13.18***
Head's health is fair or poor           39.33%            9.09%***

+ significant at the 10% level, * 5% level, ** 1% level, *** 0.1% level.

6. Say you ran a logistic regression that reported an odds ratio of 4, where a mental health diagnosis is the dependent variable and having any debt is the independent variable. Interpret the odds ratio in words.

The odds of having a mental health diagnosis (versus not having a diagnosis) are 4 times as high for people with any debt as for people without debt.

clear all
cap log close
set mem 800M
set maxvar 30000
set more 1
cd "\Users\afertig\Documents\Teaching\PADP8120\finalexam\"
log using finalpracticesoln.log, replace
use finalexam.dta

***practice problems ********************

***problem 2
reg debtval fincome
twoway scatter debtval fincome || lfit debtval fincome

***problem 3
reg savingsval fincome
reg savingsval fincome retiredh
twoway lfit savingsval fincome if retiredh==0 || lfit savingsval fincome if retiredh==1

***problem 4
twoway scatter fincome ageh || lowess fincome ageh
gen lageh=log(ageh)
gen agesq=ageh*ageh
reg fincome ageh
reg fincome ageh agesq
reg fincome lageh

***problem 5
gen fpoor=(hstath==4 | hstath==5)
replace fpoor=. if hstath==.
sum ageh agew fincome savingsval debtval stockyn own educh fpoor if retiredh==1
sum ageh agew fincome savingsval debtval stockyn own educh fpoor if retiredh==0
ttest ageh, by(retiredh) unequal
ttest agew, by(retiredh) unequal
ttest fincome, by(retiredh) unequal
ttest savingsval, by(retiredh) unequal
ttest debtval, by(retiredh) unequal
prtest stockyn, by(retiredh)
prtest own, by(retiredh)
tab educh retiredh, chi2
prtest fpoor, by(retiredh)
log close
...
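
The magnitudes quoted in the answers above follow directly from the reported coefficients, so they can be double-checked outside Stata. The following is a minimal Python sketch and is not part of the original solution. All variable names are hypothetical, and the numbers are copied from the regression output above. Two items go beyond what the answers state and should be read as illustrations only: the age at which the fitted quadratic income profile peaks (which takes the agesq coefficient to be negative, as the ordering of its confidence bounds implies), and the probability implied by an odds ratio of 4 under an assumed 10% baseline.

```python
import math

# Problem 1: the constant term is a log birth weight; convert it to grams.
log_birth_weight = 8.044
birth_weight_grams = math.exp(log_birth_weight)

# Problems 2 and 3: dollars of debt or savings per $10,000 of family income.
income_change = 10_000
debt_change = 0.0587223 * income_change
savings_change_no_control = 0.1021531 * income_change
savings_change_with_control = 0.1429776 * income_change

# Problem 4: a fitted quadratic b*age + c*age^2 peaks at age = -b / (2c).
# The agesq coefficient is taken to be negative, as the ordering of its
# confidence bounds implies.
b_age = 4435.016
c_agesq = -44.15439
peak_age = -b_age / (2 * c_agesq)

# Problem 6: an odds ratio of 4 multiplies the odds, not the probability.
# The 10% baseline probability is a hypothetical value for illustration.
odds_ratio = 4
baseline_prob = 0.10
baseline_odds = baseline_prob / (1 - baseline_prob)
debt_odds = odds_ratio * baseline_odds
debt_prob = debt_odds / (1 + debt_odds)

print(f"exp(8.044) in grams: {birth_weight_grams:.0f}")
print(f"Debt per $10,000 of income: {debt_change:.2f}")
print(f"Savings per $10,000, no control: {savings_change_no_control:.0f}")
print(f"Savings per $10,000, with retiredh: {savings_change_with_control:.0f}")
print(f"Age at which fitted income peaks: {peak_age:.1f}")
print(f"Probability implied by odds ratio 4 at a 10% baseline: {debt_prob:.3f}")
```

Under the assumed 10% baseline, the implied probability for people with debt is about 31%, well short of four times 10%. This is why the answer to problem 6 is phrased in terms of odds rather than probabilities.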
