Homework # 4 ECO 7427 ANSWER KEY Prof. Sarah Hamersma

1. Use DataFerrett to view the variables available from the 1990 U.S. Census ….

(a) Get the data into Stata format and report the summary statistics for the variables you downloaded (this is all you should submit for part (a)).

Here are my summary statistics:

. summarize;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         AGE |   1307885      35.051    22.46781          0         90
      HOUR89 |   1307885    20.72031    21.61533          0         99
     INCOME1 |   1307885    10442.26    18867.82          0     197927

Please do (b) - (d) below for each of the following three variables: (1) age, (2) hours usually worked per week last year, and (3) wage and salary income last year. For the last two, leave out people who did not work and/or did not earn any income last year. You do not need to drop them from the data; just restrict your commands to positive values using an “if” qualifier at the end of each command (see Stata help for details).

(b) Generate a histogram of variable X using Stata's default settings (i.e., do not specify a particular bin size or width). Does this do a reasonable job of displaying the distribution? If there seem to be strange or interesting patterns, describe them and indicate any reason you think they may be there.

AGE:

[Figure: histogram of AGE (density vs. age)]

The age histogram has a very strange pattern of (almost) alternating high and low bars. The reason is that the default bin width is not an integer, even though the variable is coded in whole years. Because of this rounding issue, some bins include two age values and others include only one, resulting in the choppy pattern. Other than that, much of the sample appears to be between the ages of 0 and 50, after which the density begins to decrease. Note that age appears to be top-coded at 90 (there is a bump there).
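The rounding explanation for the age histogram can be checked with a short calculation. This sketch is in Python rather than Stata. It rests on two assumptions about Stata's internals that this key does not state. First, the default number of bins is k = min(sqrt(N), 10*ln(N)/ln(10)), truncated to an integer. Second, the bins are equal-width intervals running from the variable's minimum to its maximum. Using N = 1,307,885 and ages 0 to 90, taken from the summary statistics, the sketch counts how many whole-year ages fall into each bin.

```python
import math

AGE_MIN, AGE_MAX = 0, 90   # range of AGE from the summary statistics
N_OBS = 1307885            # number of observations on AGE

def default_bin_count(n_obs):
    # Assumed Stata default: k = min(sqrt(N), 10*ln(N)/ln(10)), truncated to an integer
    return int(min(math.sqrt(n_obs), 10 * math.log(n_obs) / math.log(10)))

def integers_per_bin(lo, hi, k):
    # Count how many integer values lo..hi fall into each of k equal-width bins.
    # Exact integer arithmetic: value x goes to bin floor((x - lo) * k / (hi - lo)),
    # and the maximum value is placed in the last (closed) bin.
    counts = [0] * k
    for x in range(lo, hi + 1):
        idx = min((x - lo) * k // (hi - lo), k - 1)
        counts[idx] += 1
    return counts

k = default_bin_count(N_OBS)
counts = integers_per_bin(AGE_MIN, AGE_MAX, k)
print(f"bins: {k}, width: {(AGE_MAX - AGE_MIN) / k:.3f} years")
print("ages per bin (first 12 bins):", counts[:12])
print("bins with two ages:", counts.count(2), "| bins with one age:", counts.count(1))
```

Under these assumptions, each bin is about 1.48 years wide. Roughly half the bins therefore cover two ages and the rest cover one, which is consistent with the alternating pattern in the figure. A bin width of 1, or the `discrete` option of Stata's `histogram`, gives one bar per age instead.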

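The instruction to restrict commands to positive values can be mirrored outside Stata. This Python sketch is only an illustration. The helper name `summarize_if_nonzero` is hypothetical, and the hours values are made-up toy numbers, not the Census extract. The helper keeps the nonzero values, as `summarize X if X != 0` does, and reports the same columns as Stata's `summarize` table. `statistics.stdev` divides by n − 1, giving the sample standard deviation that `summarize` reports.

```python
import statistics

def summarize_if_nonzero(values):
    # Hypothetical helper mirroring `summarize X if X != 0`:
    # drop the zeros, then report Obs, Mean, Std. Dev., Min, Max.
    kept = [v for v in values if v != 0]
    return {
        "obs": len(kept),
        "mean": statistics.mean(kept),
        # statistics.stdev divides by n - 1 (sample standard deviation)
        "sd": statistics.stdev(kept),
        "min": min(kept),
        "max": max(kept),
    }

# Toy weekly hours; zeros stand in for people who did not work (made-up values)
hours = [0, 0, 40, 40, 20, 45, 0, 35]

print("all values, mean:", statistics.mean(hours))
print("nonzero only:", summarize_if_nonzero(hours))
```

In the toy data, dropping the three zeros raises the mean from 22.5 to 36 hours. This shows why the zeros for non-workers would distort summaries of hours and income if they were left in.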
HOUR89:

[Figure: histogram of HOUR89, Usual Hrs. Worked Per Week Last Yr. 1989 (density)]

INCOME1:

[Figure: histogram of INCOME1, Wages or Salary Inc. in 1989 (density)]

(c) Report the mean and standard deviation of X (again, not including the zeros if there are any). Do these do a good job of summarizing the distribution of the data?

. sum AGE;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         AGE |   1307885      35.051    22.46781          0         90

. sum HOUR89 if HOUR89 != 0;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      HOUR89 |    695138    38.98475     12.9241          1         99

. sum INCOME1 if INCOME1 != 0;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     INCOME1 |    648705    21053.14    22233.89          1     197927

The hours histogram has a clear mode at 40 hours/week. It is also noteworthy that there are small spikes at “round” numbers, such as 20, 30, 50, and 60 hours, and smaller spikes at 25, 35, 45, and 55. Very few people report their hours as a number not divisible by 5! The income histogram suggests a
