Eco 572: Research methods in Demography
http://data.princeton.edu/eco572/smoothing1.html

Smoothing: Running Means and Lines

We will work with data from the Colombia WFS Household Survey, conducted in 1975-76. I tabulated the age distribution of all household members and saved it in an ASCII file, which we now read and plot:

. infile age pop using ///
> http://data.princeton.edu/eco572/datasets/cohhpop.dat
(99 observations read)

. line pop age, ///
> title(Colombia 1975-76) subtitle(WFS Household Survey) ///
> ytitle(population)

. graph export cohhpop.png, replace
(file cohhpop.png written in PNG format)

As you can see, the distribution looks somewhat less smooth than the data from the Philippines we studied earlier. Can you compute the Myers index for this distribution?

Running Means

The simplest way to smooth a scatterplot is to use moving averages, also known as running means. We can easily compute a moving average with a window of three, where for each observation we consider the neighbors immediately above and below:

. gen ma3 = ( pop[_n-1] + pop[_n] + pop[_n+1] )/3
(2 missing values generated)

. replace ma3 = (pop[1] + pop[2])/2 in 1
(1 real change made)

. replace ma3 = (pop[_N-1] + pop[_N])/2 in -1
(1 real change made)

The first and last observations have only one neighbor each, so Stata rightly returns a missing value for them, which we replace with the average of the two available values.

There is, of course, an easier way. The lowess command can compute running means if you specify the options mean and noweight. The window is specified as a proportion of the data via an option called bwidth, which is short for bandwidth. To reproduce our result we need to use 0.03, or 3% of the data (about three of our 99 observations):

. lowess pop age, mean noweight bwidth(0.03) ///
> gen(rm3) name(rm3) title(" ")

First let us list the first four cases to verify that lowess and we are doing the same thing:

. list age ma3 rm3 in 1/4

     +-------------------+
     | age    ma3    rm3 |
     |-------------------|
  1. |   0   1529   1529 |
  2. |   1   1473   1473 |
  3. |   2   1447   1447 |
  4. |   3   1462   1462 |
     +-------------------+

Looking at the plot (below) we see that a moving average of just three observations is too "wiggly"; it...