notes smoothing -running means and lines

Eco 572: Research methods in Demography

Smoothing: Running Means and Lines

We will work with data from the Colombia WFS Household Survey, conducted in 1975-76. I tabulated the age distribution of all household members and saved it in an ASCII file, which we now read and plot:

    . infile age pop using ///
    > http://data.princeton.edu/eco572/datasets/cohhpop.dat
    (99 observations read)

    . line pop age , ///
    > title(Colombia 1975-76) subtitle(WFS Household Survey) ///
    > ytitle(population)

    . graph export cohhpop.png, replace
    (file cohhpop.png written in PNG format)

As you can see, the distribution looks somewhat less smooth than the data from the Philippines we studied earlier. Can you compute the Myers index for this distribution?

Running Means

The simplest way to smooth a scatterplot is to use moving averages, also known as running means. We can easily compute a moving average with a window of three, where for each observation we consider the neighbors immediately above and below:

    . gen ma3 = ( pop[_n-1] + pop[_n] + pop[_n+1] )/3
    (2 missing values generated)

    . replace ma3 = (pop[1]+pop[2])/2 in 1
    (1 real change made)

    . replace ma3 = (pop[_N-1] + pop[_N])/2 in -1
    (1 real change made)

The first and last observations have only one neighbor each, so the three-term formula refers to an observation that does not exist and Stata rightly returns a missing value, which we replace by the average of just two.

There is, of course, an easier way. The lowess command can compute running means if you specify the options mean and noweight. The window is specified as a proportion of the data via an option called bwidth, which is short for bandwidth. To reproduce our result we need to use 0.03 or 3% of the data, which with 99 observations amounts to a window of three:

    . lowess pop age, mean noweight bwidth(0.03) ///
    > gen(rm3) name(rm3) title(" ")

First let us list a couple of cases to verify that lowess and we are doing the same thing:

    . list age ma3 rm3 in 1/4

         +-------------------+
         | age    ma3    rm3 |
         |-------------------|
      1. |   0   1529   1529 |
      2. |   1   1473   1473 |
      3. |   2   1447   1447 |
      4. |   3   1462   1462 |
         +-------------------+

Looking at the plot (below) we see that a moving average of just three observations is too "wiggly"; it...
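The hand computation of the running mean of three can be checked against lowess over the whole file rather than just the first four cases, and the same idea extends to a wider window. What follows is only a sketch, not part of the original notes. It assumes that pop, ma3 and rm3 exist as created above and that pop has no missing values. The names sum5, n5 and ma5 are made up for illustration. Near the ends it averages whatever neighbors are available, in the same spirit as the two-term averages used for the first and last observations.

```stata
* Sketch only: sum5, n5 and ma5 are illustrative names, not from the notes.

* Count the observations where the hand-made and lowess running means differ
count if reldif(ma3, rm3) > 1e-6

* Running mean of five: add up the neighbors that exist and count them
gen sum5 = 0
gen n5   = 0
forvalues k = -2/2 {
    * an out-of-range subscript such as pop[0] evaluates to missing
    replace sum5 = sum5 + pop[_n + `k'] if !missing(pop[_n + `k'])
    replace n5   = n5 + 1               if !missing(pop[_n + `k'])
}
gen ma5 = sum5/n5
```

Keeping a running sum and a running count avoids writing a separate replace for each boundary case, which becomes tedious as the window grows.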