Eco 572: Research methods in Demography
Rates and Standardization (Revised)
We will work through the example in Preston et al., sections 2.2 and 2.3. (This revised version of the handout
de-emphasizes Stata programming and focuses on canned procedures.)
Sample Data
I copied the counts of midyear population and deaths by age for Sweden and Kazakhstan from Table 2.1 into a
text file, which is available on the course website. The file is in "long" format and can be read into Stata using:
. infile str10 country str5 ageg pop deaths ///
>     using http://data.princeton.edu/eco572/datasets/preston21long.dat
(38 observations read)
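Before computing anything, it can help to confirm that the file was read as intended. The following do-file fragment is a minimal sketch, not part of the original handout; it assumes only the variable names given in the infile command above.

```stata
* Optional check of the data just read (not in the original handout)
describe                              // variable names and storage types
tabulate country                      // number of age groups per country
list country ageg pop deaths in 1/5   // first few rows of the long file
```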
The first thing we do is calculate the age-specific rates, dividing deaths by population and multiplying by 1000:
. gen rates = 1000 * deaths / pop
Crude death rates are just a weighted average of age-specific rates using the population in each age group as
the weight. We can easily compute them in Stata using the tabstat command:
. tabstat rates [fw=pop], by(country)
Summary for variables: rates
     by categories of: country

   country |      mean
-----------+----------
Kazakhstan |  7.423042
    Sweden |  10.54756
-----------+----------
     Total |  8.470285
----------------------

The interesting result here is that mortality appears to be lower in Kazakhstan than in Sweden.
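Because the crude rate is a population-weighted average of the age-specific rates, it should also equal total deaths divided by total population, times 1000. The fragment below is a sketch of that cross-check, assuming the variables created above; cdr is a name introduced here for illustration only.

```stata
* Cross-check: crude death rate = 1000 * total deaths / total population
preserve                              // keep the age-level data intact
collapse (sum) deaths pop, by(country)
generate cdr = 1000 * deaths / pop    // cdr is a hypothetical variable name
list country cdr
restore                               // return to the age-level data
```

The listed values should agree with the tabstat means up to rounding.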
Standardized Rates
Following Preston et al., we will standardize the rates using the unweighted average of the two population
compositions as the standard. To do this we first compute the percent distribution for each country using egen,
and then compute the average percent in each age group:
. egen pcpop = pc(pop), by(country)
. egen avgcomp = mean(pcpop), by(ageg)
You may want to list the data to verify that avgcomp has the same values for the two countries. Now we can
compute the standardized rate in one line:
. tabstat rates [aw=avgcomp], by(country)
Summary for variables: rates
     by categories of: country

   country |      mean
-----------+----------
Kazakhstan |    11.882
    Sweden |  7.374094
-----------+----------
     Total |  9.628045
----------------------

The only difference from the earlier command is that I specified aw, an "analytic" weight, instead of fw, a
"frequency" weight, so Stata wouldn't complain about noninteger weights. Both compute means the same way:
multiply each observation by the weight, sum, and divide by the sum of the weights.
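The weighted mean described here can also be computed by hand. The fragment below is a sketch, not part of the original handout; it assumes the rates and avgcomp variables created earlier, and num, den and stdrate are names introduced only for this illustration.

```stata
* Direct standardization by hand: sum(rate * weight) / sum(weight)
* num, den and stdrate are hypothetical names used only in this sketch
egen num = total(rates * avgcomp), by(country)   // weighted sum of rates
egen den = total(avgcomp), by(country)           // sum of the weights
generate stdrate = num / den
tabstat stdrate, by(country)   // constant within country, so the mean is the value
```

The country means should match the standardized rates from tabstat with aw=avgcomp, up to rounding.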
Indirect Standardization
Frequently we don't have age-specific rates but can easily obtain the age distribution. We can still do a form of
standardization by applying the rates of one country (or any other standard) to the two age distributions.
Let us first create a variable that has the Swedish rates for both countries. We do this by sorting by age and then
country, and for each age picking the rate of the second country (Sweden, which sorts after Kazakhstan):
. bysort ageg (country): gen swrates = rates[2]
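Since rates[2] relies on Sweden being the second observation within each age group, a quick check may be worthwhile. The fragment below is a sketch under that assumption, not part of the original handout. The final command is one plausible way to apply the Swedish rates to each country's own age distribution, giving the crude rate each country would have under Swedish mortality.

```stata
* Check that swrates carries Sweden's rate within every age group
assert swrates == rates if country == "Sweden"
list country ageg rates swrates, sepby(ageg)

* Apply the Swedish rates to each country's own age distribution
* (assumed next step: weight the standard rates by each population)
tabstat swrates [fw=pop], by(country)
```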
The by command is a very powerful feature of Stata that can repeat a command for subgroups. The data must
be sorted, but if you specify bysort Stata will take care of that. In this case the computation is done by
 Spring '06
 Rodriguez
 Econometrics, Demography, Regression Analysis, Sweden
