Stata Walkthrough 4: Regression, Prediction, and Forecasting
Over drinks the other evening, my neighbor told me about his 25-year-old nephew, who is dating a 35-year-old woman. “God, I can’t see them getting married,” he said.
I raised my eyebrow, because as an economic demographer, I know that spouses’ ages are very predictable: they tend to be similar, with the husband just a couple of years older than his wife. Strange cases do occur, but they tend to involve older men and younger women. The opposite is fairly unlikely.
However, I didn’t know exactly what would constitute a range of “likely outcomes” for the age of the woman that a 25-year-old guy would marry.
In this Stata exercise, we’re going to do two things:
1. We will use data on married couples’ ages to estimate the relationship between spouses’ ages.
2. We will use this to predict a range of “likely outcomes” for the age of the wife of a twenty-five-year-old man.
You’ll need to load the database of “U.S. married couples, March 2005” (marriedmar05.dta) from my webpage. These data come from the Current Population Survey (done by the Bureau of Labor Statistics), and they are a random sample of all households in the U.S. I have restricted the sample to only couples that identify themselves as “married.” We have 34,674 of these couples.
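As a minimal sketch of the loading step, the commands below open the file and confirm what it contains. They assume that marriedmar05.dta has already been downloaded into Stata's current working directory; the walkthrough does not say where the file is stored, so adjust the path as needed.

```stata
* Assumption: marriedmar05.dta sits in the current working directory.
* Use -cd- to change directories first, or give the full path to the file.
use marriedmar05.dta, clear

* List the variables, their storage types, and any labels
describe

* Number of observations; the text reports 34,674 couples
count
```

If the count differs from the figure quoted in the text, the file is probably not the one the walkthrough is based on.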
In general, you should begin any project by exploring the data. In this case, that’s simple, since this database contains two variables: hage and wage, the husband’s age and the wife’s age. First, let’s look at some descriptive statistics: type summarize.
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        wage |     34674    45.78664    13.26782         15         79
        hage |     34674    48.07369    13.62849         15         79
Values of each variable range from 15 to 79. The mean age of a married man is 48.1 years, and the mean age of a married woman is 45.8 years. We can see that there’s a strong correlation between the two variables by typing corr hage wage:
             |     wage     hage
-------------+------------------
        wage |   1.0000
        hage |   0.9243   1.0000
In other words, older men tend to be married to older women; this should be no surprise. However, the sample means reveal that married men are older than married women on average, which suggests that the typical husband is older than his wife.
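The correlation above can also be pulled out of Stata's stored results rather than read off the screen. The sketch below assumes the data file is in the working directory, and the scalar name r_hw is just an illustrative choice. It also displays the squared correlation, which in a regression with a single explanatory variable equals the R-squared.

```stata
* Assumption: marriedmar05.dta sits in the current working directory.
use marriedmar05.dta, clear

* -correlate- leaves the correlation of the first two variables in r(rho)
correlate hage wage
scalar r_hw = r(rho)    // r_hw is a hypothetical name chosen for this sketch

* Show the correlation and its square
display "correlation         = " %6.4f r_hw
display "squared correlation = " %6.4f r_hw^2
```

With the correlation of 0.9243 reported in the text, the squared value is about 0.854, so one spouse's age accounts for roughly 85 percent of the variance in the other's.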
We might be interested in the distribution of the age difference, so let’s create a new variable for the difference in ages:
gen dage = hage - wage
label variable dage "Difference between spouses' ages"
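Before graphing, it may help to check the new variable numerically. The sketch below assumes the data file is in the working directory and uses holder as a made-up name for an indicator variable. Since the mean of a difference equals the difference of the means, the mean of dage should come out to 48.07369 - 45.78664 = 2.28705 years, which makes a handy check that the variable was built correctly.

```stata
* Assumption: marriedmar05.dta sits in the current working directory.
use marriedmar05.dta, clear
generate dage = hage - wage
label variable dage "Difference between spouses' ages"

* Mean, percentiles, and extreme values of the age gap
summarize dage, detail

* Indicator for couples in which the husband is strictly older.
* holder is a hypothetical variable name used only in this sketch.
generate byte holder = (dage > 0) if !missing(dage)

* The mean of a 0/1 variable is the share of couples with an older husband
summarize holder
```

The detail option also lists the smallest and largest values, which shows how far out the unusual couples are.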
Now let’s look at the frequency distribution of this variable by typing
histogram dage
The image is a bit ugly. First, the graph goes from -50 to +50, even though there’s basically nobody out in those tails. (There are some people, but they’re not significant enough to register in the graph.) For all practical purposes, the range goes from -20 to +30, so we’ll restrict it to that. Second, Stata is using a default bin width of 2.44 units. There’s a natural width in this case, I’d think: one, a single year. Let’s tell Stata to make each bin of width one. Third, there aren’t nearly enough tick
 Spring '08
 turchi