Imbens/Wooldridge, Lecture Notes 8, Summer ’07
What’s New in Econometrics? NBER, Summer 2007
Lecture 8, Tuesday, July 31st, 2.00–3.00 pm
Cluster and Stratified Sampling
These notes consider estimation and inference with cluster samples and samples obtained
by stratifying the population. The main focus is on true cluster samples, although the case of
applying cluster-sample methods to panel data is treated, including recent work where the sizes
of the cross section and time series are similar. Wooldridge (2003, extended version 2006)
contains a survey, but some recent work is discussed here.
1. THE LINEAR MODEL WITH CLUSTER EFFECTS
This section considers linear models estimated using cluster samples (of which a panel data
set is a special case). For each group or cluster
$g$, let $\{(y_{gm}, \mathbf{x}_g, \mathbf{z}_{gm}) : m = 1, \ldots, M_g\}$ be the observable data, where $M_g$ is the number of units in cluster $g$, $y_{gm}$ is a scalar response, $\mathbf{x}_g$ is a $1 \times K$ vector containing explanatory variables that vary only at the group level, and $\mathbf{z}_{gm}$ is a $1 \times L$ vector of covariates that vary within (as well as across) groups.
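Because $\mathbf{x}_g$ is constant within a cluster while $\mathbf{z}_{gm}$ changes across units, it can help to see how such a sample looks once it is stacked into one long data set with a row per unit $(g, m)$. The following is a minimal sketch, assuming NumPy and a hypothetical helper `stack_cluster_sample` (not from these notes); the leading column of ones corresponds to an intercept.

```python
import numpy as np

def stack_cluster_sample(x_groups, z_groups, y_groups):
    """Stack a cluster sample into long format, one row per unit (g, m).

    x_groups: G arrays, each holding the K group-level covariates x_g
    z_groups: G arrays of shape (M_g, L) with unit-level covariates z_gm
    y_groups: G arrays of length M_g with the responses y_gm
    Returns y (N,), X (N, 1 + K + L) with columns [1, x_g, z_gm], and group ids (N,),
    where N = M_1 + ... + M_G.
    """
    ys, Xs, ids = [], [], []
    for g, (x_g, z_g, y_g) in enumerate(zip(x_groups, z_groups, y_groups)):
        z_g = np.asarray(z_g, dtype=float)
        if z_g.ndim == 1:                      # L = 1 passed as a flat vector
            z_g = z_g[:, None]
        M_g = z_g.shape[0]
        x_row = np.asarray(x_g, dtype=float).reshape(1, -1)
        # x_g varies only across clusters, so the same row is repeated M_g times
        x_rep = np.repeat(x_row, M_g, axis=0)
        ones = np.ones((M_g, 1))              # column for the intercept
        Xs.append(np.hstack([ones, x_rep, z_g]))
        ys.append(np.asarray(y_g, dtype=float))
        ids.append(np.full(M_g, g))
    return np.concatenate(ys), np.vstack(Xs), np.concatenate(ids)

# Toy example: G = 2 clusters of sizes M_1 = 3, M_2 = 2, with K = 1 and L = 2.
x_groups = [np.array([2.0]), np.array([5.0])]
z_groups = [np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),
            np.array([[3.0, 1.0], [2.0, 2.0]])]
y_groups = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])]
y, X, group_id = stack_cluster_sample(x_groups, z_groups, y_groups)
print(X)
print(group_id)  # [0 0 0 1 1]
```

In the stacked matrix the $\mathbf{x}_g$ column takes only $G$ distinct values, however many units each cluster contains, while the $\mathbf{z}_{gm}$ columns vary row by row.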
1.1 Specification of the Model
The linear model with an additive error is
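$$y_{gm} = \alpha + \mathbf{x}_g\boldsymbol{\beta} + \mathbf{z}_{gm}\boldsymbol{\gamma} + v_{gm}, \quad m = 1, \ldots, M_g;\; g = 1, \ldots, G. \qquad (1.1)$$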
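Stacking the $M_g$ equations of a single cluster makes the group-level structure explicit. In notation introduced here only for illustration (it is not defined in these notes), let $\mathbf{y}_g$ and $\mathbf{v}_g$ be the $M_g \times 1$ vectors of responses and errors, $\mathbf{j}_{M_g}$ an $M_g \times 1$ vector of ones, and $\mathbf{Z}_g$ the $M_g \times L$ matrix whose rows are $\mathbf{z}_{gm}$. Then (1.1) for cluster $g$ can be written as:

```latex
\mathbf{y}_g = \alpha\,\mathbf{j}_{M_g} + \mathbf{j}_{M_g}\mathbf{x}_g\boldsymbol{\beta} + \mathbf{Z}_g\boldsymbol{\gamma} + \mathbf{v}_g,
\qquad g = 1, \ldots, G.
```

Here $\mathbf{j}_{M_g}\mathbf{x}_g$ is an $M_g \times K$ matrix whose $M_g$ rows are identical copies of $\mathbf{x}_g$; only $\mathbf{Z}_g$ varies within the cluster.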
Our approach to estimation and inference in equation (1.1) depends on several factors, including whether we are interested in the effects of aggregate variables ($\boldsymbol{\beta}$) or individual-specific variables ($\boldsymbol{\gamma}$). We also need to make assumptions about the error terms. In the context of pure cluster sampling, an important issue is whether the $v_{gm}$ contain a common group effect that can be separated in an additive fashion, as in
$$v_{gm} = c_g + u_{gm}, \quad m = 1, \ldots, M_g, \qquad (1.2)$$
where $c_g$ is an unobserved cluster effect and $u_{gm}$
is the idiosyncratic error. (In the statistics
literature, (1.1) and (1.2) are referred to as a “hierarchical linear model.”) One important issue
is whether the explanatory variables in (1.1) can be taken to be appropriately exogenous.
Under (1.2), exogeneity issues are usefully broken down by separately considering $c_g$ and $u_{gm}$.
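One consequence of (1.2) is worth making concrete. If, as an illustrative assumption not imposed in these notes, $c_g$ and $u_{gm}$ are uncorrelated with variances $\sigma_c^2$ and $\sigma_u^2$, and the $u_{gm}$ are uncorrelated across $m$, then $\mathrm{Var}(v_{gm}) = \sigma_c^2 + \sigma_u^2$ and $\mathrm{Cov}(v_{gm}, v_{gr}) = \sigma_c^2$ for $m \neq r$, so errors within a cluster share the correlation $\rho = \sigma_c^2/(\sigma_c^2 + \sigma_u^2)$. The sketch below simulates (1.1) and (1.2) with equal cluster sizes and independent normal draws (both purely illustrative choices) and checks that the pooled within-cluster correlation of $v_{gm}$ is close to $\rho$. The function names `simulate_cluster_sample` and `within_cluster_corr` are hypothetical.

```python
import numpy as np

def simulate_cluster_sample(G, M, alpha, beta, gamma, sigma_c, sigma_u, seed=0):
    """Simulate y_gm = alpha + x_g beta + z_gm gamma + c_g + u_gm with equal cluster sizes M.

    Illustrative assumptions: x_g, z_gm, c_g and u_gm are independent normal draws.
    Returns y (G, M), x (G, K), z (G, M, L) and the composite error v (G, M).
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    x = rng.normal(size=(G, beta.size))              # group-level covariates x_g
    z = rng.normal(size=(G, M, gamma.size))          # unit-level covariates z_gm
    c = rng.normal(scale=sigma_c, size=(G, 1))       # cluster effect c_g, shared by all m in g
    u = rng.normal(scale=sigma_u, size=(G, M))       # idiosyncratic errors u_gm
    v = c + u                                        # composite error, as in (1.2)
    y = alpha + (x @ beta)[:, None] + z @ gamma + v  # outcome, as in (1.1)
    return y, x, z, v

def within_cluster_corr(v):
    """Pooled correlation between v_gm and v_gr for m != r in the same cluster."""
    G, M = v.shape
    d = v - v.mean()
    row_sums = d.sum(axis=1)
    # For each cluster, the sum over ordered pairs m != r of d_gm * d_gr
    # equals (sum_m d_gm)^2 - sum_m d_gm^2.
    pair_sum = (row_sums**2 - (d**2).sum(axis=1)).sum()
    mean_cross = pair_sum / (G * M * (M - 1))
    return mean_cross / (d**2).mean()

sigma_c, sigma_u = 1.0, 2.0
y, x, z, v = simulate_cluster_sample(G=2000, M=5, alpha=1.0, beta=[0.5],
                                     gamma=[1.0, -1.0], sigma_c=sigma_c,
                                     sigma_u=sigma_u)
rho = sigma_c**2 / (sigma_c**2 + sigma_u**2)     # implied correlation, 0.2 here
print(rho, within_cluster_corr(v))
```

Setting `sigma_c=0` removes the cluster effect, and the estimated within-cluster correlation then hovers around zero; any positive $\sigma_c^2$ makes observations in the same cluster correlated even when the $u_{gm}$ are independent.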
Throughout, we assume that the sampling scheme generates observations that are independent across $g$. This assumption can be restrictive, particularly when the clusters are
large geographical units. We do not consider problems of “spatial correlation” across clusters,
although, as we will see, fixed effects estimators have advantages in such settings.
We treat two kinds of sampling schemes. The simplest case also allows the most flexibility