- $E[E[Y \mid X, D = 0] \mid D = 1]$ is the expected outcome of nonparticipants with the same characteristics as the participants, i.e., the counterfactual nontreatment outcome $E[Y_0 \mid D = 1]$ of the participants.
- Note that the support of the covariates of the nonparticipants has to contain that of the participants if one wants to avoid parametric extrapolation.
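A crude way to inspect this support condition in practice is to check whether every participant's covariate value falls inside the range observed for the nonparticipants; the sketch below does this for a single simulated covariate (all data and names are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)          # a single covariate X
d = rng.integers(0, 2, size=200)  # treatment indicator D

# Support check: participants whose X lies outside the range of the
# nonparticipants' X would force E[Y | X, D = 0] to be extrapolated.
lo, hi = x[d == 0].min(), x[d == 0].max()
inside = (x[d == 1] >= lo) & (x[d == 1] <= hi)
print(f"share of participants inside the nonparticipant support: {inside.mean():.2f}")
```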
- One possibility to estimate this expression is to use a matching estimator.
- The general definition of a matching estimator is given by
  $$\hat{Y}^0_i = \sum_{j:\, D_j = 0} \omega_{ij} Y_j,$$
- The function $\omega_{ij}$ determines the number and the weights of the control outcomes used to estimate $Y^0$ for participant $i$.
- There are several possibilities for determining $\omega_{ij}$.
- Nearest-neighbour matching sets $\omega_{ij} = 1$ for the nonparticipant $j$ with characteristics most similar to those of participant $i$, and $\omega_{ij'} = 0$ for all other nonparticipants $j' \neq j$.
I
k
nearestneighbour (
k
nn) matching uses the average outcomes of
the
k
most similar nonparticipants.
I
Caliper matching uses all controls (i.e., nonparticipants) for which
the observable characteristics do not differ more than some small
positive value.
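A minimal sketch of the three weighting schemes just listed, for a scalar covariate with the $|x_i - x_j|$ metric introduced below; the function name, the simulated data, and the equal-weights convention for $k$-nn are illustrative assumptions:

```python
import numpy as np

def match_y0(x_i, x0, y0, k=1, caliper=None):
    """Estimate Y^0 for a participant with covariate x_i from
    nonparticipants with covariates x0 and outcomes y0."""
    dist = np.abs(x0 - x_i)                  # |x_i - x_j| metric
    if caliper is not None:                  # caliper matching:
        keep = dist <= caliper               # all controls within the caliper
        if not keep.any():
            return np.nan                    # no control close enough
        return y0[keep].mean()
    nn = np.argsort(dist)[:k]                # k-nn matching (k = 1: nearest neighbour)
    return y0[nn].mean()                     # omega_ij = 1/k on the k closest controls

rng = np.random.default_rng(1)
x0 = rng.normal(size=100)                    # nonparticipants' covariates
y0 = 2.0 * x0 + rng.normal(size=100)         # nonparticipants' outcomes
print(match_y0(0.5, x0, y0, k=1))            # nearest-neighbour matching
print(match_y0(0.5, x0, y0, k=5))            # 5-nn matching
print(match_y0(0.5, x0, y0, caliper=0.1))    # caliper matching
```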
- The difference in observed characteristics is judged with respect to some metric, i.e., some measure of distance.
- For one-dimensional covariates, the absolute value of the difference can be used, i.e., $|x_i - x_j|$.
- For more than one covariate, the Mahalanobis metric may be used, which is defined as
  $$d(x_1, x_2) = \sqrt{(x_1 - x_2)'\, S^{-1} (x_1 - x_2)},$$
  where $x_1$ and $x_2$ are two vectors and $S$ is a covariance matrix.
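A short sketch of this metric; taking $S$ to be the sample covariance matrix of the nonparticipants' covariates is an illustrative choice (the slides only require some covariance matrix):

```python
import numpy as np

def mahalanobis(x1, x2, S):
    """d(x1, x2) = sqrt((x1 - x2)' S^{-1} (x1 - x2))."""
    diff = x1 - x2
    return float(np.sqrt(diff @ np.linalg.solve(S, diff)))

rng = np.random.default_rng(2)
X0 = rng.normal(size=(100, 3))        # nonparticipants' covariates, 3 dimensions
S = np.cov(X0, rowvar=False)          # sample covariance matrix as S
print(mahalanobis(X0[0], X0[1], S))   # distance between two covariate vectors
```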
- Another weighting method is kernel matching, which uses all nonparticipants and determines the weights by a kernel function:
  $$\omega_{ij} = \frac{K(x_i - x_j)}{\sum_{\ell:\, D_\ell = 0} K(x_i - x_\ell)}.$$
- The division by the sum of the kernel weights is necessary to obtain a weighted average, as $\sum_{\ell:\, D_\ell = 0} K(x_i - x_\ell)$ does not necessarily equal one.
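A sketch of these kernel weights; the Gaussian kernel and the bandwidth $h$ are illustrative assumptions, as the slides leave $K$ unspecified:

```python
import numpy as np

def kernel_weights(x_i, x0, h=0.5):
    """omega_ij = K(x_i - x_j) / sum_l K(x_i - x_l) over the controls,
    here with a Gaussian kernel and bandwidth h."""
    k = np.exp(-0.5 * ((x_i - x0) / h) ** 2)
    return k / k.sum()                       # normalisation: weights sum to one

rng = np.random.default_rng(3)
x0 = rng.normal(size=100)                    # nonparticipants' covariates
y0 = 2.0 * x0 + rng.normal(size=100)         # nonparticipants' outcomes
w = kernel_weights(0.5, x0)
print(w.sum())                               # 1.0 by construction
print(w @ y0)                                # kernel-matching estimate of Y^0 at x_i = 0.5
```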
- Consider now the propensity score $p(X)$, which is defined as the probability of choosing (or receiving) treatment conditional on $X$:
  $$p(X) \equiv \Pr(D = 1 \mid X) = E[D \mid X].$$
- It can be shown that the conditional independence assumption also holds when $X$ is replaced by $p(X)$:
  $$(Y_1, Y_0) \perp\!\!\perp D \mid X \;\Rightarrow\; (Y_1, Y_0) \perp\!\!\perp D \mid p(X).$$
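A hedged sketch of the standard argument behind this implication (the balancing-score property usually attributed to Rosenbaum and Rubin): it suffices that the treatment probability given $(Y_1, Y_0, p(X))$ does not depend on $(Y_1, Y_0)$.

```latex
\begin{align*}
\Pr(D = 1 \mid Y_1, Y_0, p(X))
  &= E\!\left[\, E[D \mid Y_1, Y_0, X] \;\middle|\; Y_1, Y_0, p(X) \right]
     % iterated expectations: sigma(p(X)) is coarser than sigma(X)
  \\
  &= E\!\left[\, E[D \mid X] \;\middle|\; Y_1, Y_0, p(X) \right]
     % CIA: (Y_1, Y_0) independent of D given X
  \\
  &= E\!\left[\, p(X) \;\middle|\; Y_1, Y_0, p(X) \right] = p(X),
\end{align*}
```

which coincides with $\Pr(D = 1 \mid p(X)) = p(X)$, so $(Y_1, Y_0)$ carries no information about $D$ once $p(X)$ is held fixed.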
- For matching and inverse probability weighting approaches it is furthermore assumed that
  $$0 < p(X) < 1,$$
  which means that for each value of $X$ there are participants as well as nonparticipants.
- All matching algorithms stated previously can also be based on $p(X)$ instead of $X$.
- Matching on the one-dimensional propensity score may have better finite-sample properties than matching on the high-dimensional $X$; see the sketch below.
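A hedged end-to-end sketch: estimate $p(X)$ by logistic regression and match nearest neighbours on the estimated score; scikit-learn, the simulated design, and the effect size are all illustrative assumptions, not part of the slides:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 500
X = rng.normal(size=(n, 3))                       # covariates
p_true = 1 / (1 + np.exp(-X[:, 0]))               # true propensity score
d = rng.binomial(1, p_true)                       # treatment D
y = d * 1.0 + X[:, 0] + rng.normal(size=n)        # outcome; treatment effect is 1

# Estimate p(X) = Pr(D = 1 | X) = E[D | X] by logistic regression.
p_hat = LogisticRegression().fit(X, d).predict_proba(X)[:, 1]

# Nearest-neighbour matching on the one-dimensional estimated score.
p1, p0 = p_hat[d == 1], p_hat[d == 0]
y1, y0 = y[d == 1], y[d == 0]
matches = np.abs(p1[:, None] - p0[None, :]).argmin(axis=1)
att = (y1 - y0[matches]).mean()                   # ATT estimate
print(f"ATT estimate: {att:.2f}")                 # close to 1 in this design
```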
- A second method based on the conditional independence assumption is the inverse probability weighting approach.
- To see how this works, first rewrite $E[Y_1]$ as follows:
  $$E[Y_1] = E\big[E[Y_1 \mid X]\big] = E\big[E[Y_1 \mid X, D = 1]\big] = E\big[E[DY \mid X, D = 1]\big],$$
  where the first equality follows from the law of iterated expectations, the second from the conditional independence assumption, and the third from $Y = DY_1 + (1 - D)Y_0$, so that $DY = D(DY_1 + (1 - D)Y_0) = DY_1$, since $D(1 - D) = D - D^2 = D - D = 0$ ($D^2 = D$ for a binary $D$).
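From here, note that $E[DY \mid X, D = 1] = E[DY \mid X]/p(X)$, because $DY = 0$ whenever $D = 0$; substituting and applying iterated expectations once more gives the inverse probability weighting identity $E[Y_1] = E[DY / p(X)]$. A hedged sketch of its sample analogue (the simulated design and the logistic-regression estimate of $p(X)$ are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 2000
X = rng.normal(size=(n, 3))                       # covariates
p_true = 1 / (1 + np.exp(-X[:, 0]))               # true propensity score
d = rng.binomial(1, p_true)                       # treatment D
y1 = 1.0 + X[:, 0] + rng.normal(size=n)           # potential outcome Y_1
y0 = X[:, 0] + rng.normal(size=n)                 # potential outcome Y_0
y = d * y1 + (1 - d) * y0                         # observed Y = DY_1 + (1 - D)Y_0

p_hat = LogisticRegression().fit(X, d).predict_proba(X)[:, 1]

# Sample analogue of E[DY / p(X)]: only treated observations contribute,
# each reweighted by the inverse of its estimated treatment probability.
ey1_hat = np.mean(d * y / p_hat)
print(f"E[Y_1] estimate: {ey1_hat:.2f}  (true value 1.0 in this design)")
```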