- $E[\,E[Y \mid X, D = 0] \mid D = 1\,]$ is the expected no-treatment outcome of participants, constructed from nonparticipants with the same characteristics.
- Note that the support of the covariates of the nonparticipants has to contain that of the participants if one wants to avoid parametric extrapolation.
# The Conditional Independence Assumption

- One possibility to estimate this expression is to use a matching estimator.
- The general definition of a matching estimator is
  $$\hat{Y}^0_i = \sum_{j \mid D_j = 0} \omega_{ij} Y_j.$$
- The weight function $\omega_{ij}$ determines the number and the weights of the control outcomes used to estimate $Y^0$ for participant $i$.
- There are several possibilities for determining $\omega_{ij}$:
  - Nearest-neighbour matching sets $\omega_{ij} = 1$ for the nonparticipant $j$ with characteristics most similar to those of participant $i$, and $\omega_{ij'} = 0$ for all other nonparticipants $j' \neq j$.
  - $k$-nearest-neighbour ($k$-NN) matching uses the average outcome of the $k$ most similar nonparticipants.
  - Caliper matching uses all controls (i.e., nonparticipants) whose observable characteristics differ by no more than some small positive value.
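The nearest-neighbour and $k$-NN variants above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production estimator; the function name `knn_matching`, the toy data, and the use of a one-dimensional covariate with the absolute-distance metric are assumptions for the example:

```python
import numpy as np

def knn_matching(y, d, x, k=1):
    """Estimate the counterfactual Y^0 for each participant (d == 1) by
    averaging the outcomes of the k nonparticipants closest in x.
    Hypothetical helper: 1-D covariate, |x_i - x_j| as the distance."""
    y, d, x = map(np.asarray, (y, d, x))
    x0, y0 = x[d == 0], y[d == 0]                 # control pool
    y0_hat = []
    for xi in x[d == 1]:
        nearest = np.argsort(np.abs(x0 - xi))[:k]  # indices of the k most similar controls
        y0_hat.append(y0[nearest].mean())          # weights omega_ij = 1/k on these, 0 elsewhere
    return np.array(y0_hat)

# Toy data (assumed): two treated units, three controls.
y = np.array([5.0, 7.0, 1.0, 2.0, 3.0])
d = np.array([1, 1, 0, 0, 0])
x = np.array([1.0, 2.0, 0.9, 2.1, 5.0])

print(knn_matching(y, d, x, k=1))  # nearest-neighbour matched outcomes: [1. 2.]
```

With $k > 1$ the matched outcome becomes a simple average over the $k$ nearest controls; caliper matching would instead keep every control within a fixed distance of $x_i$.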
- The difference in observed characteristics is judged with respect to some metric, i.e., some measure of distance.
- For a one-dimensional covariate, the absolute value of the difference can be used, i.e., $|x_i - x_j|$.
- For more than one covariate, the Mahalanobis metric may be used, which is defined as
  $$d(x_1, x_2) = \sqrt{(x_1 - x_2)' S^{-1} (x_1 - x_2)},$$
  where $x_1$ and $x_2$ are two vectors and $S$ is a covariance matrix.
- Another weighting method is kernel matching, which uses all nonparticipants and determines the weights by a kernel function $K$:
  $$\omega_{ij} = \frac{K(x_i - x_j)}{\sum_{l \mid D_l = 0} K(x_i - x_l)}.$$
- The division by the sum of the kernel weights is necessary to obtain a weighted average, as the kernel values $K(x_i - x_l)$ do not necessarily sum to one over the nonparticipants.
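Both the Mahalanobis distance and the normalized kernel weights can be sketched directly from the formulas above. The Gaussian kernel and the bandwidth `h` are assumptions for the example (the slides do not fix a particular kernel); the helper names are hypothetical:

```python
import numpy as np

def mahalanobis(x1, x2, S):
    """d(x1, x2) = sqrt((x1 - x2)' S^{-1} (x1 - x2)) for a covariance matrix S."""
    diff = np.asarray(x1) - np.asarray(x2)
    return float(np.sqrt(diff @ np.linalg.inv(S) @ diff))

def kernel_weights(xi, x_controls, h=1.0):
    """Kernel-matching weights over all controls, Gaussian kernel with
    (assumed) bandwidth h. Dividing by the sum turns the raw kernel values
    into weights that sum to one, i.e. a proper weighted average."""
    k = np.exp(-0.5 * ((x_controls - xi) / h) ** 2)
    return k / k.sum()

S = np.array([[2.0, 0.0], [0.0, 0.5]])
print(mahalanobis([1.0, 0.0], [0.0, 0.0], S))   # sqrt(0.5) ~ 0.707

w = kernel_weights(0.0, np.array([-1.0, 0.0, 1.0]))
print(w.sum())                                   # 1.0 by construction
```

Note how $S^{-1}$ rescales each coordinate by its (co)variance, so covariates measured on large scales do not dominate the distance; the kernel weights decline smoothly with distance rather than cutting off at a fixed caliper.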
- Consider now the propensity score $p(X)$, which is defined as the probability of choosing (or receiving) treatment conditional on $X$:
  $$p(X) \equiv \Pr(D = 1 \mid X) = E[D \mid X].$$
- It can be shown that the conditional independence assumption also holds when $X$ is replaced by $p(X)$:
  $$(Y^1, Y^0) \perp\!\!\!\perp D \mid X \;\Rightarrow\; (Y^1, Y^0) \perp\!\!\!\perp D \mid p(X).$$
- For matching and inverse probability weighting approaches it is furthermore assumed that $0 < p(X) < 1$, which means that for each value of $X$ there are participants as well as nonparticipants.
- All matching algorithms stated previously can also be based on $p(X)$ instead of $X$.
- Matching on the one-dimensional propensity score may have better finite-sample properties than matching on the high-dimensional $X$.
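A small simulation can illustrate matching on the one-dimensional score instead of the 3-dimensional $X$. For simplicity the sketch treats the true $p(X)$ as known (in practice it would be estimated, e.g. by a logit or probit); the coefficient vector and sample size are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 3))                              # multi-dimensional covariates X
p = 1 / (1 + np.exp(-x @ np.array([0.5, -0.5, 0.25])))   # true p(X), assumed known here
d = (rng.uniform(size=n) < p).astype(int)                # treatment follows p(X)

# Overlap in the sample: both participants and nonparticipants are present.
assert d.sum() > 0 and (1 - d).sum() > 0

# Nearest-neighbour matching on the scalar score rather than on 3-D X.
p1, p0 = p[d == 1], p[d == 0]
matches = np.abs(p0[None, :] - p1[:, None]).argmin(axis=1)
print(np.abs(p1 - p0[matches]).max())   # matched scores are close
```

The point of the exercise: once units are compared through $p(X)$, the matching step reduces to sorting a single column, which is why score-based matching can behave better in finite samples than matching in many dimensions.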
- A second method based on the conditional independence assumption is the inverse probability weighting (IPW) approach.
- To see how this works, first rewrite $E[Y^1]$ as follows:
  $$E[Y^1] = E\big[E[Y^1 \mid X]\big] = E\big[E[Y^1 \mid X, D = 1]\big] = E\big[E[DY \mid X, D = 1]\big],$$
  where the first equality follows from the law of iterated expectations, the second from the conditional independence assumption, and the third from $DY = D(DY^1 + (1 - D)Y^0) = DY^1$, since $D(1 - D) = D - D^2 = D - D = 0$ (as $D^2 = D$ for a binary $D$).
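The identity $DY = DY^1$ and the resulting IPW estimator can be checked numerically. The simulated model (outcome equations, sample size) is an assumption for the sketch, and the final step, weighting $DY$ by $1/p(X)$ to recover $E[Y^1]$, is the standard continuation of the derivation above rather than something stated on this slide:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-x))                  # propensity score p(X), with 0 < p(X) < 1
d = (rng.uniform(size=n) < p).astype(int)
y1 = 1.0 + x + rng.normal(size=n)         # potential outcomes; CIA holds given X by construction
y0 = x + rng.normal(size=n)
y = d * y1 + (1 - d) * y0                 # observed outcome Y = D*Y^1 + (1-D)*Y^0

# The algebraic identity from the text: D*Y = D*Y^1, since D(1-D) = 0 for binary D.
assert np.allclose(d * y, d * y1)

# IPW estimate of E[Y^1]: weight DY by 1/p(X) (assumed next step in the derivation).
print(np.mean(d * y / p))   # close to E[Y^1] = 1
print(np.mean(y1))          # infeasible benchmark using the unobserved Y^1
```

The weighting compensates for the fact that units with low $p(X)$ are underrepresented among participants, which is exactly why the overlap condition $0 < p(X) < 1$ is needed: a zero score would put infinite weight on an unobservable cell.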