- $E[E[Y \mid X, D=0] \mid D=1]$ is the expected outcome of nonparticipants who have the same characteristics as the participants.
- Note that the support of the covariates of the nonparticipants has to contain that of the participants if one wants to avoid parametric extrapolation.
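To make the nested expectation concrete, here is a minimal sketch of its sample analogue for a single discrete covariate. The toy data and the function name `counterfactual_mean` are assumptions made for illustration only; they do not come from the slides. The function first checks the support condition and then averages the nonparticipants' cell means over the participants' covariate values.

```python
import numpy as np

# Hypothetical toy data: one discrete covariate x, treatment indicator d, outcome y.
x = np.array([0, 0, 1, 1, 1, 2, 0, 1, 1, 2])
d = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y = np.array([1.0, 3.0, 4.0, 5.0, 6.0, 8.0, 2.5, 6.0, 7.0, 9.0])


def counterfactual_mean(x, d, y):
    """Sample analogue of E[E[Y | X, D=0] | D=1] for a discrete X."""
    x_treated = x[d == 1]
    x_control = x[d == 0]

    # Support check: every covariate value observed among participants
    # must also be observed among nonparticipants.
    missing = np.setdiff1d(x_treated, x_control)
    if missing.size > 0:
        raise ValueError(f"no nonparticipants with x in {missing.tolist()}")

    # Inner expectation: mean outcome of nonparticipants in each covariate cell.
    cell_mean = {v: y[(d == 0) & (x == v)].mean() for v in np.unique(x_treated)}

    # Outer expectation: average those cell means over the participants.
    return float(np.mean([cell_mean[v] for v in x_treated]))


print(counterfactual_mean(x, d, y))  # 5.0
```

With a continuous covariate the cells would be empty or nearly so, which is one reason to smooth over similar nonparticipants instead of requiring exact equality of $X$.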
The Conditional Independence Assumption

- One possibility to estimate this expression is to use a matching estimator.
- The general definition of a matching estimator is given by
  \[ \hat{Y}_{0i} = \sum_{j \mid D_j = 0} \omega_{ij} Y_j . \]
- The function $\omega_{ij}$ determines the number and the weights of the control outcomes used to estimate $Y_0$ for participant $i$.
- There are several possibilities for determining $\omega_{ij}$.
- Nearest-neighbour matching sets $\omega_{ij} = 1$ for the nonparticipant $j$ with characteristics most similar to those of participant $i$, and $\omega_{ij'} = 0$ for all other nonparticipants $j' \neq j$.
- $k$-nearest-neighbour ($k$-nn) matching uses the average outcome of the $k$ most similar nonparticipants.
- Caliper matching uses all controls (i.e., nonparticipants) for which the observable characteristics do not differ by more than some small positive value.
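The three weighting schemes above can be sketched for a single scalar covariate as follows. The function names, the toy numbers, and the choice of equal weights inside the caliper are assumptions made for this illustration; the slides only say that caliper matching uses all controls within the caliper.

```python
import numpy as np


def matching_weights(x_i, x_controls, method="nn", k=1, caliper=None):
    """Weights omega_ij over the nonparticipants for one participant i."""
    dist = np.abs(np.asarray(x_controls, dtype=float) - x_i)
    w = np.zeros_like(dist)
    if method == "nn":
        # Weight one on the single closest nonparticipant.
        w[np.argmin(dist)] = 1.0
    elif method == "knn":
        if k > dist.size:
            raise ValueError("k exceeds the number of nonparticipants")
        # Equal weights 1/k on the k closest nonparticipants.
        nearest = np.argsort(dist, kind="stable")[:k]
        w[nearest] = 1.0 / k
    elif method == "caliper":
        # Assumption: equal weights on all nonparticipants inside the caliper.
        inside = dist <= caliper
        if not inside.any():
            raise ValueError("no nonparticipant within the caliper")
        w[inside] = 1.0 / inside.sum()
    else:
        raise ValueError(f"unknown method: {method}")
    return w


def matched_outcome(x_i, x_controls, y_controls, **kwargs):
    """Estimate of Y_0 for participant i: sum_j omega_ij * Y_j."""
    w = matching_weights(x_i, x_controls, **kwargs)
    return float(w @ np.asarray(y_controls, dtype=float))


x_c = [1.0, 2.0, 4.0, 7.0]      # hypothetical control covariates
y_c = [10.0, 20.0, 40.0, 70.0]  # hypothetical control outcomes
print(matched_outcome(2.4, x_c, y_c, method="nn"))                    # 20.0
print(matched_outcome(2.4, x_c, y_c, method="knn", k=2))              # 15.0
print(matched_outcome(2.4, x_c, y_c, method="caliper", caliper=2.0))  # about 23.33
```

In all three cases the weights sum to one, so the estimate is a weighted average of control outcomes.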
- The difference of observed characteristics is judged with respect to some metric, i.e., some measure of distance.
- For one-dimensional covariates, the absolute value of the difference can be used, i.e., $|x_i - x_j|$.
- For more than one covariate, the Mahalanobis metric may be used, which is defined as
  \[ d(x_1, x_2) = \sqrt{(x_1 - x_2)' S^{-1} (x_1 - x_2)}, \]
  where $x_1$ and $x_2$ are two vectors and $S$ is a covariance matrix.
- Another weighting method is kernel matching, which uses all nonparticipants and determines the weights by a kernel function:
  \[ \omega_{ij} = \frac{K(x_i - x_j)}{\sum_{\ell \mid D_\ell = 0} K(x_i - x_\ell)} . \]
- The division by the sum of the kernel weights is necessary to obtain a weighted average, as $\sum_{\ell \mid D_\ell = 0} K(x_i - x_\ell)$ does not necessarily sum up to one.
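A minimal sketch of the Mahalanobis metric and of normalised kernel weights follows. The Gaussian kernel and the `bandwidth` parameter are assumptions introduced for this illustration; the slides write the kernel simply as $K(x_i - x_j)$ and do not fix a particular kernel or bandwidth.

```python
import numpy as np


def mahalanobis(x1, x2, S):
    """d(x1, x2) = sqrt((x1 - x2)' S^{-1} (x1 - x2))."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    # Solve S z = diff instead of forming the inverse of S explicitly.
    z = np.linalg.solve(np.asarray(S, dtype=float), diff)
    return float(np.sqrt(diff @ z))


def kernel_weights(x_i, x_controls, bandwidth=1.0):
    """Kernel matching weights omega_ij for a scalar covariate.

    Assumption: Gaussian kernel with a user-chosen bandwidth.
    """
    u = (x_i - np.asarray(x_controls, dtype=float)) / bandwidth
    k = np.exp(-0.5 * u**2)
    # Normalise so that the weights sum to one.
    return k / k.sum()


# With S equal to the identity the metric reduces to the Euclidean distance.
print(mahalanobis([0.0, 0.0], [3.0, 4.0], np.eye(2)))  # 5.0

w = kernel_weights(2.4, [1.0, 2.0, 4.0, 7.0])
print(w, w.sum())  # all nonparticipants get positive weight; the weights sum to one
```

Unlike nearest-neighbour or caliper matching, every nonparticipant receives a positive weight here, with closer nonparticipants weighted more heavily.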
- Consider now the propensity score $p(X)$, which is defined as the probability of choosing (or receiving) treatment conditional on $X$:
  \[ p(X) \equiv \Pr(D = 1 \mid X) = E[D \mid X]. \]
- It can be shown that the conditional independence assumption also holds when $X$ is replaced by $p(X)$:
  \[ (Y_1, Y_0) \perp\!\!\!\perp D \mid X \;\Rightarrow\; (Y_1, Y_0) \perp\!\!\!\perp D \mid p(X). \]
- For matching and inverse probability weighting approaches it is furthermore assumed that
  \[ 0 < p(X) < 1, \]
  which means that for each value of $X$ there are participants as well as nonparticipants.
- All matching algorithms stated previously can also be based on $p(X)$ instead of $X$.
- Matching on the one-dimensional propensity score may have better finite-sample properties than matching on the high-dimensional $X$.
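As a small illustration of the definition $p(X) = E[D \mid X]$ and of the condition $0 < p(X) < 1$, the sketch below estimates the propensity score for a discrete covariate as the share of participants in each cell. The toy data and function names are assumptions; in applications with continuous or high-dimensional $X$ the score is usually estimated with a binary-choice model instead, which is not shown here.

```python
import numpy as np


def propensity_by_cell(x, d):
    """Estimate p(x) = Pr(D=1 | X=x) as the participant share in each cell of a discrete X."""
    x = np.asarray(x)
    d = np.asarray(d, dtype=float)
    return {v.item(): float(d[x == v].mean()) for v in np.unique(x)}


def has_overlap(p_by_cell):
    """True if 0 < p(x) < 1 holds in every cell."""
    return all(0.0 < p < 1.0 for p in p_by_cell.values())


# Hypothetical toy data.
x = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2])
d = np.array([0, 0, 1, 0, 1, 1, 1, 0, 1])

p = propensity_by_cell(x, d)
print(p)               # participant share per cell: 1/3, 3/4 and 1/2
print(has_overlap(p))  # True

# One score per observation; matching can then use |p_i - p_j| as the distance.
scores = np.array([p[v] for v in x.tolist()])
```

A cell with an estimated score of exactly 0 or 1 contains only nonparticipants or only participants, so no match within that cell is available.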
- A second method which is based on the conditional independence assumption is the inverse probability weighting approach.
- To see how this works, first rewrite $E[Y_1]$ as follows:
  \[ E[Y_1] = E[E[Y_1 \mid X]] = E[E[Y_1 \mid X, D = 1]] = E[E[DY \mid X, D = 1]], \]
  where the first equality follows by the law of iterated expectations, the second by the conditional independence assumption, and the third by
  \[ DY = D(DY_1 + (1 - D)Y_0) = DY_1, \]
  as $D(1 - D) = D - D^2 = D - D = 0$ (as $D^2 = D$ for a binary $D$).
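The algebraic step $DY = DY_1$ can be checked numerically. The simulated potential outcomes below are purely hypothetical and serve only to verify the identity for a binary $D$; they carry no empirical content.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

d = rng.integers(0, 2, size=n)      # binary treatment indicator
y1 = rng.normal(1.0, 1.0, size=n)   # hypothetical potential outcome under treatment
y0 = rng.normal(0.0, 1.0, size=n)   # hypothetical potential outcome without treatment

# Observed outcome: Y = D*Y1 + (1 - D)*Y0.
y = d * y1 + (1 - d) * y0

# D(1 - D) = 0 for a binary D, hence D*Y = D*Y1.
cross_term_is_zero = bool(np.all(d * (1 - d) == 0))
identity_holds = bool(np.allclose(d * y, d * y1))
print(cross_term_is_zero, identity_holds)  # True True
```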