Where studies reported more than one effect size based on different statistical methods, we selected the effect size with the lowest risk of bias. We applied this rule to a study in India in which the
authors used both propensity score matching and instrumental variable regression analysis to determine the impact of the program (De Hoop et al., 2014). A priori, it was not clear which method had the lower risk of bias. However, the effect size calculation showed that the instrumental variable regression did not yield valid effect sizes, because predicted empowerment values fell outside the 0–1 range that is admissible for dichotomous variables. Although the impact estimates from the instrumental variable regression analysis might have presented qualitatively interesting
findings, the instrumental variable linear probability model did not yield unbiased impact estimates. Hence, the risk of bias of the effect size was high. Therefore, we chose to use the impact estimates from the propensity score matching model for this study because we considered these impact estimates to have a medium risk of bias.

Other studies presented several impact estimates for different variables that could be argued to measure the same construct. In those cases, we chose to use either the variable that we considered the best approximation of the construct or a sample-size weighted average to measure a “synthetic effect size.” For example, in the study of Kim et al. (2009), we constructed a sample-size weighted average by estimating the average impact on self-confidence and financial confidence for psychological empowerment and on the challenging of gender norms and autonomy in decision making for social empowerment. In these cases, we used the average values of the standard errors (without weighting by sample size) to estimate the pooled standard deviation. Similarly, for the study by De Hoop et al. (2014), we chose to calculate a sample-size weighted average for social empowerment by averaging the effects on the women’s autonomy to go to the market without their husbands’ permission and the women’s autonomy to go to the doctor without their husbands’ permission.

3.3.6 Unit of analysis issues

Where a study's standard errors did not account for clustering of outcomes (that is, where the outcome variables were likely to be clustered at a higher level of aggregation than the individual or household level but this was not taken into consideration in the estimation of the standard errors and confidence intervals), we used adjusted standard errors.
For these studies with a risk of unit of analysis error, we applied corrections to the standard errors and confidence intervals using the variance inflation factor (Higgins & Green, 2011):

$$SE_{\text{corrected}} = SE_{\text{uncorrected}} \times \sqrt{1 + (m - 1) \times ICC}$$

Here, m is the number of observations per cluster and ICC is the intracluster correlation coefficient.
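
The standard error correction described above can be sketched in a few lines of Python. This is a minimal illustration and not the review's own code: the function names, the 1.96 multiplier for a 95% confidence interval, the restriction of the ICC to the 0–1 range, and the numbers in the example are all assumptions made here for demonstration.

```python
import math


def inflate_standard_error(se_uncorrected: float, m: float, icc: float) -> float:
    """Multiply a standard error by sqrt(1 + (m - 1) * ICC).

    se_uncorrected: standard error that ignores clustering
    m: number of observations per cluster (average cluster size)
    icc: intracluster correlation coefficient (assumed here to lie in [0, 1])
    """
    if se_uncorrected < 0:
        raise ValueError("standard error must be non-negative")
    if m < 1:
        raise ValueError("m must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc is assumed to lie between 0 and 1")
    inflation = 1.0 + (m - 1.0) * icc
    return se_uncorrected * math.sqrt(inflation)


def adjusted_confidence_interval(estimate: float, se_uncorrected: float,
                                 m: float, icc: float, z: float = 1.96):
    """Return (lower, upper) bounds built from the cluster-adjusted standard error.

    z = 1.96 is an assumed normal-approximation multiplier for a 95% interval.
    """
    se = inflate_standard_error(se_uncorrected, m, icc)
    return estimate - z * se, estimate + z * se


# Hypothetical inputs, not taken from any included study:
# 21 observations per cluster and an ICC of 0.05 give an inflation
# term of 1 + 20 * 0.05 = 2, so the standard error grows by sqrt(2).
se_adjusted = inflate_standard_error(0.10, m=21, icc=0.05)
lower, upper = adjusted_confidence_interval(0.30, 0.10, m=21, icc=0.05)
```

With an ICC of zero, or with one observation per cluster, the multiplier is 1 and the standard error is unchanged. Larger clusters or a higher ICC widen the confidence interval.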