as the one we observed.

Confidence Intervals for Risk and Risk Difference

From each cell in a contingency table and the associated row sum we can obtain a risk estimate (just
divide the cell count by the row sum). We could do the same for columns. These estimates are estimated
proportions, and each one estimates the proportion parameter of a binomial (if we had more than 2
possibilities we would be dealing with multinomials, but let's stick with binomials). The binomial
distribution approaches a normal distribution as the number of trials gets large, so we can get
approximate confidence intervals for the risks (which are just binomial proportions) based on a normal
approximation. We could also use exact formulas for binomials to construct the intervals. This is similar
in concept to obtaining confidence intervals for a mean like we did with t tests, but now we're obtaining
a confidence interval for a proportion instead (so the distributional properties come from a binomial
instead).

Just like we considered location differences for two populations (e.g. the difference between north and
south mortality rates) and obtained confidence intervals for those differences, we can consider
differences of risks. For instance, for a chosen row we could look at the difference of risks for two
columns to see if there is a significant difference between the column risks for the group represented by
that row value. So returning to the idea of political parties and political stance, we could look at the
difference between the proportion of Republicans in favor and the proportion of Democrats in favor to see
if Republicans have a significantly higher probability of being in favor than Democrats do.
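The two ways of building an interval for a single risk can be sketched in Python. This is a minimal sketch, not the course's method: the function names are made up for illustration, the normal-approximation interval is the usual Wald form, and the "exact" interval is assumed to be the Clopper-Pearson construction, found here by bisection on the binomial distribution.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(count, n, level=0.95):
    """Normal-approximation (Wald) interval for a binomial proportion."""
    p_hat = count / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    # Clip to [0, 1] because a proportion cannot fall outside that range.
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(g, target, tol=1e-10):
    """Find p in [0, 1] with g(p) = target, for g increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_interval(count, n, level=0.95):
    """Clopper-Pearson interval (assumed meaning of 'exact' here)."""
    alpha = 1 - level
    # Lower limit: the p at which P(X >= count) equals alpha / 2.
    lower = 0.0 if count == 0 else _bisect_increasing(
        lambda p: 1 - binom_cdf(count - 1, n, p), alpha / 2)
    # Upper limit: the p at which P(X <= count) equals alpha / 2.
    upper = 1.0 if count == n else _bisect_increasing(
        lambda p: 1 - binom_cdf(count, n, p), 1 - alpha / 2)
    return lower, upper
```

For example, `wald_interval(40, 100)` and `exact_interval(40, 100)` both give a 95% interval around the estimated risk 0.40. The two should be close when the row total is reasonably large.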
Example
Consider the grouped water data set, with rows being the mortality groups and columns the calcium
concentration groups. For the low calcium group column, the sample proportion of low mortality
observations which had low calcium concentrations is the estimate of the risk of having low water
calcium given that the mortality rate is low. The sample proportion of high mortality observations which
had low calcium concentrations is the estimate of the risk of having low water calcium given that
the mortality rate is high.
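As a small illustration of how these row risks are computed, here is a sketch in Python. The counts are hypothetical placeholders, not the actual water data, and the function name `row_risks` is made up for this example.

```python
def row_risks(table):
    """Divide each cell count by its row total to get the estimated risks."""
    risks = []
    for row in table:
        total = sum(row)
        risks.append([count / total for count in row])
    return risks


# Hypothetical counts, NOT the actual water data.
# Rows: low mortality, high mortality. Columns: low calcium, high calcium.
counts = [[10, 20],
          [24, 6]]

risks = row_risks(counts)
low_calcium_given_low_mortality = risks[0][0]   # 10 out of 30 in row 1
low_calcium_given_high_mortality = risks[1][0]  # 24 out of 30 in row 2
```

Each row of the result sums to 1, since the risks in a row are the estimated conditional probabilities of the column categories given that row.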
We get confidence intervals for these two proportions either by using the asymptotic approximation of
binomials as normal, or by using the interval (labeled as exact in the SAS table) based on the binomial
distribution. The approximate result will typically be fine if samples aren't too small, and we can obtain
intervals for risk differences based on the asymptotic approximation. The risk difference is the risk for
the first row category minus the risk for the second. A risk difference interval that contains 0
tells us that the risk is not significantly different for the two row categories. A completely negative
interval says that the second row category has a significantly higher risk of being in that column
category. A completely positive interval says that the first row category has a significantly higher risk
of being in that column category.
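The risk difference interval and its reading can be sketched as follows. This is a minimal sketch assuming the usual Wald (normal-approximation) form for a difference of two independent proportions; the function names and the counts in the usage note are hypothetical, and the SAS output may use a slightly different variant.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_interval(count1, n1, count2, n2, level=0.95):
    """Wald interval for (row 1 risk) - (row 2 risk)."""
    p1, p2 = count1 / n1, count2 / n2
    # Standard error of the difference of two independent proportions.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


def interpret(interval):
    """Translate the sign of the interval into the conclusions above."""
    lower, upper = interval
    if lower > 0:
        return "first row category has the significantly higher risk"
    if upper < 0:
        return "second row category has the significantly higher risk"
    return "no significant difference in risk"
```

With hypothetical counts of 10 out of 30 in the first row and 24 out of 30 in the second, `interpret(risk_difference_interval(10, 30, 24, 30))` reports that the second row category has the higher risk, since the whole interval lies below 0.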

- Spring '08
- Muyot,M
- Normal Distribution, Pearson's chi-square test, Fisher's exact test