Chapter 9 Two categorical variables. Data Analysis and Inference for Two-Way Tables

Topics. Important change: we switch from quantitative variables to categorical variables. Topics covered: describing relations in two-way tables; marginal distributions; conditional distributions; Simpson’s paradox; the hypothesis of no association; the chi-square test.
Perils of aggregation. This example was essentially a three-way table with the variables airline, timing, and airport. Such tables are often reported as several two-way tables, one for each value of the third variable. Think of a book rather than a page. Adding the entries of these elementary tables (the “pages”) to get the overall summary (the “book”) is aggregation, and it ignores the third variable (here: airport). This may lead to false general conclusions.
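The effect of aggregation can be checked with a few lines of arithmetic. The sketch below uses hypothetical counts invented purely for illustration (they are not the figures from the lecture example); the airline names "A" and "B" and the airports "X" and "Y" are likewise placeholders. It computes the delay rate on each "page" (airport) and then for the aggregated "book".

```python
# Hypothetical counts, invented for illustration only.
# Each entry is (delayed flights, total flights) for one airline at one airport.
flights = {
    "A": {"X": (10, 100), "Y": (120, 400)},
    "B": {"X": (60, 400), "Y": (35, 100)},
}

def delay_rate(delayed, total):
    """Proportion of flights that were delayed."""
    return delayed / total

# One two-way table ("page") per airport: delay rate by airline.
per_airport = {
    airline: {airport: delay_rate(*counts) for airport, counts in pages.items()}
    for airline, pages in flights.items()
}

# Aggregated table ("book"): add counts over airports, ignoring the airport.
aggregated = {}
for airline, pages in flights.items():
    delayed = sum(d for d, _ in pages.values())
    total = sum(t for _, t in pages.values())
    aggregated[airline] = delay_rate(delayed, total)

for airline in flights:
    for airport, rate in per_airport[airline].items():
        print(f"airline {airline}, airport {airport}: {rate:.0%} delayed")
    print(f"airline {airline}, all airports: {aggregated[airline]:.0%} delayed")
```

With these made-up counts, airline A has the lower delay rate at each airport, yet the higher delay rate after aggregation, because most of its flights are at the airport where delays are common for both airlines.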
Section 9.2 Inference for Two-Way Tables

Hypothesis testing with two-way tables
H0: there is no association between the row and column variables (they are independent).
Ha: there is an association between the row and column variables (they are related, or dependent).
To test the hypotheses, compare the observed counts (from the actual data) with the expected counts (what is expected under H0). Expected counts are calculated under the assumption that the null hypothesis is true.
Expected Cell Counts
expected count = (row total × column total) / n
Here n is the total number of observations in the table.
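As a minimal sketch of the calculation, the code below applies the standard rule expected count = (row total × column total) / n to every cell of a two-way table. The 2×2 table of observed counts is hypothetical, and the function names are illustrative. The second function computes the usual chi-square statistic, the sum over all cells of (observed − expected)² / expected; that formula is not stated in this section and is included here only as the common way of comparing observed with expected counts.

```python
def expected_counts(observed):
    """Expected cell counts under H0: (row total * column total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # total number of observations in the table
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_statistic(observed):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical observed counts: rows and columns are the two categorical variables.
observed = [
    [30, 20],
    [20, 30],
]

expected = expected_counts(observed)
statistic = chi_square_statistic(observed)

print("expected counts:", expected)
print("chi-square statistic:", statistic)
```

Here every row total and column total is 50 and n is 100, so each expected count is 50 × 50 / 100 = 25. The further the observed counts are from these values, the larger the statistic and the stronger the evidence against H0.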

