STAT101_Chap8 - 8. Association between Categorical...

8. Association between Categorical Variables

Suppose both the response and explanatory variables are categorical. (For comparing means in Chap. 7, the response variable is quantitative and the explanatory variable is categorical. Chap. 9 considers the case where both are quantitative.)

There is an association if the population conditional distribution for the response variable differs among the categories of the explanatory variable.

Example: Contingency table on happiness cross-classified by family income (data from 2006 GSS)

                     Happiness
Income     Very         Pretty       Not too      Total
---------------------------------------------------------
Above      272 (44%)    294 (48%)     49 (8%)       615
Average    454 (32%)    835 (59%)    131 (9%)      1420
Below      185 (20%)    527 (57%)    208 (23%)      920
---------------------------------------------------------

Response: Happiness, Explanatory: Income

The sample conditional distributions on happiness vary by income level, but can we conclude that this is also true in the population?
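The percentages in the table are sample conditional distributions: each cell count divided by its row (income) total. A minimal Python sketch of that calculation follows. The variable and function names (`observed`, `conditional_distribution`, `row_percentages`) are illustrative choices, not part of the original slides.

```python
# Observed cell counts from the 2006 GSS table
# (rows: income category, columns: happiness level).
observed = {
    "Above":   [272, 294, 49],
    "Average": [454, 835, 131],
    "Below":   [185, 527, 208],
}
happiness_levels = ["Very", "Pretty", "Not too"]


def conditional_distribution(counts):
    """Return each count as a proportion of its row total."""
    total = sum(counts)
    return [count / total for count in counts]


# Conditional distribution of happiness within each income category,
# rounded to whole percentages as in the table.
row_percentages = {
    income: [round(100 * p) for p in conditional_distribution(counts)]
    for income, counts in observed.items()
}

for income, percents in row_percentages.items():
    cells = ", ".join(
        f"{level}: {pct}%" for level, pct in zip(happiness_levels, percents)
    )
    print(f"{income:8s} {cells}")
```

Dividing by the row total (rather than the column or grand total) is what makes happiness the response and income the explanatory variable in this summary.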

Guidelines for Contingency Tables

• Show sample conditional distributions: percentages for the response variable within the categories of the explanatory variable.
• Clearly define variables and categories.
• If you display percentages but not the cell counts, include the total sample size for each category of the explanatory variable, so the reader can (if desired) recover all the cell count data.

Statistical independence (no association): Population conditional distributions on one variable are the same for all categories of the other variable.

Statistical dependence (association): Conditional distributions are not all identical.

Example of statistical independence:

                     Happiness
Income     Very     Pretty     Not too
-----------------------------------------
Above
Average
Below

Chi-Squared Test of Independence (Karl Pearson, 1900)

• Tests H_0: The variables are statistically independent
• H_a: The variables are statistically dependent

Intuition behind test statistic: Summarize differences between observed cell counts and expected cell counts

• Notation:
  f_o = observed frequency (cell count)
  f_e = expected frequency
  r = number of rows in table, c = number of columns

Expected frequencies (f_e):

• Have identical conditional distributions. Those distributions are the same as the column (response) marginal distribution of the data.
• Have the same marginal distributions (row and column totals) as the observed frequencies.
• Computed by f_e = (row total)(column total)/n

                          Happiness
Income     Very           Pretty         Not too        Total
----------------------------------------------------------------
Above      272 (189.6)    294 (344.6)     49 (80.8)       615
Average    454 (437.8)    835 (795.8)    131 (186.5)     1420
Below      185 (283.6)    527 (515.6)    208 (120.8)      920
----------------------------------------------------------------
Total      911            1656           388             2955

f_e values are in parentheses in this table, e.g., the first cell has f_e = (615)(911)/2955 = 189.6
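The expected frequencies in parentheses can be reproduced from the row totals, column totals, and n alone. The following Python sketch applies f_e = (row total)(column total)/n to every cell; the names `observed`, `row_totals`, `column_totals`, and `expected` are illustrative and do not come from the slides.

```python
# Observed cell counts (rows: Above, Average, Below income;
# columns: Very, Pretty, Not too happy).
observed = [
    [272, 294, 49],
    [454, 835, 131],
    [185, 527, 208],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
n = sum(row_totals)

# f_e = (row total)(column total) / n for every cell.
expected = [
    [row_total * column_total / n for column_total in column_totals]
    for row_total in row_totals
]

for row in expected:
    print("  ".join(f"{value:6.1f}" for value in row))
```

Because each expected row is the row total times the same set of column proportions, every row of `expected` has the same conditional distribution, which is what independence would look like with these margins.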

Chi-Squared Test Statistic

• Summarize closeness of {f_o} and {f_e} by

  $$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

  where the sum is taken over all cells in the table.
• When H
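As a rough illustration, the statistic can be computed for the happiness-by-income table in a few lines of Python. This sketch assumes the standard Pearson form, the sum over all cells of (f_o - f_e)^2 / f_e, with f_e = (row total)(column total)/n. The function names `expected_frequencies` and `chi_squared_statistic` are hypothetical helpers, not something defined in the slides.

```python
# Observed cell counts (rows: Above, Average, Below income;
# columns: Very, Pretty, Not too happy).
observed = [
    [272, 294, 49],
    [454, 835, 131],
    [185, 527, 208],
]


def expected_frequencies(table):
    """Expected counts under independence: (row total)(column total)/n."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in column_totals] for r in row_totals]


def chi_squared_statistic(table):
    """Sum of (f_o - f_e)^2 / f_e over all cells of the table."""
    expected = expected_frequencies(table)
    return sum(
        (f_o - f_e) ** 2 / f_e
        for observed_row, expected_row in zip(table, expected)
        for f_o, f_e in zip(observed_row, expected_row)
    )


chi_squared = chi_squared_statistic(observed)
print(f"chi-squared = {chi_squared:.1f}")
```

A table whose rows all share the same conditional distribution, such as `[[10, 20], [20, 40]]`, gives a statistic of zero, and the value grows as the observed counts move away from the expected ones.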