8. Association between categorical variables


Suppose both the response and explanatory variables are categorical. (Chap. 9 considers the case where both are quantitative.) There is an association if the population conditional distribution of the response variable differs among the categories of the explanatory variable.

Example: Contingency table on happiness cross-classified by family income (data from 2006 GSS)

Happiness (columns) by income (rows):

| Income | Very | Pretty | Not too | Total |
|---|---|---|---|---|
| Above | 272 (44%) | 294 (48%) | 49 (8%) | 615 |
| Average | 454 (32%) | 835 (59%) | 131 (9%) | 1420 |
| Below | 185 (20%) | 527 (57%) | 208 (23%) | 920 |

Response: Happiness, Explanatory: Income

The sample conditional distributions on happiness vary by income level, but can we conclude that this is also true in the population?

Guidelines for Contingency Tables

• Show sample conditional distributions: percentages for the response variable within the categories of the explanatory variable. Find them by dividing each cell count by its explanatory category total and multiplying by 100. (The percentages across the response categories will add to 100.)
• Clearly define the variables and categories.
• If you display percentages but not the cell counts, include the total sample size for each explanatory category, so the reader can (if desired) recover all the cell counts.

(I use rows for the explanatory variable and columns for the response variable.)
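The percentage rule in these guidelines can be sketched in a few lines of Python. This is only an illustration using the cell counts from the happiness table; the names `counts` and `conditional_percentages` are made up for the example and do not come from the notes.

```python
# Cell counts from the happiness-by-income table (2006 GSS).
# Keys are the explanatory categories (income); each list holds the
# response counts in the order: very happy, pretty happy, not too happy.
counts = {
    "Above":   [272, 294, 49],
    "Average": [454, 835, 131],
    "Below":   [185, 527, 208],
}


def conditional_percentages(row):
    """Divide each cell count by its row total and multiply by 100."""
    total = sum(row)
    return [100 * count / total for count in row]


percentages = {income: conditional_percentages(row) for income, row in counts.items()}

# Print rounded percentages together with the row total, so the
# cell counts can be recovered from the display if desired.
for income, pcts in percentages.items():
    print(income, [round(p) for p in pcts], "n =", sum(counts[income]))
```

Printing the row total next to the percentages follows the guideline about letting the reader recover the cell counts.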

Statistical independence (no association): Population conditional distributions on one variable are the same for all categories of the other variable.

Statistical dependence (association): Conditional distributions are not all identical.

Example of statistical independence:

| Income | Very | Pretty | Not too |
|---|---|---|---|
| Above | 32% | 55% | 13% |
| Average | 32% | 55% | 13% |
| Below | 32% | 55% | 13% |

Chi-Squared Test of Independence (Karl Pearson, 1900)

• Tests $H_0$: The variables are statistically independent
• $H_a$: The variables are statistically dependent

Intuition behind the test statistic: Summarize the differences between the observed cell counts and the expected cell counts (what is expected if $H_0$ is true).

• Notation: $f_o$ = observed frequency (cell count), $f_e$ = expected frequency, $r$ = number of rows in the table, $c$ = number of columns
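The formula itself is not visible in this preview. Assuming the notes follow the standard definition of Pearson's statistic, the summary of the differences between $f_o$ and $f_e$ takes the form below, with the sum running over all $r \times c$ cells of the table:

$$
\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}, \qquad df = (r - 1)(c - 1)
$$

Larger values of $\chi^2$ mean the observed counts lie farther from what $H_0$ predicts, and so give stronger evidence against independence.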

Expected frequencies ($f_e$):

• Have identical conditional distributions. Those distributions are the same as the column (response) marginal distribution of the data.
• Have the same marginal distributions (row and column totals) as the observed frequencies.
• Are computed by $f_e$ = (row total)(column total)/$n$

Observed frequencies, with expected frequencies in parentheses:

| Income | Very | Pretty | Not too | Total |
|---|---|---|---|---|
| Above | 272 (189.6) | 294 (344.6) | 49 (80.8) | 615 |
| Average | 454 (437.8) | 835 (795.8) | 131 (186.5) | 1420 |
| Below | 185 (283.6) | 527 (515.6) | 208 (120.8) | 920 |
| Total | 911 | 1656 | 388 | 2955 |

e.g., the first cell has $f_e$ = 615(911)/2955 = 189.6.
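As a rough check on the table, the expected frequencies can be recomputed from the margins in plain Python. The variable names here are illustrative only. The last step sums $(f_o - f_e)^2 / f_e$ over all cells, which is the standard form of Pearson's statistic; that formula is not shown in this preview, so treat it as an assumption about where the notes go next.

```python
# Observed cell counts (rows: Above, Average, Below income;
# columns: very, pretty, not too happy).
observed = [
    [272, 294, 49],
    [454, 835, 131],
    [185, 527, 208],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# f_e = (row total)(column total) / n for every cell.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Property 1: expected frequencies keep the observed row and column totals.
expected_row_totals = [sum(row) for row in expected]
expected_col_totals = [sum(col) for col in zip(*expected)]

# Property 2: every row of expected frequencies has the same conditional
# distribution, equal to the column (response) marginal distribution.
marginal = [c / n for c in col_totals]
expected_conditionals = [[fe / r for fe in row] for row, r in zip(expected, row_totals)]

# Assumed standard Pearson statistic: sum of (f_o - f_e)^2 / f_e over cells.
chi_squared = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)

for row in expected:
    print([round(fe, 1) for fe in row])
print("chi-squared:", round(chi_squared, 1))
```

Building `expected` from the margins alone makes the two listed properties hold by construction, which is why the checks compare against `row_totals`, `col_totals`, and `marginal` directly.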


## This note was uploaded on 11/20/2011 for the course STATISTICS ST3241 taught by Professor Manwai's during the Spring '11 term at National University of Singapore.
