STAT101_Chap8

# STAT101_Chap8 - 8 Association between Categorical Variables...

8. Association between Categorical Variables

Suppose both the response and explanatory variables are categorical. (For comparing means in Chap. 7, the response variable is quantitative and the explanatory variable is categorical. Chap. 9 considers the case where both are quantitative.)

There is association if the population conditional distribution for the response variable differs among the categories of the explanatory variable.

Example: Contingency table on happiness cross-classified by family income (data from 2006 GSS)

Happiness by income (cell counts, with row percentages in parentheses):

| Income | Very | Pretty | Not too | Total |
|---|---|---|---|---|
| Above | 272 (44%) | 294 (48%) | 49 (8%) | 615 |
| Average | 454 (32%) | 835 (59%) | 131 (9%) | 1420 |
| Below | 185 (20%) | 527 (57%) | 208 (23%) | 920 |

Response: Happiness, Explanatory: Income

The sample conditional distributions on happiness vary by income level, but can we conclude that this is also true in the population?
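The percentages in the table are the sample conditional distributions of happiness within each income level. A minimal Python sketch of that calculation follows; the variable and function names are illustrative choices, not part of the original notes.

```python
# Observed cell counts from the 2006 GSS table.
# Rows: income level; columns: Very, Pretty, Not too (happy).
counts = {
    "Above":   [272, 294, 49],
    "Average": [454, 835, 131],
    "Below":   [185, 527, 208],
}


def conditional_distribution(row):
    """Percentages of each happiness category within one income level."""
    total = sum(row)
    return [100 * x / total for x in row]


# One conditional distribution per category of the explanatory variable.
cond = {income: conditional_distribution(row) for income, row in counts.items()}

for income, pcts in cond.items():
    shown = ", ".join(f"{p:.0f}%" for p in pcts)
    print(f"{income:8s} {shown}  (n = {sum(counts[income])})")
```

Each row sums to 100%, because the percentages are computed within an income level rather than over the whole sample.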
Guidelines for Contingency Tables

• Show sample conditional distributions: percentages for the response variable within the categories of the explanatory variable.
• Clearly define variables and categories.
• If you display percentages but not the cell counts, include the total sample size for each explanatory category, so the reader can (if desired) recover all the cell count data.
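To illustrate the last guideline, the sketch below recovers approximate cell counts from the rounded percentages and row totals of the happiness table. The names are illustrative, and because the published percentages are rounded to whole numbers, the recovered counts can differ from the true counts by a few units.

```python
# Rounded row percentages and row totals, as reported in the happiness table.
reported = {
    "Above":   ([44, 48, 8], 615),
    "Average": ([32, 59, 9], 1420),
    "Below":   ([20, 57, 23], 920),
}

# True cell counts, used here only to check how close the recovery is.
actual = {
    "Above":   [272, 294, 49],
    "Average": [454, 835, 131],
    "Below":   [185, 527, 208],
}

# Recovered count = percentage x row total / 100, rounded to a whole count.
recovered = {
    income: [round(p * n / 100) for p in pcts]
    for income, (pcts, n) in reported.items()
}

# Largest absolute gap between a recovered count and the true count.
max_error = max(
    abs(r - a)
    for income in actual
    for r, a in zip(recovered[income], actual[income])
)

print(recovered)
print("largest recovery error:", max_error)
```

Reporting percentages with one decimal place would shrink the recovery error, at the cost of a busier table.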

Statistical independence (no association): The population conditional distributions on one variable are the same for all categories of the other variable.

Statistical dependence (association): The conditional distributions are not all identical.

Example of statistical independence:

| Income | Very | Pretty | Not too |
|---|---|---|---|
| Above | | | |
| Average | | | |
| Below | | | |
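As a rough illustration of the definition, the sketch below checks whether every row of a table has the same conditional distribution. The `hypothetical` table is invented for this example (each income level is 30% / 60% / 10%) and is not taken from the notes; the function names are also illustrative. The check is meant for population-style tables, since sample tables almost never match exactly even under independence.

```python
from fractions import Fraction


def conditional_distributions(table):
    """Exact row-wise proportions for a table of counts."""
    return [[Fraction(x, sum(row)) for x in row] for row in table]


def has_identical_conditionals(table):
    """True if every row has the same conditional distribution."""
    dists = conditional_distributions(table)
    return all(d == dists[0] for d in dists)


# Hypothetical counts (assumption): every income level is 30% / 60% / 10%.
hypothetical = [
    [180, 360, 60],    # Above   (n = 600)
    [420, 840, 140],   # Average (n = 1400)
    [270, 540, 90],    # Below   (n = 900)
]

# Observed 2006 GSS sample counts from the happiness table.
observed = [
    [272, 294, 49],
    [454, 835, 131],
    [185, 527, 208],
]

print(has_identical_conditionals(hypothetical))  # identical rows: independence
print(has_identical_conditionals(observed))      # rows differ in the sample
```

Using `Fraction` keeps the comparison exact, so the check does not depend on floating-point rounding.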
Chi-Squared Test of Independence (Karl Pearson, 1900)

• Tests $H_0$: The variables are statistically independent
• $H_a$: The variables are statistically dependent

Intuition behind the test statistic: summarize the differences between observed cell counts and expected cell counts.

• Notation:
  $f_o$ = observed frequency (cell count)
  $f_e$ = expected frequency
  $r$ = number of rows in table, $c$ = number of columns

Expected frequencies ($f_e$):

• Have identical conditional distributions. Those distributions are the same as the column (response) marginal distribution of the data.
• Have the same marginal distributions (row and column totals) as the observed frequencies.
• Computed by $f_e = \dfrac{(\text{row total})(\text{column total})}{n}$
Happiness by income (observed counts, with $f_e$ values in parentheses):

| Income | Very | Pretty | Not too | Total |
|---|---|---|---|---|
| Above | 272 (189.6) | 294 (344.6) | 49 (80.8) | 615 |
| Average | 454 (437.8) | 835 (795.8) | 131 (186.5) | 1420 |
| Below | 185 (283.6) | 527 (515.6) | 208 (120.8) | 920 |
| Total | 911 | 1656 | 388 | 2955 |

e.g., the first cell has $f_e = (615)(911)/2955 = 189.6$.
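The expected frequencies in parentheses can be reproduced from the row totals, column totals, and overall sample size using (row total)(column total)/n. The following is a small sketch with illustrative variable names:

```python
# Observed counts: rows are income levels, columns are happiness categories.
observed = [
    [272, 294, 49],    # Above
    [454, 835, 131],   # Average
    [185, 527, 208],   # Below
]

row_totals = [sum(row) for row in observed]         # 615, 1420, 920
col_totals = [sum(col) for col in zip(*observed)]   # 911, 1656, 388
n = sum(row_totals)                                 # 2955

# Expected frequency for each cell: (row total)(column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(fe, 1) for fe in row])
```

By construction, each row of `expected` sums to the corresponding observed row total, and each row has the same conditional distribution as the column marginal distribution.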

Chi-Squared Test Statistic

• Summarize the closeness of $\{f_o\}$ and $\{f_e\}$ by

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e},$$

where the sum is taken over all cells in the table.

• When H
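As a sketch of how the statistic can be computed for the happiness table, the code below assumes the standard Pearson form, the sum over all cells of $(f_o - f_e)^2 / f_e$, with $f_e$ = (row total)(column total)/n. The function name `chi_squared_statistic` is an illustrative choice, not something defined in the notes.

```python
def chi_squared_statistic(observed):
    """Pearson chi-squared statistic for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    total = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            # Expected count under independence for cell (i, j).
            f_e = row_totals[i] * col_totals[j] / n
            total += (f_o - f_e) ** 2 / f_e
    return total


# Observed counts: income (rows) by happiness (columns), 2006 GSS.
observed = [
    [272, 294, 49],
    [454, 835, 131],
    [185, 527, 208],
]

chi2 = chi_squared_statistic(observed)
print(f"chi-squared = {chi2:.1f}")
```

The statistic is zero only when every observed count equals its expected count, and it grows as the observed table moves away from what independence would predict.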

## This note was uploaded on 07/14/2011 for the course STA 101 taught by Professor Alan during the Fall '10 term at University of Florida.
