8. Association between categorical variables

Suppose both response and explanatory variables are categorical, with any number of categories for each. (Chap. 9 considers both variables quantitative.)

There is association between the variables if the population conditional distribution for the response variable differs among the categories of the explanatory variable.

Example: Contingency table on happiness cross-classified by family income (data from 2006 GSS)

                       Happiness
Income     Very         Pretty       Not too      Total
-------------------------------------------------------
Above      272 (44%)    294 (48%)     49 (8%)       615
Average    454 (32%)    835 (59%)    131 (9%)      1420
Below      185 (20%)    527 (57%)    208 (23%)      920
-------------------------------------------------------

Response: Happiness (happy in GSS)
Explanatory: Relative family income (finrela in GSS)

The sample conditional distributions on happiness vary by income level, but can we conclude that this is also true in the population? Strong or weak association?

Guidelines for Contingency Tables

Show sample conditional distributions: percentages for the response variable within the categories of the explanatory variable. (Find by dividing the cell counts by the explanatory category total and multiplying by 100. Percents on response categories will add to 100.)

Clearly define variables and categories.

If display percentages but not the cell counts, include explanatory total sample sizes, so reader can (if desired) recover all the cell count data.

(I use rows for explanatory var., columns for response var.)
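The percentage rule in the guidelines can be checked against the happiness table. The following is a minimal sketch in plain Python; the names `counts`, `happiness`, and `conditional_percentages` are illustrative choices, not part of the original notes.

```python
# Cell counts from the 2006 GSS happiness-by-income table
# (rows: explanatory variable, columns: response variable).
counts = {
    "Above":   [272, 294, 49],
    "Average": [454, 835, 131],
    "Below":   [185, 527, 208],
}
happiness = ["Very", "Pretty", "Not too"]


def conditional_percentages(row):
    """Divide each cell count by its row (explanatory) total, times 100."""
    total = sum(row)
    return [100 * count / total for count in row]


# Sample conditional distribution of happiness within each income category.
percents = {income: conditional_percentages(row) for income, row in counts.items()}

for income, row in percents.items():
    cells = "  ".join(f"{name}: {p:.0f}%" for name, p in zip(happiness, row))
    # Report the row total too, so the cell counts can be recovered.
    print(f"{income:<8} {cells}   (n = {sum(counts[income])})")
```

Each row of `percents` sums to 100, and the rounded values should agree with the percentages shown in parentheses in the table.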
Independence & Dependence

Statistical independence (no association): Population conditional distributions on one variable the same for all categories of the other variable

Statistical dependence (association): Population conditional distributions are not all identical

Example of statistical independence:

                 Happiness
Income     Very    Pretty    Not too
------------------------------------
Above      32%     55%       13%
Average    32%     55%       13%
Below      32%     55%       13%

Chi-Squared Test of Independence (Karl Pearson, 1900)

Tests
H_0: The variables are statistically independent
H_a: The variables are statistically dependent

Intuition behind test statistic: Summarize differences between observed cell counts and expected cell counts (what is expected if H_0 true)

Notation:
f_o = observed frequency (cell count)
f_e = expected frequency
r = number of rows in table, c = number of columns

Expected frequencies (f_e):

Have identical conditional distributions. Those distributions are same as the column (response) marginal distribution of the data.

Have same marginal distributions (row and column totals) as observed frequencies

Computed by f_e = (row total)(column total)/n

                          Happiness
Income     Very           Pretty         Not too        Total
-------------------------------------------------------------
Above      272 (189.6)    294 (344.6)     49 (80.8)       615
Average    454 (437.8)    835 (795.8)    131 (186.5)     1420
Below      185 (283.6)    527 (515.6)    208 (120.8)      920
-------------------------------------------------------------
...
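The expected-frequency formula, (row total)(column total)/n, can be applied to the observed happiness counts as a check. This is a minimal sketch in plain Python; `observed` and `expected_frequencies` are hypothetical names chosen for illustration.

```python
# Observed cell counts f_o (rows: Above, Average, Below income;
# columns: Very, Pretty, Not too happy).
observed = [
    [272, 294, 49],
    [454, 835, 131],
    [185, 527, 208],
]


def expected_frequencies(table):
    """Return f_e = (row total)(column total)/n for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)

for row in expected:
    # One decimal place, the precision used for the values in parentheses.
    print("  ".join(f"{fe:6.1f}" for fe in row))

# Under independence every row has the same conditional distribution,
# equal to the column (response) marginal distribution of the data.
for row in expected:
    print("  ".join(f"{100 * fe / sum(row):5.1f}%" for fe in row))
```

The first loop should reproduce the values in parentheses in the table, and the second prints the same three percentages for every income category, which is what identical conditional distributions means. The row and column totals of `expected` also equal those of `observed`.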