# note15 - STAT5044: Regression and Anova, Inyoung Kim


## Outline

1. Categorical data analysis
2. Three measures of relationship between categorical variables
3. Testing independence in two-way contingency tables

## Describing Contingency Tables

- Introduce tables that display relationships between categorical variables.
- Define parameters that summarize their association.
- These parameters are used to compare groups on the proportions of responses in the outcome categories.
- The odds ratio has special importance, appearing as a parameter in models discussed later.
- The primary focus is to present parameters for nominal and ordinal multicategory variables.

## Contingency Tables

Let $X$ and $Y$ denote two categorical response variables, $X$ with $I$ categories and $Y$ with $J$ categories. Classifications of subjects on both variables have $IJ$ possible combinations. The response $(X, Y)$ of a subject chosen randomly from some population has a probability distribution.

A rectangular table having $I$ rows for categories of $X$ and $J$ columns for categories of $Y$ displays this distribution. The cells of the table represent the $IJ$ possible outcomes. When the cells contain frequency counts of outcomes for a sample, the table is called a contingency table, a term introduced by Karl Pearson (1904). Another name is cross-classification table. A contingency table with $I$ rows and $J$ columns is called an $I \times J$ table.

## Contingency Tables: Example

This table is a 2 × 3 contingency table. The columns give the myocardial infarction outcome.

| Group | Fatal Attack | Nonfatal Attack | No Attack |
|---|---|---|---|
| Placebo | 18 | 171 | 10,845 |
| Aspirin | 5 | 99 | 10,933 |

This table is from a report on the relationship between aspirin use and heart attacks by the Physicians' Health Study Research Group at Harvard Medical School. Of the 11,034 physicians taking a placebo, 18 suffered fatal heart attacks over the course of the study, whereas of the 11,037 taking aspirin, 5 had fatal heart attacks.

Question: Does aspirin cause more fatal attacks than placebo?
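
As a first descriptive look at the question, the sketch below computes the row totals and the within-group proportion of each outcome from the counts in the table. The names `counts`, `row_proportions`, `row_totals`, and `fatal_share` are illustrative choices, not notation from the notes, and the comparison is purely descriptive rather than a formal test.

```python
# Counts from the aspirin table: rows are treatment groups,
# columns are (fatal attack, nonfatal attack, no attack).
counts = {
    "Placebo": [18, 171, 10845],
    "Aspirin": [5, 99, 10933],
}


def row_proportions(row):
    """Divide each count by its row total (a sample conditional distribution)."""
    total = sum(row)
    return [c / total for c in row]


# Number of physicians in each treatment group.
row_totals = {group: sum(row) for group, row in counts.items()}

# Within-group sample proportion of fatal attacks (first column).
fatal_share = {group: row_proportions(row)[0] for group, row in counts.items()}

for group in counts:
    print(f"{group}: n = {row_totals[group]}, "
          f"fatal proportion = {fatal_share[group]:.5f}")
```

In this sample the proportion of fatal attacks is 18/11,034 in the placebo group and 5/11,037 in the aspirin group, so the placebo proportion is the larger of the two.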
## Independence of Categorical Variables

When both variables are response variables, descriptions of the association can use their joint distribution, the conditional distribution of $Y$ given $X$, or the conditional distribution of $X$ given $Y$.

The conditional distribution of $Y$ given $X$ relates to the joint distribution by

$$\pi_{j \mid i} = \frac{\pi_{ij}}{\pi_{i+}} \quad \text{for all } i \text{ and } j.$$

Two categorical response variables are defined to be independent if all joint probabilities equal the product of their marginal probabilities,

$$\pi_{ij} = \pi_{i+}\,\pi_{+j} \quad \text{for } i = 1, \ldots, I \text{ and } j = 1, \ldots, J.$$
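
The two definitions can be checked numerically. The following is a minimal sketch, assuming a joint distribution stored as a list of rows; the function names and the two example tables are hypothetical and chosen only for illustration. Under independence, every row of the conditional distribution of Y given X equals the column marginal distribution.

```python
def marginals(joint):
    """Return row marginals (pi_{i+}) and column marginals (pi_{+j})."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    return row, col


def conditional_y_given_x(joint):
    """Return pi_{j|i} = pi_{ij} / pi_{i+} for every row i."""
    row, _ = marginals(joint)
    return [[p / row[i] for p in r] for i, r in enumerate(joint)]


def is_independent(joint, tol=1e-9):
    """Check pi_{ij} == pi_{i+} * pi_{+j} in every cell, up to a tolerance."""
    row, col = marginals(joint)
    return all(
        abs(joint[i][j] - row[i] * col[j]) <= tol
        for i in range(len(row))
        for j in range(len(col))
    )


# Hypothetical joint distributions (not from the notes).
# The first is the outer product of (0.4, 0.6) and (0.3, 0.7).
independent_joint = [[0.12, 0.28], [0.18, 0.42]]
dependent_joint = [[0.30, 0.10], [0.10, 0.50]]

print(is_independent(independent_joint))
print(is_independent(dependent_joint))
print(conditional_y_given_x(independent_joint))
```

A tolerance is used instead of exact equality because the products are computed in floating point.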

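## Notation

| Row | Column 1 | Column 2 | Total |
|---|---|---|---|
| 1 | $\pi_{11}$ $(\pi_{1 \mid 1})$ | $\pi_{12}$ $(\pi_{2 \mid 1})$ | $\pi_{1+}$ $(1.0)$ |
| 2 | $\pi_{21}$ $(\pi_{1 \mid 2})$ | $\pi_{22}$ $(\pi_{2 \mid 2})$ | $\pi_{2+}$ $(1.0)$ |
| Total | $\pi_{+1}$ | $\pi_{+2}$ | $1.0$ |

Notation for joint, conditional, and marginal distributions for the 2 × 2 case.

The cell frequencies are denoted $\{n_{ij}\}$ and $n = \sum_i \sum_j n_{ij}$ is the total sample size. Thus, $p_{ij} = n_{ij} / n$.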
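
To make the notation concrete, the sketch below fills in the sample version of the 2 × 2 layout from cell counts. The counts are a collapse of the aspirin table into (any heart attack, no attack), which is an assumption made here for illustration and not a table given in the notes; the variable names are likewise illustrative.

```python
# 2 x 2 collapse of the aspirin table (an illustrative choice):
# columns are (any heart attack, no attack); 18 + 171 = 189 and 5 + 99 = 104.
n_ij = [
    [189, 10845],  # row 1: placebo
    [104, 10933],  # row 2: aspirin
]

# Total sample size n = sum over i and j of n_ij.
n = sum(sum(row) for row in n_ij)

# Sample joint proportions p_ij = n_ij / n.
p_ij = [[c / n for c in row] for row in n_ij]

# Sample marginal proportions p_{i+} and p_{+j}.
p_row = [sum(row) for row in p_ij]
p_col = [sum(col) for col in zip(*p_ij)]

# Sample conditional proportions p_{j|i} = p_ij / p_{i+}; each row sums to 1.
p_cond = [[p / p_row[i] for p in row] for i, row in enumerate(p_ij)]

print("n =", n)
print("joint:", p_ij)
print("row marginals:", p_row)
print("column marginals:", p_col)
print("conditional on row:", p_cond)
```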
