Lecture 11
Nancy Pfenning Stats 1000
Chapter 6: Relationships Between Categorical Variables
Some variables are categorical by nature (e.g., sex, race, major, political party); others are created by grouping
quantitative variables into classes (e.g., age as child, adolescent, young adult, etc.). We analyze categorical
data by recording counts or percents of cases occurring in each category. In this course, we mainly consider
just two categorical variables at a time.
Example
We can construct a two-way table (also called a contingency table) showing the relationship
between gender (row variable) and lenswear (column variable) of statistics class members during
a previous semester.
            None   Glasses   Contacts   Total
Male          65        36         37     138
Female       110        32         91     233
Total        175        68        128     371
One natural question to ask is which group tends to wear glasses more, males or females. The
counts are quite close for males and females (36 vs. 32), but since there are relatively few males
in the class, the percentage is actually much higher for the males. Thus, because the class has
many more females than males, it is better to compare lenswear by reporting percentages instead
of counts, concentrating on one gender category at a time. This gives the conditional distribution
of lenswear, given gender. It suggests we are thinking of gender as the explanatory variable.
                                     None            Glasses         Contacts        Total
Cond. dist. of lens, given male      65/138 = 47%    36/138 = 26%    37/138 = 27%    100%
Cond. dist. of lens, given female    110/233 = 47%   32/233 = 14%    91/233 = 39%    100%
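The row percentages above can be checked with a short calculation. The following is a minimal sketch in Python; the dictionary layout and the function name conditional_given_row are illustrative choices, not part of the lecture. Only the counts come from the two-way table.

```python
# Counts from the two-way table: gender (rows) by lenswear (columns).
counts = {
    "Male":   {"None": 65,  "Glasses": 36, "Contacts": 37},
    "Female": {"None": 110, "Glasses": 32, "Contacts": 91},
}

def conditional_given_row(table):
    """Express each row's counts as percentages of that row's total.

    (Hypothetical helper name, used here only for illustration.)
    """
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * n / total for col, n in cells.items()}
    return result

# Conditional distribution of lenswear, given gender.
lens_given_gender = conditional_given_row(counts)
for gender, dist in lens_given_gender.items():
    print(gender, {lens: round(pct) for lens, pct in dist.items()})
```

Each row is divided by its own total (138 for males, 233 for females), which is what makes the two groups comparable despite their different sizes.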
To use a bar graph to display a conditional distribution, label the horizontal axis with the
categories of the explanatory variable, in this case males and females.
Over the male label would be bars of
height 47%, 26%, and 27%, for percentages wearing none, glasses, or contacts. Over the female
label would be bars of height 47%, 14%, and 39%, for percentages wearing none, glasses, or
contacts. This impresses on us visually the difference between males and females: males are more
likely to wear glasses (26% vs. 14%), females to wear contacts (39% vs. 27%).
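As a rough illustration of such a display, the sketch below prints a text version of the bar graph, grouped by gender. It is only a sketch: the bars run horizontally rather than vertically, the function name text_bars and the scale of one "#" per 2 percentage points are arbitrary choices, and the percentages are the rounded values from the conditional distribution table.

```python
# Rounded conditional distribution of lenswear, given gender (from the table).
lens_given_gender = {
    "Male":   {"None": 47, "Glasses": 26, "Contacts": 27},
    "Female": {"None": 47, "Glasses": 14, "Contacts": 39},
}

def text_bars(dist_by_group, scale=2):
    """Build one text bar per category, grouped under each explanatory value.

    (Hypothetical helper; each '#' stands for `scale` percentage points.)
    """
    lines = []
    for group, dist in dist_by_group.items():
        lines.append(group)
        for category, pct in dist.items():
            bar = "#" * (pct // scale)
            lines.append(f"  {category:<9}{bar} {pct}%")
    return lines

chart = text_bars(lens_given_gender)
print("\n".join(chart))
```

Grouping the bars under each gender label mirrors the choice of gender as the explanatory variable: the comparison of interest is between the male group of bars and the female group.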
Alternatively (but perhaps less intuitively), we may choose to think of lenswear as the explanatory
variable and consider the distribution of gender separately for each lens category:
Lenswear   Cond. dist. of gender,   Cond. dist. of gender,   Cond. dist. of gender,
           given none               given glasses            given contacts
Male       65/175 = 37%             36/68 = 53%              37/128 = 29%
Female     110/175 = 63%            32/68 = 47%              91/128 = 71%
Total      100%                     100%                     100%
We could display this distribution by labeling three lenswear categories on the horizontal axis,
and drawing bars with heights to represent the percentages of males and females in each.
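The same counts give the conditional distribution of gender, given lenswear, when each cell is divided by its column total instead of its row total. A minimal sketch in Python follows; the variable names are illustrative, and only the counts come from the two-way table.

```python
# Counts from the two-way table: gender (rows) by lenswear (columns).
counts = {
    "Male":   {"None": 65,  "Glasses": 36, "Contacts": 37},
    "Female": {"None": 110, "Glasses": 32, "Contacts": 91},
}
lens_types = ["None", "Glasses", "Contacts"]

# Column totals: how many class members fall in each lenswear category.
column_totals = {
    lens: sum(row[lens] for row in counts.values()) for lens in lens_types
}

# Conditional distribution of gender, given lenswear (percent of each column).
gender_given_lens = {
    lens: {
        gender: 100 * counts[gender][lens] / column_totals[lens]
        for gender in counts
    }
    for lens in lens_types
}

for lens in lens_types:
    rounded = {g: round(p) for g, p in gender_given_lens[lens].items()}
    print(lens, column_totals[lens], rounded)
```

Here the percentages in each lenswear category sum to 100%, so the comparison is across lens categories rather than across genders.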