M316 Chapter 6 Dr. Berg

Two-Way Tables

We have looked at relationships in which at least the response variable is quantitative. Now we look at relationships between two categorical variables. Some variables are inherently categorical: sex, race, eye color, etc. Others can be created by grouping quantitative values into classes. To analyze categorical data, we use counts or percents of individuals that fall into various categories.

Example (6.1) College Students

Here is a two-way table (a table that describes two categorical variables) describing the sex and age of college students. Age group is the row variable and sex is the column variable. The totals are written in the margins.

Marginal Distributions

In the previous example, the distributions of the two variables are recorded separately in the margins: the age group distribution is in the totals in the right margin, and the sex distribution is in the totals in the bottom margin. These are the marginal distributions for the two variables in the two-way table. If a two-way table does not have these totals, you may need to generate the marginal distributions yourself.

Note that, since these numbers are rounded to the nearest thousand, the columns and rows do not necessarily add to exactly the totals in the margins. This is roundoff error.

Percents are often more informative than counts. We can find the marginal distribution for age groups in terms of percents by dividing each row total by the table total and converting to a percent.

Example (6.2) Calculating a Marginal Distribution

The percents of college students in each age group are:

15 to 17:    150 / 16,639 = 0.9%
18 to 24:    10,365 / 16,639 = 62.3%
25 to 34:    3,494 / 16,639 = 21.0%
35 and up:   2,630 / 16,639 = 15.8%

The histogram for this follows.

Exercise

Here is a table of suicides from 1983 categorized by sex and method. Find the marginal distributions in counts and in percents.
Method\Sex    Male     Female
Firearms      13959    2641
Poison        3148     2469
Hanging       3222     709
Other         1457     690

The row variable is "method" and the column variable is "sex". What are the overall patterns?

Conditional Distributions

If we take the data from one row or one column, we have a conditional distribution. Each category of one variable has its own conditional distribution of the other variable. It is called conditional because it is conditioned on the fact that the individuals fall into that category.

Example (6.3) Conditional Distribution of Sex Given Age

If we know that a college student is 18 to 24 years old, we need look only at that row of the two-way table. To find the distribution of sex among only students in this age group, divide each count in the row by the row total, which is 10,365. The conditional distribution of sex given that a student is 18 to 24 years old follows.

                                 Female    Male
Percent of 18 to 24 age group    54.7      45.3

Example (6.4) Women Among College Students

Let's follow the four-step process, starting with a practical question of interest to college administrators.

State: The proportion of college students who are older than the traditional 18 to 24 years is increasing. How does the participation of women in higher education change as we look at older students?

Formulate: Calculate and compare the conditional distributions of sex for college students in several age groups.

Solve: Comparing conditional distributions reveals the nature of the association between sex and age of college students. Find the conditional distributions in percents for each age group.

                                    Female    Male
Percent of 15 to 17 age group       59.3      40.7
Percent of 18 to 24 age group       54.7      45.3
Percent of 25 to 34 age group       54.5      45.5
Percent of 35 or older age group    63.1      36.9

Here is a bar graph comparing the percent of female college students in each age group.
Conclude: Women are a majority of college students in all age groups but are somewhat more predominant among students 35 years or older. Women are more likely than men to return to college after working for a number of years.

Exercise

Perform a similar analysis on the data for suicides in 1983.

Simpson's Paradox

An association or comparison that holds for all of several groups can reverse direction when the data are combined into a single group. This reversal is called Simpson's paradox. A lurking variable can be involved.

Example (6.6) Do Medical Helicopters Save Lives?

Here are hypothetical data for transport of accident victims to illustrate a practical difficulty.

                   Helicopter    Road
Victim Died        64            260
Victim Survived    136           840
Total              200           1100

We see that 32% of helicopter patients died, but only 24% of the others died. Why? The explanation is that the helicopter is sent mostly to serious accidents, so the seriousness of the accident is a lurking variable. If we ungroup the data, we see this more clearly.

Serious Accidents:

                   Helicopter    Road
Victim Died        48            60
Victim Survived    52            40
Total              100           100

Less Serious Accidents:

                   Helicopter    Road
Victim Died        16            200
Victim Survived    84            800
Total              100           1000

In both cases helicopter patients have a higher survival rate: 52% versus 40% for serious accidents, and 84% versus 80% for less serious accidents.

Exercise

Here are flight delay data for two airlines flying out of five airports. Which airline has the better on-time record?

                 Alaska Airlines        America West
                 On Time    Delayed     On Time    Delayed
Los Angeles      497        62          694        117
Phoenix          221        12          4840       415
San Diego        212        20          383        65
San Francisco    503        102         320        129
Seattle          1841       305         201        61
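
The reversal in Example (6.6) can be checked numerically. The following Python sketch is only an illustration, not part of the original notes: the names counts, death_rate, and combined are hypothetical, and the numbers are the helicopter and road counts from the tables in that example. It computes the percent of victims who died within each accident group and then for the combined data.

```python
# Illustrative sketch only; the helper names here are assumptions,
# and the counts are the hypothetical data from Example (6.6).

# (died, survived) counts by accident severity and transport type
counts = {
    "serious": {"helicopter": (48, 52), "road": (60, 40)},
    "less serious": {"helicopter": (16, 84), "road": (200, 800)},
}


def death_rate(died, survived):
    """Percent of victims who died, rounded to one decimal place."""
    return round(100 * died / (died + survived), 1)


def combined(transport):
    """Add the counts for one transport type across both severity groups."""
    died = sum(counts[group][transport][0] for group in counts)
    survived = sum(counts[group][transport][1] for group in counts)
    return died, survived


# Within each severity group, the helicopter has the lower death rate.
for group, table in counts.items():
    for transport, (died, survived) in table.items():
        print(group, transport, death_rate(died, survived))

# After combining the groups, the comparison reverses.
for transport in ("helicopter", "road"):
    print("combined", transport, death_rate(*combined(transport)))
```

Within each group the helicopter death rate is lower (48% versus 60%, and 16% versus 20%), yet the combined rates are 32% for the helicopter and about 24% for road transport. The same approach, with the counts replaced, could be applied to the airline exercise.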
This note was uploaded on 09/14/2009 for the course CH 310 N taught by Professor Blocknack during the Fall '08 term at University of Texas.