Lecture O: Chi-Square Tests
Text Section 8.3
In this lecture, we will explore hypothesis tests for Categorical Data.
(1) Bivariate Data: tests for 2-way contingency tables (both variables are categorical)
(2) Univariate Data
Bivariate Categorical Data: tests for 2-way contingency tables
Example: A medical study examined whether people had a particular risk factor or not (e.g.,
smoking, family history) and whether or not they had the disease (e.g., a particular cancer,
diabetes). 100 people were included in the study.
Risk Factor   Disease   No Disease   Total
Present          12         43         55
Absent            7         38         45
Total:           19         81        100

Chi-Squared Test (also called a test for homogeneity)
Null Hypothesis: the row variable is independent from the column variable (categorical data).
Alternative hypothesis: the row and column variables are not independent.
For a contingency table with r rows and c columns, the test statistic
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
has a chi-squared distribution with (r−1)(c−1) degrees of freedom.
Observed: the observed cell counts.
Expected: the expected cell counts if the row and column variables were independent.
p-values: see Appendix Table VII.
Alternative phrasing of null and alternative hypotheses:
Null hypothesis: the r populations are homogeneous with respect to the c categories
Alternative hypothesis: the populations are not homogeneous; for at least one of the c categories, the
proportions are not identical for all populations.
Knapp Stat 350
Spring 2009 Lecture O: Chi-Square Tests
Continuing the example...
Observed Proportions:
Risk Factor   Disease   No Disease   Total
Present
Absent
Total:

Expected Proportions if the presence of the disease and the presence of the risk factor are independent:
Risk Factor   Disease   No Disease   Total
Present
Absent
Total:

Expected Counts if the presence of the disease and the presence of the risk factor are independent:
Risk Factor   Disease   No Disease   Total
Present
Absent
Total:
Observed Counts:
Risk Factor   Disease   No Disease   Total
Present          12         43         55
Absent            7         38         45
Total:           19         81        100
Calculating the expected count for a given cell:
Expected count = (Row Total)(Column Total)/(Table Total)
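As a rough illustration, the expected-count formula can be applied to the risk-factor example in a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the p-value line relies on the fact that, for 1 degree of freedom only, the chi-squared upper-tail probability equals erfc(sqrt(x/2)).

```python
import math

# Observed counts from the risk-factor example.
# Rows: risk factor Present, Absent. Columns: Disease, No Disease.
observed = [[12, 43],
            [7, 38]]

row_totals = [sum(row) for row in observed]         # [55, 45]
col_totals = [sum(col) for col in zip(*observed)]   # [19, 81]
n = sum(row_totals)                                 # 100

# Observed proportions: each cell count divided by the table total.
observed_props = [[o / n for o in row] for row in observed]

# Expected proportions under independence: product of the marginal proportions.
expected_props = [[(r / n) * (c / n) for c in col_totals] for r in row_totals]

# Expected count = (row total)(column total)/(table total).
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over all cells of (observed - expected)^2 / expected.
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Closed form valid for df = 1 only (an assumption of this sketch).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(expected)
print(round(chi2, 3), df, round(p_value, 3))
```

The expected counts work out to 10.45, 44.55, 8.55, and 36.45, all close to the observed counts, so the statistic is small (about 0.63 on 1 degree of freedom).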
Note that the smallest possible value of χ² is 0.
This occurs when the observed counts/frequencies are equal to the expected counts/frequencies.
This gives the most support for the null hypothesis of independence of the row and column
variables. Large chi-square values occur when the observed frequencies are far from the
frequencies that would be expected under the null hypothesis of independence.
Therefore we reject the null hypothesis of independence when the chi-square statistic is large.

Ex. A study was conducted to investigate a possible relationship between smoking and
socioeconomic status (SES). 356 people were randomly selected to be surveyed. The subjects
were classified as being of low, middle, or high SES and as a current smoker, a former
smoker, or someone who never smoked. The results are given in the two-way table below.
                   SES
smoker     Low   Middle   High
current     53     41      22
former      90     30      21
never       68     22       9

a. What are the null and alternative hypotheses?
H0:
Ha:
b. What is the critical value of the test statistic (α = 0.05)?
Ex. continued
c. Assuming there is no relationship between SES and smoking status, what would you
expect the cell counts to be?
                   SES
smoker     Low   Middle   High
current
former
never

d. What is the value of the test statistic?
$$\chi^2 = \sum \frac{(o - e)^2}{e}$$
e. What is the p-value of this test?
f. Conclusion
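One way to check hand calculations for parts b through f is a short Python sketch like the one below. It makes several assumptions: the column order (Low, Middle, High) is read off the flattened table, the critical value 9.488 is the standard table entry for α = 0.05 with 4 degrees of freedom, and the p-value uses a closed form that holds only for 4 degrees of freedom. The statistic itself does not depend on how the columns are labeled.

```python
import math

# Observed counts. Rows: current, former, never smoker.
# Columns: Low, Middle, High SES (column order is an assumption).
observed = [[53, 41, 22],
            [90, 30, 21],
            [68, 22, 9]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)  # 356 people surveyed

# Part c: expected counts if SES and smoking status are unrelated.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Part d: chi-square statistic, summed over all nine cells.
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

# Degrees of freedom: (rows - 1)(columns - 1) = (3 - 1)(3 - 1) = 4.
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Part b: standard chi-squared table value for alpha = 0.05, df = 4.
CRITICAL_05_DF4 = 9.488

# Part e: for df = 4 only, P(X > x) = exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

# Part f: reject independence when the statistic exceeds the critical value.
reject_null = chi2 > CRITICAL_05_DF4

for row in expected:
    print([round(e, 2) for e in row])
print(round(chi2, 2), df, round(p_value, 4), reject_null)
```

With these counts the statistic is roughly 14.8, which exceeds 9.488, so the sketch rejects the null hypothesis of independence at the 0.05 level.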
Conceptually, what are we doing in that example?

The chi-squared test is generally considered appropriate if:
2×2 table: the expected counts in all cells are ≥ 5
Table larger than 2×2 (the test becomes more accurate as cell counts increase):
  Average of the cell counts is 5 or more
  Smallest expected count is at least 1
  < 20% of cells have expected counts < 5
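These rules of thumb can be written as a small check on a table of expected counts. The sketch below uses a hypothetical helper, `chi_square_conditions_ok`, and makes two assumptions: for tables larger than 2×2 all three conditions must hold together, and the "average of the cell counts" is taken over the expected counts (which has the same average as the observed counts, since both sum to the table total).

```python
def chi_square_conditions_ok(expected):
    """Return True if the rule-of-thumb conditions hold for the expected counts.

    `expected` is a list of rows, each a list of expected cell counts.
    This helper is illustrative and not part of any library.
    """
    cells = [e for row in expected for e in row]
    n_rows, n_cols = len(expected), len(expected[0])

    if n_rows == 2 and n_cols == 2:
        # 2x2 table: every expected count must be at least 5.
        return all(e >= 5 for e in cells)

    # Larger tables: assume all three conditions are required together.
    average_ok = sum(cells) / len(cells) >= 5
    smallest_ok = min(cells) >= 1
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return average_ok and smallest_ok and share_below_5 < 0.20


# Expected counts from the risk-factor example (2x2 table).
print(chi_square_conditions_ok([[10.45, 44.55], [8.55, 36.45]]))

# A made-up 3x3 table with one expected count below 1.
print(chi_square_conditions_ok([[0.5, 6, 6], [6, 6, 6], [6, 6, 6]]))
```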
Simpson's Paradox: an association or comparison that holds for several groups can disappear or
even reverse when the data are combined into a single group.
Ex. (from Sandy L. Zabell, 1989) Applicants for graduate admission to the University of
California, Berkeley, Fall 1973
Gender   Admitted   Denied   Total   % Admitted
Male       3,738     4,704   8,442      44%
Female     1,494     2,827   4,321      35%
A breakdown by department showed that admission rates for women were comparable to the
admission rates for men, and in some departments the admission rates for women were even
substantially higher. Why does this occur?
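A tiny numerical sketch may help make the reversal concrete. The counts below are hypothetical, invented purely for illustration; they are not the Berkeley figures, and the department names are placeholders.

```python
# HYPOTHETICAL counts, invented only to illustrate the effect.
# These are NOT the Berkeley admissions data.
# department -> gender -> (admitted, applicants)
applications = {
    "Dept X": {"men": (80, 100), "women": (18, 20)},
    "Dept Y": {"men": (4, 20), "women": (30, 100)},
}

# Admission rate within each department, by gender.
per_dept = {
    dept: {gender: admitted / applicants
           for gender, (admitted, applicants) in by_gender.items()}
    for dept, by_gender in applications.items()
}

# Admission rate after pooling both departments, by gender.
combined = {}
for gender in ("men", "women"):
    admitted = sum(applications[d][gender][0] for d in applications)
    applicants = sum(applications[d][gender][1] for d in applications)
    combined[gender] = admitted / applicants

print(per_dept)
print(combined)
```

In this made-up data women have the higher admission rate in each department (90% vs. 80% and 30% vs. 20%), yet the lower rate overall (40% vs. 70%), because most of the women applied to the department that admits few applicants.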
Univariate Categorical Data
A chi-square test can be conducted on univariate categorical data when there is some a priori
expectation of frequencies. When there are k categories, the test statistic
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
has a chi-squared distribution with k−1 degrees of freedom.

Ex. Is a die fair? I took a die from a board game and rolled it 35 times, recording the number
that came up each time (shown in the table below). Test whether this die is fair at the 0.05
significance level.
# on Die   1   2   3   4   5   6   Total
Count      2   4  10   7  10   2     35
H0:
Ha:
What is the critical value for this test? $\chi^2_{\text{critical}}$ =
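The die example can be checked with a short Python sketch. It assumes that "fair" means each face has probability 1/6, and it takes 11.070 as the standard table critical value for α = 0.05 with 5 degrees of freedom; the variable names are illustrative.

```python
# Observed counts for faces 1 through 6 over 35 rolls.
observed = [2, 4, 10, 7, 10, 2]

n = sum(observed)   # 35 rolls
k = len(observed)   # 6 categories

# Under the null hypothesis of a fair die, every face is expected n/k times.
expected = [n / k] * k

# Goodness-of-fit statistic: sum of (observed - expected)^2 / expected.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1

# Standard chi-squared table value for alpha = 0.05, df = 5.
CRITICAL_05_DF5 = 11.070

reject_null = chi2 > CRITICAL_05_DF5
print(round(chi2, 2), df, reject_null)
```

Each expected count is 35/6 ≈ 5.83, and the statistic comes to 11.8, slightly above the critical value.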
Ex. In genetics, if two traits are not linked and each is controlled by simple dominance, the
outcome of a dihybrid cross is expected to have a 9:3:3:1 ratio. In Drosophila
melanogaster, wild-type (+) eye color is dominant to sepia (se) eye color (wild-type eyes are red).
Wild-type (+) wings are dominant to vestigial (vg) wings. To verify that the eye-color and
wing-shape traits are not genetically linked (i.e., are statistically independent), a
dihybrid cross was conducted: F1 generation flies (se/+, vg/+) were crossed, and the
following offspring were observed.
eye & wing type   + +   se +   + vg   se vg
Observed          164    46     48     12
H0:
Ha:
What is the critical value for this test? $\chi^2_{\text{critical}}$ =

eye & wing type   Observed   Expected   (o − e)²/e
+ +                 164
se +                 46
+ vg                 48
se vg                12
Total               270
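The working table can be filled in with a sketch along these lines. The notes do not state a significance level for this example, so α = 0.05 (table critical value 7.815 for 3 degrees of freedom) is an assumption here, as are the variable names.

```python
# Observed offspring counts by phenotype.
observed = {"+ +": 164, "se +": 46, "+ vg": 48, "se vg": 12}

# Expected 9:3:3:1 ratio for unlinked traits with simple dominance.
ratio = {"+ +": 9, "se +": 3, "+ vg": 3, "se vg": 1}

n = sum(observed.values())         # 270 offspring
ratio_total = sum(ratio.values())  # 16

# Expected count for each phenotype: n times its share of the ratio.
expected = {k: n * ratio[k] / ratio_total for k in observed}

# Per-phenotype contribution (o - e)^2 / e, then the total statistic.
contributions = {k: (observed[k] - expected[k]) ** 2 / expected[k]
                 for k in observed}
chi2 = sum(contributions.values())
df = len(observed) - 1

# Standard table value for df = 3, ASSUMING alpha = 0.05.
CRITICAL_05_DF3 = 7.815

reject_null = chi2 > CRITICAL_05_DF3

for k in observed:
    print(k, observed[k], expected[k], round(contributions[k], 3))
print(round(chi2, 3), df, reject_null)
```

The expected counts are 151.875, 50.625, 50.625, and 16.875, and the statistic is about 2.93, well below 7.815, so under the assumed 0.05 level the data are consistent with the 9:3:3:1 ratio.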