350LectureO_ChiSq_Student


Lecture O: Chi-Square Tests
Text Section 8.3

In this lecture, we will explore hypothesis tests for categorical data.
(1) Bivariate Data: tests for 2-way contingency tables (both variables are categorical)
(2) Univariate Data

Bivariate Categorical Data: tests for 2-way contingency tables

Example: A medical study examined whether people had a particular risk factor or not (e.g., smoking, family history) and whether or not they had the disease (e.g., a particular cancer, diabetes). 100 people were included in the study.

Risk Factor   Disease   No Disease   Total
Present          12         43         55
Absent            7         38         45
Total:           19         81        100

Chi-Squared Test (also called a test for Homogeneity)

Null hypothesis: the row variable is independent of the column variable (categorical data).
Alternative hypothesis: the row and column variables are not independent.

For a contingency table with r rows and c columns, the test statistic

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

has, under the null hypothesis, an approximate chi-squared distribution with (r-1)(c-1) degrees of freedom.

Observed is the observed cell count.
Expected is the cell count that would be expected if the row and column variables were independent.

p-values: see Appendix Table VII

Alternative phrasing of the null and alternative hypotheses:
Null hypothesis: the r populations are homogeneous with respect to the c categories.
Alternative hypothesis: the populations are not homogeneous; for at least one of the c categories, the proportions are not identical for all populations.

Knapp Stat 350 Spring 2009 Lecture O: Chi-Square Tests Page 1 of 1

Continuing the example...
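The chi-squared computation for the risk-factor table above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own worked solution: the variable names are invented here, no continuity correction is applied, and the p-value line relies on the fact that a chi-squared variable with 1 degree of freedom is a squared standard normal, so it is valid only for a 2×2 table.

```python
import math

# Observed counts from the medical study (rows: risk factor present, absent;
# columns: disease, no disease).
observed = [[12, 43],
            [7, 38]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell = (row total)(column total)/(table total).
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-squared statistic: sum over all cells of (observed - expected)^2 / expected.
chi_sq = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row))

# Degrees of freedom: (r - 1)(c - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail p-value; this shortcut is an assumption that holds only when df == 1.
p_value = math.erfc(math.sqrt(chi_sq / 2))

print("expected counts:", expected)
print("chi-squared statistic:", round(chi_sq, 4), "with df =", df)
print("p-value:", round(p_value, 3))
```

The statistic here is small compared with the 0.05 critical value for 1 degree of freedom (3.841 in a standard chi-squared table), so these data would not lead to rejecting independence.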
Observed Proportions:

Risk Factor   Disease   No Disease   Total
Present
Absent
Total:

Expected Proportions if the presence of the disease and the presence of the risk factor are independent:

Risk Factor   Disease   No Disease   Total
Present
Absent
Total:

Expected Counts if the presence of the disease and the presence of the risk factor are independent:

Risk Factor   Disease   No Disease   Total
Present
Absent
Total:

Observed Counts:

Risk Factor   Disease   No Disease   Total
Present          12         43         55
Absent            7         38         45
Total:           19         81        100

Calculating Expected Counts:
Expected Count for a given Cell = (Row Total)(Column Total)/(Table Total)

Note that the smallest possible value of $\chi^2$ is 0. This occurs when the observed counts/frequencies are equal to the expected counts/frequencies, which gives the most support for the null hypothesis of independence of the row and column variables.

Large chi-square values occur when the observed frequencies are far from the frequencies that would be expected under the null hypothesis of independence. Therefore we reject the null hypothesis of independence when the chi-square statistic is large.

Ex. A study was conducted to investigate a possible relationship between smoking and socioeconomic status (SES). 356 people were randomly selected to be surveyed. The subjects were classified as low, middle, or high SES and as a current smoker, a former smoker, or someone who never smoked. The results are given in the two-way table below.

                    SES
Smoker     Low   Middle   High
current     53     41      22
former      90     30      21
never       68     22       9

a. What are the null and alternative hypotheses?
H0:
Ha:

b. What is the critical value of the test statistic (α = 0.05)?

Ex. continued

c. Assuming there is no relationship between SES and smoking status, what would you expect the cell counts to be?

                    SES
Smoker     Low   Middle   High
current
former
never

d. What is the value of the test statistic?
$$\chi^2 = \sum \frac{(o - e)^2}{e}$$

e. What is the p-value of this test?

f. Conclusion

Conceptually, what are we doing in that example?

The chi-squared test is generally considered appropriate if:
2×2 table: the expected counts in all cells are ≥5
Table larger than 2×2: (Test becomes more accurate as cell counts increase)
  - Average of the cell counts is 5 or more
  - Smallest expected count is at least 1
  - <20% of cells have expected counts <5

Simpson's Paradox: an association or comparison that holds for several groups can disappear or even reverse when the data are combined into a single group.

Ex. (from Sandy L. Zabell, 1989) Applicants for graduate admission to the University of California, Berkeley, Fall 1973

Gender   Admitted   Denied   Total   % Admitted
Male      3,738     4,704    8,442      44%
Female    1,494     2,827    4,321      35%

A breakdown by department showed that admission rates for women were comparable to the admission rates of men, and in some departments the admission rates for women were even substantially higher. Why does this occur?

Univariate Categorical Data

A chi-square test can be conducted on univariate categorical data when there is some a priori expectation of frequencies. When there are k categories, the test statistic

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

has a chi-squared distribution with k-1 degrees of freedom.

Ex. Is a die fair? I took a die from a board game and rolled it 35 times, recording the number that came up each time (shown in the table below). Test whether this die is fair at the 0.05 significance level.

# on Die   1   2   3   4   5   6   Total
Count      2   4  10   7  10   2     35

H0:
Ha:

What is the critical value for this test? $\chi^2_{\text{critical}}$ =

Ex.
In genetics, if two traits are not linked and each is controlled by simple dominance, the outcome of a dihybrid cross is expected to have a 9:3:3:1 ratio. In Drosophila melanogaster, wild-type (+) eye color is dominant to sepia (se) eye color (wild-type eyes are red). Wild-type (+) wings are dominant to vestigial (vg) wings. In order to verify that the eye-color and wing-shape traits are not genetically linked (i.e., statistically independent), a dihybrid cross was conducted: F1 generation flies (se/+, vg/+) were crossed, and the following offspring were observed.

eye & wing type   + +   se +   + vg   se vg
Observed          164    46     48     12

H0:
Ha:

What is the critical value for this test? $\chi^2_{\text{critical}}$ =

eye & wing type   Observed   Expected   $(o - e)^2 / e$
+ +                  164
se +                  46
+ vg                  48
se vg                 12
Total                270
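The univariate (goodness-of-fit) calculation used in the die and dihybrid-cross examples can be sketched as below. This is an illustration under stated assumptions rather than the lecture's worked answer: the function name `chi_square_gof` and the variable names are hypothetical, and the 0.05 critical values are copied from a standard chi-squared table instead of being computed.

```python
def chi_square_gof(observed, expected_ratios):
    """Return (expected counts, chi-squared statistic, degrees of freedom)."""
    total = sum(observed)
    ratio_sum = sum(expected_ratios)
    # Scale the hypothesized ratios so the expected counts sum to the sample size.
    expected = [total * r / ratio_sum for r in expected_ratios]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, statistic, len(observed) - 1

# Upper-tail critical values at alpha = 0.05, keyed by degrees of freedom
# (assumed here from a standard chi-squared table).
CRITICAL_05 = {3: 7.815, 5: 11.070}

# Fair die: all six faces equally likely.
die_expected, die_stat, die_df = chi_square_gof([2, 4, 10, 7, 10, 2], [1] * 6)

# Dihybrid cross: 9:3:3:1 expected ratio for (+ +, se +, + vg, se vg).
fly_expected, fly_stat, fly_df = chi_square_gof([164, 46, 48, 12], [9, 3, 3, 1])

for label, stat, df in [("die", die_stat, die_df), ("flies", fly_stat, fly_df)]:
    decision = "reject H0" if stat > CRITICAL_05[df] else "fail to reject H0"
    print(f"{label}: chi-squared = {stat:.3f}, df = {df}, {decision}")
```

Passing the hypothesized proportions as ratios keeps the same function usable for both examples: equal ratios for the die and 9:3:3:1 for the cross.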

This note was uploaded on 02/16/2010 for the course MA 350 taught by Professor Sellke during the Spring '10 term at Purdue University-West Lafayette.
