STAT503 Fall 2008 Lecture Notes: Chapter 10
Chapter10.tex; Last Modified: April 6, 2009 (W. Sharabati)

Chapter 10: Analysis of Categorical Data

April 6, 2009

Our observations fall into categories instead of being continuous variables. We count the number of observations falling into each category. As usual, we assume that the sample points are independent.

If there are only two categories, the number of observations in one category has a binomial distribution. If there are more than two categories, we can focus on one category and group the others together (still binomial), or we can define probabilities for all categories ($p_1, p_2, \ldots$).

We will use a new distribution called the $\chi^2$ distribution (chi-squared). It is another cousin of the Normal(0, 1) distribution.

Definition: If $Z_1, Z_2, \ldots, Z_k$ are independent Normal(0, 1) random variables, then $\sum_{i=1}^{k} Z_i^2$ has a $\chi^2_k$ distribution (a chi-squared distribution with $k$ degrees of freedom).

There are several different tests that use the $\chi^2$ distribution to determine critical values. They differ in their setup, just as there are many different kinds of t-tests.

[Draw $\chi^2$ curve.]

10.1 The $\chi^2$ Goodness of Fit Test

In this section we test whether the observed frequencies for a categorical variable are compatible with a null hypothesis that specifies the probabilities of the categories. That is, we ask whether the data seem to fit the hypothesized distribution. For example, the question "Is this a fair coin?" may be answered by this method.

Description of the test for categorical data, based on a random sample of size $n$:

- We need hypothesized values for the population proportions $p_i$ for each category. These are specified in or implied by the given problem.
- We calculate the expected number of observations in each category under $H_0$, using the formula $n p_i$ (number of observations times the hypothesized population proportion).
- The test is only approximate and works when the sample size is large. The expected number in each category should be at least 5.

The Test Statistic

The test statistic is computed as follows:

- $X_s^2 = \sum_{i=1}^{k} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \sum_{i=1}^{k} \frac{(O - E)^2}{E}$.
- Under the null hypothesis, $X_s^2$ has approximately a $\chi^2_{k-1}$ distribution.
- Table 9 gives critical values for the $\chi^2_k$ distribution.
- Rejection rule: Large values of $X_s^2$ lead to the rejection of $H_0$.

Example

In the sweet pea, the allele for purple flower color (P) is dominant to the allele for red flowers (p), and the allele for long pollen grains (L) is dominant to the allele for round pollen grains (l). The first group (of grandparents) will be homozygous for the dominant alleles (PPLL) and the second group (of grandparents) will be homozygous for the recessive alleles (ppll)....
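
The goodness-of-fit recipe described in Section 10.1 (expected counts $n p_i$, the statistic $X_s^2$, and $k-1$ degrees of freedom) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `chi2_gof_statistic` is hypothetical, the coin-toss counts are made up for the fair-coin question mentioned earlier (they are not data from these notes), and the critical value 3.841 is the standard 5% upper-tail value for a chi-squared distribution with 1 degree of freedom.

```python
def chi2_gof_statistic(observed, probs):
    """Return (X2, df, expected) for a chi-squared goodness-of-fit test.

    observed: list of observed counts, one per category
    probs:    hypothesized category probabilities under H0 (must sum to 1)
    """
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("hypothesized probabilities must sum to 1")

    n = sum(observed)                       # sample size
    expected = [n * p for p in probs]       # expected count n * p_i per category

    # The notes say each expected count should be at least 5.
    if min(expected) < 5:
        raise ValueError("an expected count is below 5; approximation is unreliable")

    # X2 = sum over categories of (O - E)^2 / E
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                  # k - 1 degrees of freedom
    return x2, df, expected


# Hypothetical data (an assumption, not from the notes): 100 tosses, 60 heads, 40 tails.
observed = [60, 40]
probs = [0.5, 0.5]                          # H0: the coin is fair

x2, df, expected = chi2_gof_statistic(observed, probs)

# Standard 5% critical value for chi-squared with 1 degree of freedom.
CRITICAL_5PCT_DF1 = 3.841
reject = x2 > CRITICAL_5PCT_DF1             # large values lead to rejection of H0

print(f"expected = {expected}, X2 = {x2:.3f}, df = {df}, reject H0 at 5%: {reject}")
```

With these made-up counts, each category contributes $(60-50)^2/50 = 2$, so $X_s^2 = 4$ on 1 degree of freedom, which exceeds 3.841 and leads to rejection of $H_0$ at the 5% level. For other values of $k$, the critical value would come from the appropriate row of a chi-squared table.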
This note was uploaded on 04/23/2011 for the course STAT 503 taught by Professor Staff during the Spring '08 term at Purdue University-West Lafayette.