ch 8 - Stat 0302B Business Statistics Spring 2010-2011...

Stat 0302B Business Statistics Spring 2010-2011

Chapter VIII Categorical Data Analysis and Chi-Square Tests

§ 8.1 Introduction

This chapter introduces some techniques to evaluate categorical or count data. Data sets in which observations fall into one of a number of mutually exclusive categories must be analysed with special protocols, the “Goodness-of-fit” (GOF) tests. The term goodness-of-fit refers to the comparison of some observed sample data with a theoretical distribution.

Example 8.1

An office building contains two entrances.

Theoretical distribution of choice of entrance: $p_1 = 75\%$, $p_2 = 25\%$

A sample of 200 persons entering the building was randomly selected, in which 167 had chosen entrance 1 and 33 had chosen entrance 2. Does the theoretical distribution fit the observed data well?

The variable we measured on each subject is binary (with just two categories). Hence we can simply use the Z-test for proportion described in Chapter 6.

$$H_0 : p_1 = 0.75,\ p_2 = 0.25 \quad \text{vs} \quad H_1 : \text{not } H_0$$

The hypotheses are equivalent to

$$H_0 : p_1 = 0.75 \quad \text{vs} \quad H_1 : p_1 \neq 0.75$$

Sample proportion:

$$\hat{p}_1 = \frac{167}{200} = 83.5\%$$

Test statistic for the two-sided test:

$$Z = \frac{0.835 - 0.75}{\sqrt{\dfrac{0.75 \times 0.25}{200}}} = 2.776$$

Critical value for the two-sided test with $\alpha = 0.05$: $Z_{0.025} = 1.96$

Since $|Z| = 2.776 > 1.96$, $H_0$ is rejected at the 5% significance level, i.e. the observed data showed a significant deviation from the theoretical distribution.
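The arithmetic in Example 8.1 can be checked with a short Python sketch. The function names `one_sample_proportion_z` and `two_sided_p_value` are illustrative choices, not part of the course notes, and the p-value step is an addition beyond the critical-value comparison used in the example; it assumes the usual standard normal approximation.

```python
import math


def one_sample_proportion_z(successes, n, p0):
    """Z statistic for H0: p = p0, using the standard error under H0."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


def two_sided_p_value(z):
    """P(|Z| > |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


# Example 8.1: 167 of 200 persons chose entrance 1; H0 states p1 = 0.75.
z = one_sample_proportion_z(167, 200, 0.75)
p_value = two_sided_p_value(z)

# Two-sided critical value at alpha = 0.05, as quoted in the example.
critical_value = 1.96
reject_h0 = abs(z) > critical_value

print(f"Z = {z:.3f}")
print(f"Reject H0 at the 5% level: {reject_h0}")
```

The Z value printed agrees with the 2.776 obtained by hand, and the comparison against 1.96 reproduces the rejection of $H_0$.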
This test, however, is applicable only to binary data. Sometimes our measured variable may be nominal/ordinal with more than two categories.

Example 8.2

Suppose that the office building contains three entrances.

Theoretical distribution of choice: $p_1 = 25\%$, $p_2 = 50\%$, $p_3 = 25\%$

Observed choices of 1000 persons:

Choice    Entrance 1    Entrance 2    Entrance 3    Total
Count     347           480           173           1000

The Chi-Square Goodness of Fit test is commonly used for such problems.

§ 8.2 The basic Chi-Square test statistic

The general form of the Chi-square test statistic is expressed as

$$Q = \sum \frac{(\text{Observed cell frequency} - \text{Expected cell frequency})^2}{\text{Expected cell frequency}} = \sum \frac{(O - E)^2}{E}$$

The expected cell frequencies are calculated based on the distribution stated in the null hypothesis. For instance, in Example 8.2 the theoretical distribution is described by a specific list of probabilities:

$$H_0 : p_1 = 0.25,\ p_2 = 0.5,\ p_3 = 0.25$$

If $H_0$ is true, then in a random sample of 1000 subjects, we will expect to observe $1000 \times 0.25 = 250$ subjects in category 1, $1000 \times 0.5 = 500$ subjects in category 2, and $1000 \times 0.25 = 250$ subjects in category 3. Therefore the expected cell frequencies are $E_1 = 250$, $E_2 = 500$, $E_3 = 250$.

We can compare these expected cell frequencies with the observed cell frequencies $O_1 = 347$, $O_2 = 480$, $O_3 = 173$ using

$$Q = \sum \frac{(O - E)^2}{E} = \frac{(347 - 250)^2}{250} + \frac{(480 - 500)^2}{500} + \frac{(173 - 250)^2}{250} = 62.152$$
The value of Q provides a measure of the deviation of the observed data from the theoretical distribution (null hypothesis).
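As a minimal sketch, the value of Q for Example 8.2 can be reproduced in Python from the observed counts and the null probabilities. The helper name `chi_square_statistic` is hypothetical and chosen here only for illustration.

```python
def chi_square_statistic(observed, null_probs):
    """Return (Q, expected), where Q is the sum over cells of (O - E)^2 / E.

    The expected frequency of each cell is n * p, with n the total
    sample size and p the cell probability stated in H0.
    """
    n = sum(observed)
    expected = [n * p for p in null_probs]
    q = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return q, expected


# Example 8.2: three entrances, 1000 persons observed.
observed = [347, 480, 173]
null_probs = [0.25, 0.50, 0.25]

q, expected = chi_square_statistic(observed, null_probs)

print("Expected cell frequencies:", expected)
print(f"Q = {q:.3f}")
```

The three terms contribute 37.636, 0.8 and 23.716 respectively, so most of the deviation comes from entrances 1 and 3; their sum matches the hand-computed 62.152. Q equals zero only when every observed count equals its expected count.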

This document was uploaded on 05/11/2011.
