Chi-Square Tests (PPT)


Chi-Square Tests
Chapter 12.1-12.3 and elsewhere

Categorical Data
- When analyzing continuous data, the underlying theory is from the normal distribution.
- If the data represent attributes, the underlying theory is from the binomial distribution.
- If there are multiple categories, there is a multinomial generalization.
- We will look at these, but first we revisit some categorical data analysis we've already done.

10.3: Comparing Two Proportions
- Two (large) samples of sizes n1 and n2.
- Let X1 denote the number in sample 1 that have some characteristic, and X2 the same in sample 2.
- Compute the sample proportions p1 = X1/n1 and p2 = X2/n2.
- Use these to test the hypothesis about the population proportions, H0: π1 − π2 = 0.

Example
Do people use credit cards more frequently during sales? From an upscale store's database, we sampled a number of transactions over two weekends.

Weekend sampled:       Regular price   Annual Mega Sale
Paid by credit card    201             312
Transactions sampled   300             416

Is There a Difference?
p1 = 201/300 = .67
p2 = 312/416 = .75
How large a difference is significant? The SE (page 355) is .03414.

PHStat Results
Critical values: ±1.96. ZCALC = -2.34.
Since -2.34 lies beyond -1.96, we reject H0: there is more credit card usage during sales.

12.1: An Alternate Test
It is difficult to generalize this Z-test to a problem with more than two populations; for example, weekdays and weekends during both sales and regular-priced days (four samples). It would be even more difficult to accommodate multiple categories, like payment by cash, check, credit card, debit card and gift card. We will look at a test that can handle all of these.
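A minimal Python sketch of the two-proportion Z test above. It assumes the pooled standard error (this reproduces the slide's SE of .03414); the function name `two_proportion_z` is illustrative, not part of PHStat.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Return (p1, p2, pooled SE, Z) for H0: pi1 - pi2 = 0."""
    p1 = x1 / n1
    p2 = x2 / n2
    # Under H0 both samples share one proportion, so pool them for the SE.
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return p1, p2, se, (p1 - p2) / se


# Sample 1: regular-price weekend; sample 2: Mega Sale weekend.
p1, p2, se, z = two_proportion_z(201, 300, 312, 416)
print(f"p1 = {p1:.2f}, p2 = {p2:.2f}, SE = {se:.5f}, Z = {z:.2f}")

# Two-tailed test at alpha = .05: reject H0 when |Z| > 1.96.
print("reject H0" if abs(z) > 1.96 else "do not reject H0")
```

Run as-is, it reports Z = -2.34, beyond -1.96, which matches the PHStat result.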
But first, let's apply it to the current example.

Contingency Tables
These are tables cross-tabulating count data by two factors (here, time period and type of payment). In general, we want to see whether the rows differ by the columns on a proportionate basis. We thus put the time period on the columns and the type of payment on the rows.

Our Example Again
Type of payment   During Mega Sale   At regular prices   Total
Credit card       312                201                 513
Cash              104                99                  203
Total             416                300                 716

This is a 2-by-2 table, so there are four "cells".

Methodology
Compute an expected frequency for each cell and compare it to what we actually observed. In each cell, compute the difference between what was observed and what was expected. If the payment methods were used at about the same frequency in both time periods, the differences would be small.

Overall Credit Card Usage
p(credit) = (201 + 312)/(300 + 416) = 513/716 = .71648
Expected credit card use during sales = .71648 × 416 = 298.1
Expected credit card use at regular prices = .71648 × 300 = 214.9
Expected cash use = 416 − 298.1 = 117.9 (during sales) and 300 − 214.9 = 85.1 (at regular prices)

Observed and Expected (expected counts in parentheses)
Type of payment   During sales   At regular prices   Total
Credit card       312 (298.1)    201 (214.9)         513
Cash              104 (117.9)    99 (85.1)           203
Total             416            300                 716

The Chi-Square Statistic
Next, square each difference and express it relative to what we expect. Add these up across all four cells. This sum is called the chi-square, or χ², statistic.
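The expected counts and the four-cell chi-square for this 2-by-2 example can be checked with a short Python sketch. It follows the slides' method (overall credit card proportion times each column total, with cash taking the remainder); the variable names are our own.

```python
# Observed counts: rows = (credit card, cash), columns = (Mega Sale, regular prices).
observed = [[312, 201],
            [104, 99]]

col_totals = [sum(row[j] for row in observed) for j in range(2)]  # column totals
credit_total = sum(observed[0])                                     # credit card row total
grand_total = sum(col_totals)                                       # all transactions

# Overall credit card proportion: 513/716.
p_credit = credit_total / grand_total

# Expected credit card count = p_credit * column total; cash gets the remainder.
expected = [[p_credit * c for c in col_totals],
            [(1 - p_credit) * c for c in col_totals]]

# Chi-square: sum of (O - E)^2 / E over the four cells.
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

for label, row in zip(("credit", "cash"), expected):
    print(label, [round(e, 1) for e in row])
print(f"chi-square = {chi2:.4f}")
```

The expected counts round to 298.1, 214.9, 117.9 and 85.1, and the statistic comes out at about 5.49, which is (-2.34)² up to rounding.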
$$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i}$$
where K is the number of cells in the table, O_i is the observed count in cell i, and E_i is the expected count in cell i.

The Chi-Square Distribution
[Figure: chi-square density curves, f(x) versus x, with 5 (dashed) and 10 (solid) degrees of freedom.]

Where Is the Upper 5% of the Distribution?
In our text, the chi-square table (Table E.4) is on page 741. We want the value that puts 5% of the area in the upper tail. For a 2-by-2 table, we use a χ² distribution with one degree of freedom, so the critical value is 3.841.

The Critical Value (at α = .05)
Critical Values of Chi-Square (upper-tail area α)
df     0.10     0.05    0.025     0.01    0.005
1     2.706    3.841    5.024    6.635    7.879
2     4.605    5.991    7.378    9.210   10.597
3     6.251    7.815    9.348   11.345   12.838
4     7.779    9.488   11.143   13.277   14.860
5     9.236   11.070   12.833   15.086   16.750
6    10.645   12.592   14.449   16.812   18.548
7    12.017   14.067   16.013   18.475   20.278
8    13.362   15.507   17.535   20.090   21.955
9    14.684   16.919   19.023   21.666   23.589
10   15.987   18.307   20.483   23.209   25.188

Is Credit Card Usage the Same?
Hypotheses / Test statistic / Decision rule / Results

PHStat Calculation

Is This Procedure Not Needed?
True: we can always use the Z test. It actually is the same test; square the Zcalc value to get the chi-square value.
False: it is a useful starting point for larger problems with more rows and columns.

More General Procedures
Let r = the number of row levels and c = the number of column levels. We call this an r × c, or r-by-c, table. The methodology is the same. The chi-square statistic has (r − 1)(c − 1) d.f.
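The critical values in Table E.4 can be reproduced without a lookup table. This is a minimal, dependency-free sketch built on two choices of our own: a series expansion for the chi-square CDF (the regularized lower incomplete gamma function), and bisection to invert it. The helper names `chi2_cdf` and `chi2_critical` are hypothetical. The last lines illustrate the "same test" point: with 1 d.f., the chi-square critical value is the squared two-tailed Z value, 1.96² ≈ 3.841.

```python
import math
from statistics import NormalDist


def chi2_cdf(x: float, df: int) -> float:
    """P(X <= x) for a chi-square variable with df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma P(a, z)
    with a = df/2 and z = x/2.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of sum z^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def chi2_critical(alpha: float, df: int) -> float:
    """Value with upper-tail area alpha, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1 - alpha:  # widen until the target is bracketed
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


alphas = (0.10, 0.05, 0.025, 0.01, 0.005)
for df in range(1, 11):
    print(df, " ".join(f"{chi2_critical(a, df):7.3f}" for a in alphas))

# With 1 d.f. the chi-square critical value equals the squared Z critical value.
z = NormalDist().inv_cdf(0.975)
print(f"{z:.4f} squared = {z * z:.3f}")
```

The printed rows should agree with the table to three decimals. If SciPy is available, `scipy.stats.chi2.ppf(1 - alpha, df)` gives the same numbers.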
12.2: Multiple Populations
We still have two row categories, but now there are c populations on the columns. This is a 2-by-c analysis, so there are (2 − 1)(c − 1) = c − 1 d.f. The hypothesis is that the proportions in each row are the same across columns.

Our Example Expanded
Type of payment   Weekend sales   Weekday sales   Weekend regular   Weekday regular   Total
Credit card       312             289             201               184               986
Cash              104             97              99                61                361
Total             416             386             300               245               1347

Some of the Computations
1. The overall proportion paid by credit card is 986/1347 = .7320.
2. The expected credit card count during weekend sales is thus .7320(416) = 304.51.
3. Observed − Expected = 312 − 304.51 = 7.49.
4. (O − E)²/E = (312 − 304.51)²/304.51 = .1842.

Here, There Are (c − 1) = (4 − 1) = 3 df
From the same table of critical values, the critical value at α = .05 with 3 df is 7.815.

Now, Is Credit Card Usage Similar?
Hypotheses / Test statistic / Decision rule / Results

In PHStat

Suppose It Were Just a Little Different?
Because we accepted H0, we would not go looking for significant differences between time periods; there aren't any! Now suppose we find a data transcription error: during weekday sales, 298 (not 289) of the 386 sales were by credit card. The chi-square is now 10.02, which exceeds 7.815, so H0 would be rejected. We might then want to find out which differences are significant.
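The effect of the transcription error can be checked numerically. This Python sketch applies the slides' 2-by-c method (overall credit card proportion times each column total). It assumes the weekday-sales column still totals 386 after the correction, so its cash count drops to 88. The function name is hypothetical, and the critical value 7.815 is read from the table.

```python
def two_row_chi_square(credit, cash):
    """Chi-square for a 2-by-c table via the overall credit card proportion."""
    col_totals = [a + b for a, b in zip(credit, cash)]
    n = sum(col_totals)
    p = sum(credit) / n  # overall proportion paid by credit card
    chi2 = 0.0
    for o_cc, o_cash, c in zip(credit, cash, col_totals):
        e_cc, e_cash = p * c, (1 - p) * c  # expected counts for this column
        chi2 += (o_cc - e_cc) ** 2 / e_cc + (o_cash - e_cash) ** 2 / e_cash
    return chi2, len(col_totals) - 1  # statistic and (2-1)(c-1) d.f.


CRITICAL_3DF = 7.815  # alpha = .05 with 3 d.f., from the critical value table

# Columns: weekend sales, weekday sales, weekend regular, weekday regular.
original = ([312, 289, 201, 184], [104, 97, 99, 61])
corrected = ([312, 298, 201, 184], [104, 88, 99, 61])  # 298 (not 289) by credit card

for label, (credit, cash) in (("original", original), ("corrected", corrected)):
    chi2, df = two_row_chi_square(credit, cash)
    decision = "reject H0" if chi2 > CRITICAL_3DF else "do not reject H0"
    print(f"{label}: chi-square = {chi2:.2f} with {df} d.f. -> {decision}")
```

With the original counts the statistic is about 7.57, just under 7.815, which is consistent with accepting H0. With the corrected count it is 10.02.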
The Marascuilo Procedure
This is similar to the Tukey-Kramer analysis for a one-way ANOVA. It computes a critical range value, and any two proportions that differ by this amount or more are called significantly different. It applies only to a 2-by-c table.

Marascuilo Output
Conclude?

12.3: χ² Test of Independence
The hypothesis is that the row variable is independent of the column variable. The alternative is that there is some kind of relationship between them. Under this H0, the expected count in each cell is determined by its row total and column total:
$$E_{ij} = \frac{R_i C_j}{n}$$

Data Layout (3 by 4)
        Col 1   Col 2   Col 3   Col 4   Total
Row 1   O11     O12     O13     O14     R1
Row 2   O21     O22     O23     O24     R2
Row 3   O31     O32     O33     O34     R3
Total   C1      C2      C3      C4      n

Example
The human resource manager for a large firm wants to assess the popularity of three alternative flextime plans among workers in four offices. If the plans are arrayed on the rows and the offices across the columns, we have r = 3 and c = 4. The chi-square statistic thus has (r − 1)(c − 1) = (3 − 1)(4 − 1) = 2 × 3 = 6 df.

Survey Results
         Office 1   Office 2   Office 3   Office 4   Total
Plan 1   15         32         18         5          70
Plan 2   8          29         23         18         78
Plan 3   1          20         25         22         68
Total    24         81         66         45         216

Using PHStat
Enter α, r and c. That produces a blank table; fill in the observed counts to get the filled-in table.

Results
Data
Level of significance       0.05
Number of rows              3
Number of columns           4
Degrees of freedom          6
Results
Critical value              12.59159
Chi-square test statistic   27.135
p-value                     0.000137
Reject the null hypothesis (strong significance). The expected frequency assumption is met.
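The independence test reduces to a few lines once E_ij = R_i C_j / n is in hand. Here is a minimal Python sketch. The function name `independence_test` is hypothetical, the critical value 12.59159 is taken from the PHStat output, and the sketch does not compute the p-value. It also returns the per-cell (O − E)²/E contributions and the smallest expected count, which is the basis for the expected frequency check.

```python
def independence_test(observed):
    """Chi-square test of independence for an r-by-c table of counts.

    Returns (statistic, d.f., expected counts, per-cell contributions),
    using E_ij = R_i * C_j / n.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    contrib = [[(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
               for o_row, e_row in zip(observed, expected)]
    stat = sum(map(sum, contrib))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected, contrib


# Flextime survey: rows = Plans 1-3, columns = Offices 1-4.
survey = [[15, 32, 18, 5],
          [8, 29, 23, 18],
          [1, 20, 25, 22]]

stat, df, expected, contrib = independence_test(survey)
smallest_e = min(min(row) for row in expected)
print(f"chi-square = {stat:.3f}, d.f. = {df}")
print(f"smallest expected count = {smallest_e:.2f} (rule of thumb: at least 5)")
print("reject H0" if stat > 12.59159 else "do not reject H0")
for plan, row in enumerate(contrib, start=1):
    print(f"Plan {plan}:", " ".join(f"{v:.6f}" for v in row))
```

The output agrees with PHStat: 27.135 with 6 d.f. The smallest expected count is about 7.56 (Plan 3, Office 1), so every cell clears 5.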
Conclusions
H0 is rejected, so we can say that some offices had different preferences for the plans than others. To dig a little deeper, we can look at the table of (Oij − Eij)²/Eij values. Large values here show "extra information".

The (Oij − Eij)²/Eij Table
         Office 1   Office 2   Office 3   Office 4
Plan 1   6.706349   1.259524   0.536941   6.297619
Plan 2   0.051282   0.002137   0.029138   0.188462
Plan 3   5.687908   1.186275   0.857992   4.331373

The workers in Office 1 liked Plan 1 but not Plan 3. In Office 4 it was pretty much the opposite. Offices 2 and 3 had no strong preferences.

Note on "Expected Frequency"
PHStat said the expected frequency assumption was met. Essentially, this test is based on an approximation that holds as long as most of the Eij values are at least 5. If you had a "sparse" table, you might have to combine some categories to meet this assumption.

On an Exam
Obviously, this is a computer procedure; you would not have to do one of these by hand on an exam. However, I could give you the PHStat output and ask you to interpret it, including the "extra information" analysis.

Goodness-of-Fit Tests
These are tests of whether the observed data follow an expected pattern. For example, suppose we have several categories and want to see whether they are all equal in size. Another example: do people choose the same type of chocolate that they did before?

First Example
Are technical support calls equal across all days of the week? We sample data for 10 days for each day of the week.
Day of week   No. of calls
Monday        290
Tuesday       250
Wednesday     238
Thursday      257
Friday        265
Saturday      230
Sunday        192
Total         1722

Logic of the Test
If calls are uniformly distributed, the 1722 calls would be expected to be divided equally across the 7 days. Each day would then have 1722 ÷ 7 = 246 support calls. Do the data agree with this?

The Chi-Square Test
H0: The distribution of calls is uniform across days of the week.
H1: The distribution of calls is not uniform across days of the week.
The test statistic is
$$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i} \quad (\text{d.f.} = K - 1)$$
where K = number of categories, O_i = observed frequency for category i, and E_i = expected frequency for category i.

The Critical Value (at α = .05)
From the table of critical values, with K − 1 = 6 d.f. the critical value is 12.592.

Observed versus Expected: Completed Table
Day of week   Observed   Expected   O − E   (O − E)²/E
Monday        290        246          44      7.8699
Tuesday       250        246           4      0.0650
Wednesday     238        246          -8      0.2602
Thursday      257        246          11      0.4919
Friday        265        246          19      1.4675
Saturday      230        246         -16      1.0407
Sunday        192        246         -54     11.8537
Total         1722       1722                23.0488

Comments
1. Since 23.0488 > 12.592, we reject H0 and conclude that the assumption of uniformity across days is not true.
2. If you examine the results more closely, however, you see that it wasn't a bad assumption except on Sunday and Monday.
3. You can often get "extra information" by examining the (Oi − Ei)²/Ei column for large numbers.

On an Exam
This would have been too large a problem for me to expect you to do by hand on an exam. A smaller problem (say K = 4) would be "fair game". Don't be surprised if I ask you to do that, assuming I gave you the pattern to test for.

Another Example
Historical data suggest customer preferences for chocolate bars are: Mr. Goodbar (30%), Hershey's Milk Chocolate (50%), Special Dark (15%), Krackel (5%). In a marketing research lab, a survey of 200 students showed 50 selected Mr. Goodbar, 93 Milk Chocolate, 45 Special Dark and 12 Krackel. Are student preferences different?

Is Local Preference Different?
Hypotheses / Test statistic / Decision rule / Results
...
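Both goodness-of-fit examples use the same computation: E_i = n times the hypothesized share of category i, then sum (O − E)²/E over K categories with K − 1 d.f. Here is a minimal Python sketch. The function name is hypothetical, and the hypothesized shares are passed as positive weights (normalized inside the function). The critical values 12.592 (6 d.f.) and 7.815 (3 d.f.) are read from the table. The chocolate result is not given in the slides, so that part only illustrates the computation under the historical percentages.

```python
def goodness_of_fit(observed, expected_weights):
    """Chi-square goodness-of-fit statistic, d.f. = K - 1, and expected counts.

    expected_weights are the hypothesized shares (any positive weights;
    they are normalized so the expected counts sum to n).
    """
    n = sum(observed)
    total_w = sum(expected_weights)
    expected = [n * w / total_w for w in expected_weights]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1, expected


# Support calls, Monday through Sunday; H0: uniform across the 7 days.
calls = [290, 250, 238, 257, 265, 230, 192]
stat_calls, df_calls, e_calls = goodness_of_fit(calls, [1] * 7)
print(f"calls: chi-square = {stat_calls:.4f}, d.f. = {df_calls}")
print("reject H0" if stat_calls > 12.592 else "do not reject H0")

# Chocolate bars: Mr. Goodbar, Milk Chocolate, Special Dark, Krackel (30/50/15/5%).
bars = [50, 93, 45, 12]
stat_bars, df_bars, e_bars = goodness_of_fit(bars, [30, 50, 15, 5])
print("chocolate expected:", ", ".join(f"{e:.0f}" for e in e_bars))
print(f"chocolate: chi-square = {stat_bars:.4f}, d.f. = {df_bars}")
print("reject H0" if stat_bars > 7.815 else "do not reject H0")
```

The calls example reproduces χ² = 23.0488 with 6 d.f. For the chocolate data, this sketch gives about 10.06 with 3 d.f., above 7.815. Most of that (7.5) comes from Special Dark: 45 observed versus 30 expected.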