hw10 - STAT 350 – Spring 2009 Homework #10 Solution...


STAT 350 – Spring 2009
Homework #10 Solution
covers through Lecture O

You can get p-values that are more precise than those available from the Chi-Squared Table in the Appendix of your textbook by using the Excel function “ChiDist”. For example, if your chi-square statistic is 30.54 and the degrees of freedom are 9, type =ChiDist(30.54,9) in Excel and it will return the p-value.

Text Exercises
Chapter 8 problems #42, 44, 46

For each of the text problems, your work MUST show (i) the table of expected counts, (ii) the value of the Chi-Square statistic, (iii) the degrees of freedom for the test, (iv) the critical value of your test statistic for α = 0.05, and (v) the p-value of the test, as well as your conclusion.

8.42

Client From:           Business      Engineering   Social Science   Agriculture   TOTAL
Expected proportion:   0.4           0.3           0.2              0.1           1
Expected Counts:       48            36            24               12            120
Observed Counts:       52            38            21               9             120
(O-E)²/E:              0.333333333   0.111111111   0.375            0.75          1.569444

χ² = 1.569444, df = 4-1 = 3, critical value = 7.81, p-value = 0.666

Therefore we would not reject the null hypothesis that the percentages for the 4 categories are 40%, 30%, 20% and 10%, respectively.

8.44

location (i)   expected frequency πi   observed count   expected count   (O-E)²/E
1              0.033333                4                6.666667         1.066667
2              0.066667                15               13.33333         0.208333
3              0.1                     23               20               0.45
4              0.133333                25               26.66667         0.104167
5              0.166667                38               33.33333         0.653333
6              0.166667                31               33.33333         0.163333
7              0.133333                32               26.66667         1.066667
8              0.1                     14               20               1.8
9              0.066667                10               13.33333         0.833333
10             0.033333                8                6.666667         0.266667
Sum            1                       200              200              6.6125

χ² = 6.6125, df = 10-1 = 9, critical value = 16.92, p-value = 0.6774.
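The goodness-of-fit calculations for 8.42 and 8.44 can be double-checked outside Excel. The sketch below is one way to do so using only the Python standard library; the helper names `chi2_sf` and `goodness_of_fit` are illustrative choices, not part of the course materials. `chi2_sf` plays the role of ChiDist and relies on the closed-form expressions for the chi-square upper tail that hold when the degrees of freedom are a positive integer.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2))
    #         + sqrt(2x/pi) * exp(-x/2) * sum_{j=1}^{(df-1)/2} x^(j-1) / (1*3*...*(2j-1))
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail


def goodness_of_fit(observed, proportions):
    """Return (expected counts, chi-square statistic, df, p-value)."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return expected, stat, df, chi2_sf(stat, df)


# Problem 8.42: hypothesized proportions 0.4, 0.3, 0.2, 0.1
_, stat_842, df_842, p_842 = goodness_of_fit([52, 38, 21, 9], [0.4, 0.3, 0.2, 0.1])

# Problem 8.44: hypothesized proportions (5.5 - |i - 5.5|) / 30 for i = 1..10
props_844 = [(5.5 - abs(i - 5.5)) / 30 for i in range(1, 11)]
obs_844 = [4, 15, 23, 25, 38, 31, 32, 14, 10, 8]
_, stat_844, df_844, p_844 = goodness_of_fit(obs_844, props_844)

print(f"8.42: chi-square = {stat_842:.6f}, df = {df_842}, p-value = {p_842:.4f}")
print(f"8.44: chi-square = {stat_844:.4f}, df = {df_844}, p-value = {p_844:.4f}")
```

The printed statistics and p-values should agree with the values reported for 8.42 and 8.44, up to rounding.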
Therefore we would not reject the null hypothesis that the true (long-run) proportion of requests for location i is (5.5 - |i – 5.5|)/30.

8.46

Observed Counts
                  Failure Mode
Configuration     1          2          3          4          total
1                 20         44         17         9          90
2                 4          17         7          12         40
3                 10         31         14         5          60
total             34         92         38         26         190

Expected Counts
                  Failure Mode
Configuration     1          2          3          4          total
1                 16.10526   43.57895   18         12.31579   90
2                 7.157895   19.36842   8          5.473684   40
3                 10.73684   29.05263   12         8.210526   60
total             34         92         38         26         190

(O-E)²/E
                  Failure Mode
Configuration     1          2          3          4          total
1                 0.941864   0.004068   0.055556   0.892713
2                 1.393189   0.289617   0.125      7.781377
3                 0.050568   0.13053    0.333333   1.255398
total                                                         13.25321

χ² = 13.25321, df = (4-1)(3-1) = 6, critical value = 12.59, p-value = 0.039187

Using α = 0.05, we would reject the null hypothesis of no association between Failure Mode and Configuration.

Then answer the following Problems

1. Here are the numbers of flights on time and delayed for 2 airlines at 5 airports.

                  Alaska Airlines               America West
                  On time   Delayed   Total     On time   Delayed   Total
L.A.              497       62        559       694       117       811
Phoenix           221       12        233       4840      415       5255
San Diego         212       20        232       383       65        448
San Francisco     503       102       605       320       129       449
Seattle           1841      305       2146      201       61        262
All Airports      3274      501       3775      6438      787       7225

a. Find the % of delayed flights for Alaska Airlines at each of the 5 airports, and then do the same for America West. Present your answers in a table.

% Delayed Flights
                  Alaska    AmWest
L.A.              11.09%    14.43%
Phoenix           5.15%     7.90%
San Diego         8.62%     14.51%
San Francisco     16.86%    28.73%
Seattle           14.21%    23.28%

b. What % of all Alaska Airlines flights were delayed? What % of all America West flights were delayed? These are the numbers usually reported.

501 / 3775 = 13.27% of all Alaska Airlines flights were delayed.
787 / 7225 = 10.89% of all America West flights were delayed.

c. America West does worse at every one of the 5 airports, yet does better overall. That sounds impossible.
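The reversal described in part (c) can be confirmed directly from the counts in the flight table. The following is a minimal sketch; the dictionary layout and the function names `delay_rates` and `overall_delay_rate` are illustrative assumptions, while the counts themselves are copied from the table in Problem 1.

```python
# (on time, delayed) counts per airport, copied from the table in Problem 1
alaska = {
    "L.A.": (497, 62),
    "Phoenix": (221, 12),
    "San Diego": (212, 20),
    "San Francisco": (503, 102),
    "Seattle": (1841, 305),
}
am_west = {
    "L.A.": (694, 117),
    "Phoenix": (4840, 415),
    "San Diego": (383, 65),
    "San Francisco": (320, 129),
    "Seattle": (201, 61),
}


def delay_rates(flights):
    """Fraction of delayed flights at each airport."""
    return {
        airport: delayed / (on_time + delayed)
        for airport, (on_time, delayed) in flights.items()
    }


def overall_delay_rate(flights):
    """Fraction of delayed flights pooled over all airports."""
    delayed = sum(d for _, d in flights.values())
    total = sum(o + d for o, d in flights.values())
    return delayed / total


alaska_rates = delay_rates(alaska)
am_west_rates = delay_rates(am_west)

# America West has the higher delay rate at every individual airport...
worse_everywhere = all(am_west_rates[a] > alaska_rates[a] for a in alaska)
# ...yet the lower delay rate once all airports are pooled.
better_overall = overall_delay_rate(am_west) < overall_delay_rate(alaska)

for airport in alaska:
    print(f"{airport}: Alaska {alaska_rates[airport]:.2%}, AmWest {am_west_rates[airport]:.2%}")
print(f"Overall: Alaska {overall_delay_rate(alaska):.2%}, AmWest {overall_delay_rate(am_west):.2%}")
print(worse_everywhere, better_overall)
```

Both flags being true at once is exactly the pattern that part (c) asks about.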
Explain how this can happen.

This is an example of Simpson’s Paradox. Notice that America West has its lowest delay rate in Phoenix, where the vast majority (nearly 73%) of its flights are, so the low delay rate there is weighted heavily in the overall delay rate of AmWest. In contrast, Alaska Airlines has one of its higher delay rates in Seattle, where the majority (nearly 57%) of its flights are, giving this relatively high delay rate a heavy weight in the overall delay rate for Alaska. The following table shows what percent of each airline's flights were at each airport. Simpson’s Paradox can only occur when the marginal probabilities differ (that is, the proportions of flights at each airport are not the same for the two airlines).

                  Alaska     AmWest
L.A.              14.81%     11.22%
Phoenix           6.17%      72.73%
San Diego         6.15%      6.20%
San Francisco     16.03%     6.21%
Seattle           56.85%     3.63%
Total             100.00%    100.00%

2. The marketing department at a technical college wanted to conduct an analysis of its current enrollment. The following table gives counts of the college's students in 4 different classifications for program and 3 classifications for age group.

                  Program
Age               2-year full-time   2-year part-time   4-year full-time   4-year part-time
21 and under      42                 35                 108                22
22 to 34          31                 76                 127                69
35 and over       11                 28                 39                 34

a. Assuming there is no relationship between program and age group, what would you expect the cell counts to be? (Give your answer in the form of a table).

                  Program
Age               2-year full-time   2-year part-time   4-year full-time   4-year part-time   TOTAL
21 and under      27.9550            46.2588            91.1865            41.5997            207
22 to 34          40.9196            67.7122            133.4759           60.8923            303
35 and over       15.1254            25.0289            49.3376            22.5080            112
TOTAL             84                 139                274                125                622

b. If you are testing the null hypothesis that there is no relationship between program and age group, what would be the value of the chi-squared statistic?
36.4554

Here is a table of the value of (O-E)²/E for each cell:

                  Program
Age               2-year full-time   2-year part-time   4-year full-time   4-year part-time   TOTAL
21 and under      7.0564             2.7403             3.1002             9.2344
22 to 34          2.4047             1.0144             0.3142             1.0795
35 and over       1.1252             0.3527             2.1660             5.8675
TOTAL                                                                                         36.4554

c. Give the degrees of freedom for your test statistic.

df = (3-1)(4-1) = 6

d. Give the critical value (α = 0.05) for your test statistic.

12.59 (Appendix Table VII, page 571)

e. Give the p-value for your test statistic.

2.25×10⁻⁶
In Excel: =ChiDist(36.4554, 6)

f. Based on your test, what would you conclude about the relationship between program and age group?

There is a statistically significant relationship between the age of a student and the type of program he/she is enrolled in. In other words, age and program are NOT independent.

3. In Lab #7 we looked at a number of different ways to assess whether a sample may have come from a normal distribution. Probability or QQ-plots are one approach. We also looked at hypothesis tests such as Shapiro-Wilk. Another approach we used was to look at the sample histogram with a normal distribution curve with μ = x̄ and σ = s. Instead of just qualitatively looking at how close the histogram is to the curve, we can assess this quantitatively (this is how some of the hypothesis tests work).

A sample of 100 observations was taken. You wish to assess whether these observations may have come from a Normal distribution. The sample had x̄ = 100 and s = 10. The number of observations in each bin is given in the following table.

bin          observed frequency
<80          5
80-90        11
90-95        13
95-100       22
100-105      18
105-110      16
110-120      9
>120         6
TOTAL        100

a. The null hypothesis is that the sample did come from a normal distribution with µ = 100 and σ = 10. This null hypothesis can be restated in terms of the true proportions of each category, the πi's. Give the values of π1 through π8 (to 4 decimal places).
Hint: π1 = P(X < 80) for X ~ Normal(µ = 100, σ = 10), and ∑ πi = 1 (sum over i = 1 to 8).

π1 = P(X < 80) = P((X − 100)/10 < (80 − 100)/10) = P(Z < −2.00) = Φ(−2.00) = 0.0228

π2 = P(80 < X < 90) = P((80 − 100)/10 < (X − 100)/10 < (90 − 100)/10) = P(−2.00 < Z < −1.00)
   = Φ(−1.00) − Φ(−2.00) = 0.1587 − 0.0228 = 0.1359

π3 = P(90 < X < 95) = P((90 − 100)/10 < (X − 100)/10 < (95 − 100)/10) = P(−1.00 < Z < −0.50)
   = Φ(−0.50) − Φ(−1.00) = 0.3085 − 0.1587 = 0.1498

π4 = P(95 < X < 100) = P((95 − 100)/10 < (X − 100)/10 < (100 − 100)/10) = P(−0.50 < Z < 0)
   = Φ(0) − Φ(−0.50) = 0.50 − 0.3085 = 0.1915

By symmetry,
π5 = π4 = 0.1915
π6 = π3 = 0.1498
π7 = π2 = 0.1359
π8 = π1 = 0.0228

b. Give the expected counts for each bin (to 2 decimal places).

bin          expected frequency
<80          π1×100 = 2.28
80-90        π2×100 = 13.59
90-95        π3×100 = 14.98
95-100       π4×100 = 19.15
100-105      π5×100 = 19.15
105-110      π6×100 = 14.98
110-120      π7×100 = 13.59
>120         π8×100 = 2.28
TOTAL        100

c. Give the value of the chi-square statistic for this analysis.

χ² = 12.18263

Bin          Observed   Expected   (O − E)²/E
<80          5          2.28       3.244912
80-90        11         13.59      0.493606
90-95        13         14.98      0.261709
95-100       22         19.15      0.424151
100-105      18         19.15      0.06906
105-110      16         14.98      0.069453
110-120      9          13.59      1.550265
>120         6          2.28       6.069474
TOTAL        100        100        12.18263

d. Give the degrees of freedom for this test statistic.

df = (# of bins – 1) = 8 – 1 = 7

e. Give the critical value (α = 0.05) for this test statistic.

14.07 (Appendix Table VII, page 571)

f. Give the p-value for this test statistic.

0.0947
In Excel: =ChiDist(12.18263,7)

g. Based on this analysis, would you conclude that the sample came from a Normal distribution or not?

The null hypothesis is that the sample DID come from a normal distribution. Based on our test statistic and critical value (or the p-value), we do NOT REJECT the null hypothesis. We would conclude that the sample is consistent with having come from a Normal distribution.

4. Effect of sample size.
The purpose of this next problem is to see the effects of differing sample sizes in a chi-square test. Assume that the observed proportions for a 2-way contingency table are as given below.

             Column1   Column2
Row1         0.1       0.4
Row2         0.2       0.3

Complete the following table assuming (a) that the total number of observations is 10 and (b) that the total number of observations is 100. Be sure to show your work.

Total Observations                               10         100
Test Statistic (χ²)                              0.47619    4.761905
degrees of freedom                               1          1
critical value (α = 0.05)                        3.84       3.84
p-value                                          0.490153   0.029096
Are the row and column variables independent?    Yes.       No.

For n = 10

O counts     c1         c2         TOTAL
r1           1          4          5
r2           2          3          5
TOTAL        3          7          10

E counts     c1         c2         TOTAL
r1           1.5        3.5        5
r2           1.5        3.5        5
TOTAL        3          7          10

(O-E)²/E     c1         c2         TOTAL
r1           0.166667   0.071429   0.238095
r2           0.166667   0.071429   0.238095
TOTAL        0.333333   0.142857   0.47619    (chi-sq)
p-value      0.490153

For n = 100

O counts     c1         c2         TOTAL
r1           10         40         50
r2           20         30         50
TOTAL        30         70         100

E counts     c1         c2         TOTAL
r1           15         35         50
r2           15         35         50
TOTAL        30         70         100

(O-E)²/E     c1         c2         TOTAL
r1           1.666667   0.714286   2.380952
r2           1.666667   0.714286   2.380952
TOTAL        3.333333   1.428571   4.761905   (chi-sq)
p-value      0.029096
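The sample-size effect in Problem 4 can also be reproduced with a short script. This is a sketch under stated assumptions: the function name `chi_square_2x2` is illustrative, and the p-value uses the identity P(χ² > x) = erfc(√(x/2)), which holds only for 1 degree of freedom, so the function is limited to 2×2 tables.

```python
import math


def chi_square_2x2(proportions, n):
    """Chi-square test of independence for a 2x2 table of observed
    proportions scaled up to n total observations.

    Returns (chi-square statistic, p-value).
    """
    observed = [[p * n for p in row] for row in proportions]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count under independence: row total * column total / n
            e = row_totals[i] * col_totals[j] / n
            stat += (o - e) ** 2 / e
    # df = (2-1)(2-1) = 1, where P(chi-square > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


proportions = [[0.1, 0.4], [0.2, 0.3]]
results = {n: chi_square_2x2(proportions, n) for n in (10, 100)}
for n, (stat, p_value) in results.items():
    print(f"n = {n}: chi-square = {stat:.6f}, p-value = {p_value:.6f}")
```

For fixed observed proportions the statistic is proportional to n, so multiplying the sample size by 10 multiplies χ² by 10. That is why the same pattern of proportions is not significant at n = 10 but is significant at n = 100.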

This note was uploaded on 02/16/2010 for the course STAT 350 taught by Professor Staff during the Spring '08 term at Purdue.
