STAT 350 – Spring 2009 Homework #10 Solution
Covers material through Lecture O. You can get p-values that are more precise than what is possible using the Chi-Squared Table in the
Appendix of your textbook by using the Excel function “ChiDist”. For example, if your chi-square
statistic is 30.54 and the degrees of freedom is 9, type =ChiDist(30.54,9) in Excel and it will return
the right-tail p-value, P(χ² > 30.54).
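If Excel is not at hand, the same right-tail probability can be computed with a short Python sketch. It is a stand-in, not Excel's algorithm: the name `chi2_sf` is our own, it uses only the standard library, and it assumes a whole-number df, for which the chi-square tail has a closed form (a finite Poisson-type sum for even df, and an erfc term plus a finite sum for odd df).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail probability P(X > x) for X ~ chi-square(df), integer df >= 1.

    Plays the role of Excel's ChiDist(x, df) for whole-number df.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^k / k! for k = 0 .. df/2 - 1
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: the df = 1 tail is erfc(sqrt(x/2)); every 2 extra df add one
    # term sqrt(2/pi) * exp(-x/2) * x^(j - 1/2) / (1 * 3 * ... * (2j - 1)).
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # the j = 1 term
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)  # advance to the j + 1 term
    return total


# Same example as the Excel formula =ChiDist(30.54,9)
print(chi2_sf(30.54, 9))
```

As a sanity check, plugging in a tabled α = 0.05 critical value (for example 7.81 with df = 3) returns approximately 0.05.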
Text Exercises
Chapter 8 problems #42, 44, 46
For each of the text problems, your work MUST show (i) the table of expected counts, (ii) the value of
the chi-square statistic, (iii) the degrees of freedom for the test, (iv) the critical value of your test
statistic for α = 0.05, and (v) the p-value of the test, as well as your conclusion.
8.42
Client from:          Business   Engineering   Social Science   Agriculture   TOTAL
Expected proportion:  0.4        0.3           0.2              0.1           1
Expected counts:      48         36            24               12            120
Observed counts:      52         38            21               9             120
(O − E)²/E:           0.333333   0.111111      0.375            0.75          1.569444

χ² = 1.569444, df = 4 − 1 = 3, critical value = 7.81, p-value = 0.666

Therefore we would not reject the null hypothesis that the percentages for the 4 categories are
40%, 30%, 20% and 10%, respectively.
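The 8.42 arithmetic can be reproduced with a short Python sketch. The helper names `chi2_sf` and `goodness_of_fit` are hypothetical, chosen for this illustration; `chi2_sf` computes the right-tail chi-square probability (the quantity Excel's ChiDist returns) with a closed-form series that is valid only for whole-number df.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail area P(X > x) for X ~ chi-square(df), whole-number df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^k / k! for k = 0 .. df/2 - 1
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: df = 1 tail erfc(sqrt(x/2)) plus (df - 1) / 2 correction terms
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def goodness_of_fit(observed, proportions):
    """Chi-square goodness-of-fit test against fully specified proportions."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return expected, stat, df, chi2_sf(stat, df)


# 8.42: Business, Engineering, Social Science, Agriculture
observed = [52, 38, 21, 9]
proportions = [0.4, 0.3, 0.2, 0.1]
expected, stat, df, p_value = goodness_of_fit(observed, proportions)
print("expected:", [round(e, 2) for e in expected])
print(f"chi-square = {stat:.6f}, df = {df}, p-value = {p_value:.3f}")
```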
8.44
Location i   πi (expected frequency)   Observed count   Expected count   (O − E)²/E
 1           0.033333                    4               6.666667        1.066667
 2           0.066667                   15              13.33333         0.208333
 3           0.1                        23              20               0.45
 4           0.133333                   25              26.66667         0.104167
 5           0.166667                   38              33.33333         0.653333
 6           0.166667                   31              33.33333         0.163333
 7           0.133333                   32              26.66667         1.066667
 8           0.1                        14              20               1.8
 9           0.066667                   10              13.33333         0.833333
10           0.033333                    8               6.666667        0.266667
Sum          1                         200             200               6.6125

χ² = 6.6125, df = 10 − 1 = 9, critical value = 16.91, p-value = 0.6774.
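The 8.44 statistic and p-value can be checked with the following Python sketch. The hypothesized proportions are generated from the formula (5.5 − |i − 5.5|)/30; the helper `chi2_sf` is a hypothetical name for a whole-number-df right-tail chi-square probability (the quantity Excel's ChiDist returns).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail area P(X > x) for X ~ chi-square(df), whole-number df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^k / k! for k = 0 .. df/2 - 1
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: df = 1 tail erfc(sqrt(x/2)) plus (df - 1) / 2 correction terms
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


# 8.44: hypothesized proportion for location i is (5.5 - |i - 5.5|) / 30
proportions = [(5.5 - abs(i - 5.5)) / 30 for i in range(1, 11)]
observed = [4, 15, 23, 25, 38, 31, 32, 14, 10, 8]

n = sum(observed)  # 200 requests in total
expected = [n * p for p in proportions]
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
stat = sum(contributions)
df = len(observed) - 1
p_value = chi2_sf(stat, df)
print(f"chi-square = {stat:.4f}, df = {df}, p-value = {p_value:.4f}")
```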
Therefore we would not reject the null hypothesis that the true (long-run) proportion of requests for
location i is (5.5 − |i − 5.5|)/30, for i = 1, 2, ..., 10.

8.46
Observed Counts
                          Failure Mode
Configuration     1      2      3      4      total
1                 20     44     17     9      90
2                 4      17     7      12     40
3                 10     31     14     5      60
total             34     92     38     26     190

Expected Counts
                          Failure Mode
Configuration     1          2          3      4          total
1                 16.10526   43.57895   18     12.31579   90
2                 7.157895   19.36842   8      5.473684   40
3                 10.73684   29.05263   12     8.210526   60
total             34         92         38     26         190

(O − E)²/E
                          Failure Mode
Configuration     1          2          3          4
1                 0.941864   0.004068   0.055556   0.892713
2                 1.393189   0.289617   0.125      7.781377
3                 0.050568   0.13053    0.333333   1.255398
total: 13.25321

χ² = 13.25321, df = (4 − 1)(3 − 1) = 6, critical value = 12.59, p-value = 0.039187
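The 8.46 test of no association can be reproduced with the Python sketch below. The function names `chi2_sf` and `independence_test` are hypothetical; the expected count in each cell is (row total × column total) / grand total, and df = (rows − 1)(columns − 1).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail area P(X > x) for X ~ chi-square(df), whole-number df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^k / k! for k = 0 .. df/2 - 1
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: df = 1 tail erfc(sqrt(x/2)) plus (df - 1) / 2 correction terms
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def independence_test(table):
    """Chi-square test of no association for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, stat, df, chi2_sf(stat, df)


# 8.46: rows = Configuration 1-3, columns = Failure Mode 1-4
observed = [
    [20, 44, 17, 9],
    [4, 17, 7, 12],
    [10, 31, 14, 5],
]
expected, stat, df, p_value = independence_test(observed)
print(f"chi-square = {stat:.5f}, df = {df}, p-value = {p_value:.6f}")
```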
Using α = 0.05, we would reject the null hypothesis of no association between Failure Mode and
Configuration.

Then answer the following problems.
1. Here are the numbers of flights on time and delayed for 2 airlines at 5 airports.

                  Alaska Airlines                America West
Airport           On time   Delayed   Total      On time   Delayed   Total
L.A.              497       62        559        694       117       811
Phoenix           221       12        233        4840      415       5255
San Diego         212       20        232        383       65        448
San Francisco     503       102       605        320       129       449
Seattle           1841      305       2146       201       61        262
All Airports      3274      501       3775       6438      787       7225

a. Find the % of delayed flights for Alaska Airlines at each of the 5 airports, and then do the same
for America West. Present your answers in a table.
% Delayed Flights
Airport           Alaska    AmWest
L.A.              11.09%    14.43%
Phoenix           5.15%     7.90%
San Diego         8.62%     14.51%
San Francisco     16.86%    28.73%
Seattle           14.21%    23.28%

b. What % of all Alaska Airlines flights were delayed? What % of all America West flights were
delayed? These are the numbers usually reported.
501 / 3775 = 13.27% of all Alaska Airlines flights were delayed.
787 / 7225 = 10.89% of all America West flights were delayed.

c. America West does worse at every one of the 5 airports, yet does better overall. That sounds
impossible. Explain how this can happen.
This is an example of Simpson’s Paradox. Each airline's overall delay rate is a weighted average of
its airport-level delay rates, weighted by the share of that airline's flights at each airport.
America West has its lowest delay rate in Phoenix, where the vast majority (nearly 73%) of its
flights are, so this low delay rate is weighted heavily in AmWest's overall delay rate. In contrast,
Alaska Airlines has one of its higher delay rates in Seattle, where the majority (nearly 57%) of its
flights are, giving this relatively high delay rate a heavy weight in Alaska's overall delay rate.
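The weighted-average argument can be checked numerically. In this Python sketch (the dictionary layout and function names are illustrative choices, not part of the assignment), each airline's overall delay rate is computed two ways: directly from the totals, and as the airport delay rates weighted by the share of that airline's flights at each airport.

```python
# (delayed, total flights) at each airport, from the table in Problem 1
ALASKA = {
    "L.A.": (62, 559),
    "Phoenix": (12, 233),
    "San Diego": (20, 232),
    "San Francisco": (102, 605),
    "Seattle": (305, 2146),
}
AMWEST = {
    "L.A.": (117, 811),
    "Phoenix": (415, 5255),
    "San Diego": (65, 448),
    "San Francisco": (129, 449),
    "Seattle": (61, 262),
}


def airport_rates(data):
    """Delay rate at each airport."""
    return {airport: d / t for airport, (d, t) in data.items()}


def overall_rate(data):
    """Overall delay rate: total delayed / total flights."""
    return sum(d for d, _ in data.values()) / sum(t for _, t in data.values())


def weighted_rate(data):
    """Same overall rate, written as a weighted average of airport rates."""
    n = sum(t for _, t in data.values())
    return sum((t / n) * (d / t) for d, t in data.values())


alaska, amwest = airport_rates(ALASKA), airport_rates(AMWEST)
for airport in ALASKA:
    print(f"{airport:14s} Alaska {alaska[airport]:6.2%}  AmWest {amwest[airport]:6.2%}")
print(f"Overall        Alaska {overall_rate(ALASKA):6.2%}  AmWest {overall_rate(AMWEST):6.2%}")
```

AmWest's rate is higher at every airport, yet its overall rate is lower, because its weights are concentrated on Phoenix.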
The following table shows what percent of each airline's flights were at each airport.
Simpson’s Paradox can only occur when these weights differ between the groups (that is, the
proportion of flights at each airport is not the same for the two airlines).
Airport           Alaska     AmWest
L.A.              14.81%     11.22%
Phoenix           6.17%      72.73%
San Diego         6.15%      6.20%
San Francisco     16.03%     6.21%
Seattle           56.85%     3.63%
Total             100.00%    100.00%

2. The marketing department at a technical college wanted to conduct an analysis of its current
enrollment. The following table gives counts of the college's students in 4 different classifications
for program and 3 classifications for age group.
Observed counts
                  Program
Age               2-year      2-year      4-year      4-year
                  full-time   part-time   full-time   part-time
21 and under      42          35          108         22
22 to 34          31          76          127         69
35 and over       11          28          39          34

a. Assuming there is no relationship between program and age group, what would you expect the
cell counts to be? (Give your answer in the form of a table.)

Expected counts
                  Program
Age               2-year      2-year      4-year      4-year      TOTAL
                  full-time   part-time   full-time   part-time
21 and under      27.9550     46.2588     91.1865     41.5997     207
22 to 34          40.9196     67.7122     133.4759    60.8923     303
35 and over       15.1254     25.0289     49.3376     22.5080     112
TOTAL             84          139         274         125         622

b. If you are testing the null hypothesis that there is no relationship between program and age
group, what would be the value of the chi-squared statistic? 36.4554
Here is a table of the value of (O − E)²/E for each cell:
                  Program
Age               2-year      2-year      4-year      4-year
                  full-time   part-time   full-time   part-time
21 and under      7.0564      2.7403      3.1002      9.2344
22 to 34          2.4047      1.0144      0.3142      1.0795
35 and over       1.1252      0.3527      2.1660      5.8675
TOTAL: 36.4554

c. Give the degrees of freedom for your test statistic. df = (3 − 1)(4 − 1) = 6
d. Give the critical value (α=0.05) for your test statistic. 12.59 (Appendix table VII, page 571)
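As a check on parts (a) through (e), the expected counts, statistic, df, and p-value can be recomputed with the Python sketch below. The helper names `chi2_sf` and `independence_test` are our own; `chi2_sf` is a closed-form right-tail chi-square probability for whole-number df, playing the role of Excel's ChiDist.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail area P(X > x) for X ~ chi-square(df), whole-number df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^k / k! for k = 0 .. df/2 - 1
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: df = 1 tail erfc(sqrt(x/2)) plus (df - 1) / 2 correction terms
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def independence_test(table):
    """Chi-square test of no association for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, stat, df, chi2_sf(stat, df)


# Rows: 21 and under, 22 to 34, 35 and over
# Columns: 2-year full-time, 2-year part-time, 4-year full-time, 4-year part-time
observed = [
    [42, 35, 108, 22],
    [31, 76, 127, 69],
    [11, 28, 39, 34],
]
expected, stat, df, p_value = independence_test(observed)
print(f"chi-square = {stat:.4f}, df = {df}, p-value = {p_value:.3g}")
```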
e. Give the p-value for your test statistic. 2.25 × 10⁻⁶ (in Excel: =ChiDist(36.4554, 6))
f. Based on your test, what would you conclude about the relationship between age group and
program? There is a statistically significant relationship between the age of a student and the type
of program he/she is enrolled in. In other words, age and program are NOT independent.

3. In Lab #7 we looked at a number of different ways to assess whether a sample may have come from a
normal distribution. Probability plots (Q-Q plots) are one approach. We also looked at hypothesis tests
such as Shapiro-Wilk. Another approach we used was to look at the sample histogram overlaid with a
normal distribution curve with μ = x̄ and σ = s. Instead of just judging by eye how close
the histogram is to the curve, we can assess this quantitatively (this is how some of the hypothesis
tests work).
A sample of 100 observations was taken. You wish to assess whether these observations may have
come from a Normal distribution. The sample had x̄ = 100 and s = 10. The number of
observations in each bin is given in the following table.

bin                  <80   80-90   90-95   95-100   100-105   105-110   110-120   >120   TOTAL
observed frequency   5     11      13      22       18        16        9         6      100

a. The null hypothesis is that the sample did come from a normal distribution with µ = 100 and σ
= 10. This null hypothesis can be restated in terms of the true proportions of each category, the
πi's. Give the values of π1 through π8 (to 4 decimal places).
Hint: $\pi_1 = P(X < 80)$ for $X \sim \text{Normal}(\mu = 100, \sigma = 10)$, and $\sum_{i=1}^{8} \pi_i = 1$.

$$\pi_1 = P(X < 80) = P\left(\frac{X - 100}{10} < \frac{80 - 100}{10}\right) = P(Z < -2.00) = \Phi(-2.00) = 0.0228$$

$$\pi_2 = P(80 < X < 90) = P\left(\frac{80 - 100}{10} < \frac{X - 100}{10} < \frac{90 - 100}{10}\right) = P(-2.00 < Z < -1.00) = \Phi(-1.00) - \Phi(-2.00) = 0.1587 - 0.0228 = 0.1359$$

$$\pi_3 = P(90 < X < 95) = P\left(\frac{90 - 100}{10} < \frac{X - 100}{10} < \frac{95 - 100}{10}\right) = P(-1.00 < Z < -0.50) = \Phi(-0.50) - \Phi(-1.00) = 0.3085 - 0.1587 = 0.1498$$

$$\pi_4 = P(95 < X < 100) = P\left(\frac{95 - 100}{10} < \frac{X - 100}{10} < \frac{100 - 100}{10}\right) = P(-0.50 < Z < 0) = \Phi(0) - \Phi(-0.50) = 0.5000 - 0.3085 = 0.1915$$

By symmetry,
π5 = π4 = 0.1915
π6 = π3 = 0.1498
π7 = π2 = 0.1359
π8 = π1 = 0.0228

b. Give the expected counts for each bin (to 2 decimal places).
bin        expected frequency
<80        π1 × 100 = 2.28
80-90      π2 × 100 = 13.59
90-95      π3 × 100 = 14.98
95-100     π4 × 100 = 19.15
100-105    π5 × 100 = 19.15
105-110    π6 × 100 = 14.98
110-120    π7 × 100 = 13.59
>120       π8 × 100 = 2.28
TOTAL      100

c. Give the value of the chi-square statistic for this analysis.
χ² = 12.18263

Bin        Observed   Expected   (O − E)²/E
<80        5          2.28       3.244912
80-90      11         13.59      0.493606
90-95      13         14.98      0.261709
95-100     22         19.15      0.424151
100-105    18         19.15      0.06906
105-110    16         14.98      0.069453
110-120    9          13.59      1.550265
>120       6          2.28       6.069474
TOTAL      100        100        12.18263

d. Give the degrees of freedom for this test statistic.
df = (# of bins – 1) = 8 – 1 = 7
e. Give the critical value (α = 0.05) for this test statistic.
14.07 (Appendix Table VII, page 571)
f. Give the p-value for this test statistic.
0.0947
in Excel =ChiDist(12.18263,7)
g. Based on this analysis, would you conclude that the sample came from a Normal distribution or
not?
The null hypothesis is that the sample DID come from a normal distribution. Based on our test
statistic and critical value (or the p-value), we do NOT REJECT the null hypothesis. The data are
consistent with a Normal distribution with µ = 100 and σ = 10: we have no significant evidence that
the sample did not come from one, although failing to reject does not prove that it did.
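The whole of Problem 3 can be sketched in Python using `statistics.NormalDist` (standard library, Python 3.8 or later) for Φ. The helper name `chi2_sf` is our own. Because the sketch uses exact normal probabilities instead of four-decimal table values (for example, π3 comes out near 0.1499 rather than 0.1498), its statistic is slightly larger than 12.18263, at about 12.2; the conclusion at α = 0.05 is the same.

```python
import math
from statistics import NormalDist


def chi2_sf(x: float, df: int) -> float:
    """Right-tail area P(X > x) for X ~ chi-square(df), whole-number df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^k / k! for k = 0 .. df/2 - 1
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: df = 1 tail erfc(sqrt(x/2)) plus (df - 1) / 2 correction terms
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


# Bin edges for <80, 80-90, 90-95, 95-100, 100-105, 105-110, 110-120, >120
edges = [80, 90, 95, 100, 105, 110, 120]
observed = [5, 11, 13, 22, 18, 16, 9, 6]

null = NormalDist(mu=100, sigma=10)  # H0: X ~ Normal(100, 10)
cdf = [0.0] + [null.cdf(x) for x in edges] + [1.0]
probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]  # pi_1, ..., pi_8

n = sum(observed)
expected = [n * p for p in probs]
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # mu and sigma are specified by H0, not estimated
p_value = chi2_sf(stat, df)

print("pi:", [round(p, 4) for p in probs])
print(f"chi-square = {stat:.4f}, df = {df}, p-value = {p_value:.4f}")
```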
4. Effect of sample size. The purpose of this next problem is to see the effects of differing sample
sizes in a chi-square test. Assume that the observed proportions for a 2-way contingency table are
as given below.

          Column1   Column2
Row1      0.1       0.4
Row2      0.2       0.3

Complete the following table assuming (a) that the total number of observations is 10 and (b) that
the total number of observations is 100. Be sure to show your work.
Total Observations                                 10                            100
Test Statistic (χ²)                                0.47619                       4.761905
degrees of freedom                                 1                             1
critical value (α = 0.05)                          3.84                          3.84
p-value                                            0.490153                      0.029096
Are the row and column variables independent?      Cannot reject independence.   No; reject independence.

For n = 10
O counts      c1         c2         TOTAL
r1            1          4          5
r2            2          3          5
TOTAL         3          7          10

E counts      c1         c2         TOTAL
r1            1.5        3.5        5
r2            1.5        3.5        5
TOTAL         3          7          10

(o − e)²/e    c1         c2         TOTAL
r1            0.166667   0.071429   0.238095
r2            0.166667   0.071429   0.238095
TOTAL         0.333333   0.142857   0.47619

chi-sq = 0.47619, p-value = 0.490153

For n = 100

O counts      c1         c2         TOTAL
r1            10         40         50
r2            20         30         50
TOTAL         30         70         100

E counts      c1         c2         TOTAL
r1            15         35         50
r2            15         35         50
TOTAL         30         70         100

(o − e)²/e    c1         c2         TOTAL
r1            1.666667   0.714286   2.380952
r2            1.666667   0.714286   2.380952
TOTAL         3.333333   1.428571   4.761905

chi-sq = 4.761905, p-value = 0.029096
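Multiplying every count by 10 multiplies each (O − E)²/E term, and hence χ², by 10, while df and the critical value stay the same; that is why the same proportions lead to opposite conclusions. The Python sketch below (helper names `chi2_sf` and `independence_stat` are hypothetical) reproduces both columns. Note that with n = 10 the expected counts (1.5 and 3.5) fall below the common rule of thumb of at least 5 per cell, so the chi-square p-value is only a rough approximation there.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail area P(X > x) for X ~ chi-square(df), whole-number df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^k / k! for k = 0 .. df/2 - 1
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: df = 1 tail erfc(sqrt(x/2)) plus (df - 1) / 2 correction terms
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def independence_stat(table):
    """Chi-square statistic, df, and p-value for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = sum(
        (table[i][j] - r * c / n) ** 2 / (r * c / n)
        for i, r in enumerate(row_totals)
        for j, c in enumerate(col_totals)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


proportions = [[0.1, 0.4], [0.2, 0.3]]
results = {}
for n in (10, 100):
    # Same cell proportions, scaled to n observations
    counts = [[round(p * n) for p in row] for row in proportions]
    stat, df, p_value = independence_stat(counts)
    results[n] = (stat, df, p_value)
    verdict = "reject" if p_value < 0.05 else "do not reject"
    print(f"n = {n:3d}: chi-square = {stat:.6f}, df = {df}, p-value = {p_value:.6f} ({verdict} H0)")
```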
This note was uploaded on 02/16/2010 for the course STAT 350 taught by Professor Staff during the Spring '08 term at Purdue.