Homework 3 for STA 5166 (Assigned, Oct. 8)
Statistics in Applications I
Due: Oct. 17, 2007 (Wednesday)

1: BHH Ch.2; Problems 10, 12, 13(a,b,c,d); Pages 62-65. (40)

2: BHH Ch.3; Problem 2 (Pages 124-125). Submit both your summary results and R/Splus program for the problem. (20)

3: BHH Ch.3; Problems 4, 7, and 13 (Pages 125-128). For each of the three problems, perform a t-test on the difference of the two means and perform a test based on a randomization distribution (use R/Splus to generate 10000 samples and plot the histogram of the differences). Submit both your summary results and R/Splus program for each of the problems. (40)
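The two tests requested in Problem 3 can be sketched in R as a single helper. This is only an illustrative outline: the function name `compare_two_means`, its arguments, the tolerance `tol`, the seed, and the example numbers are all assumptions and do not come from the assignment or from BHH.

```r
# Hypothetical helper (not from the assignment): Welch t-test plus a
# two-sided randomization test for the difference of two means.
compare_two_means <- function(a, b, n_perm = 10000, tol = 1e-9) {
  welch    <- t.test(a, b)                    # Welch two-sample t-test (R default)
  pooled   <- c(a, b)
  observed <- mean(b) - mean(a)
  diffs <- replicate(n_perm, {
    idx <- sample(length(pooled), length(a))  # positions relabelled as group "a"
    mean(pooled[-idx]) - mean(pooled[idx])
  })
  list(
    t_p_value    = welch$p.value,
    # small tolerance so that relabellings tying the observed difference count
    rand_p_value = mean(abs(diffs) >= abs(observed) - tol),
    diffs        = diffs
  )
}

set.seed(2007)  # arbitrary seed, only so the sketch is repeatable
res <- compare_two_means(c(12, 15, 11, 14), c(16, 18, 15, 19))  # made-up data
res$t_p_value
res$rand_p_value
# hist(res$diffs, main = "Randomization Distribution")  # the histogram the assignment asks for
```

Drawing the group positions with `sample()` instead of shuffling labels is a design choice; both give the same randomization distribution.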
 "M ~41 0.1»0903 1 33'7"» ﬁSﬁNﬁk 3m: {‘mcﬁw‘g ‘QAW 0.1.053 on; M g gyﬂ % Q'Wam w mama Mr: w» w 5”? 0‘ “(if Mr:— ;2.'\ 1“ o? oaswwsﬁ) \m W33 QM. ”le {‘mmﬁga AA 1 ‘92." dxg\‘€W\ag¥§~3’k E . 3 (5))
[Handwritten work; illegible in the extracted text.]

Chapter 3.2

Summary:
We test whether there is a significant difference between the mean levels of asbestos fiber in the air of the industrial plant with and without the S-142 chemical. In the comparative trial in the plant, the four consecutive readings gave a mean difference of -3.5 (with S-142 minus without). The null hypothesis is that the asbestos level is the same with or without S-142; the alternative is that the level decreases with S-142, since the observed mean difference is negative. As a reference set, we used the past observations of asbestos levels without S-142. In that dataset, 1 of the 109 differences between averages of adjacent pairs of readings (1/109 = 0.0091743119) is less than or equal to the difference observed in the comparative trial. Since this proportion is less than 5%, we reject the null hypothesis and accept the salesman's claim that S-142 reduces the level of asbestos in the air of
the industrial plant.

CODE

data = scan("C:/Documents and Settings/Jaime/Desktop/FALL07/STA5166/BHH2-Data/datahw3.dat")
data
n1 = 0
Mean1wout = mean(c(8, 6))   # two readings without S-142
Mean2with = mean(c(3, 4))   # two readings with S-142
diff_means = Mean2with - Mean1wout
# 112 readings give 111 averages of adjacent pairs and 109 differences.
# y must hold all 111 averages: with only 109, y[j + 2] is NA for
# j = 108 and 109 and the if() below stops with an error.
y = rep(NA, 111)
x = rep(NA, 109)
for (i in 1:111) { y[i] = (data[i] + data[i + 1]) / 2 }
for (j in 1:109) {
  x[j] = y[j + 2] - y[j]
  if (x[j] <= diff_means) n1 = n1 + 1
}
x
sort(x)
n1
diff_means
n1/109

OUTPUT

> data=scan("C:/Documents and Settings/Jaime/Desktop/FALL07/STA5166/BHH2-Data/datahw3.dat")
Read 112 items
> data
  [1]  9 10  8  9  8  8  8  7  6  9 10 11  9 10 11 11 11 11 10 11 12 13 12 13 12
 [26] 14 15 14 12 13 13 12 13 13 13 13 13 10  8  9  8  6  7  7  6  5  6  5  6  4
 [51]  5  4  4  2  4  5  4  5  6  5  5  6  5  6  7  8  8  8  7  9 10  9 10  9  8
 [76]  9  8  7  7  8  7  7  7  8  8  8  8  7  6  5  6  5  6  7  6  6  5  6  6  5
[101]  4  3  4  4  5  5  6  5  6  7  6  5
> n1=0
> Mean1wout = mean(c(8,6))
> Mean2with = mean(c(3,4))
> diff_means = Mean2with-Mean1wout
> y = c(rep(NA, (109)))
> x = c(rep(NA, (109)))
> for(i in 1:109){ y[i] = (data[i]+data[i+1])/2}
> for(j in 1:109){ x[j] = y[j+2] - y[j];
+ if(x[j]<= diff_means) n1=n1+1}
Error in if (x[j] <= diff_means) n1 = n1 + 1 :
  missing value where TRUE/FALSE needed
> x
  [1] -1.0 -0.5 -0.5 -0.5 -0.5 -1.5  0.0  3.0  3.0  0.5 -1.0  0.5  1.5  0.5  0.0
 [16] -0.5 -0.5  1.0  2.0  1.0  0.0  0.0  0.5  2.0  1.5 -1.5 -2.0  0.0  0.0 -0.5
 [31]  0.5  0.5  0.0  0.0 -1.5 -4.0 -3.0 -0.5 -1.5 -2.0  0.0  0.0 -1.5 -1.0  0.0
 [46]  0.0 -0.5 -1.0 -0.5 -0.5 -1.5 -1.0  1.5  1.5  0.0  1.0  1.0 -0.5  0.0  0.5
 [61]  0.0  1.0  2.0  1.5  0.5 -0.5  0.0  2.0  1.5  0.0  0.0 -1.0 -1.0  0.0 -1.0
 [76] -1.5  0.0  0.5 -0.5 -0.5  0.5  1.0  0.5  0.0 -0.5 -1.5 -2.0 -1.0  0.0  0.0
 [91]  1.0  1.0 -0.5 -1.0 -0.5  0.5  0.0 -1.5 -2.0 -1.0  0.5  1.0  1.0  1.0  0.5
[106]  0.0  1.0   NA   NA
> sort(x)
  [1] -4.0 -3.0 -2.0 -2.0 -2.0 -2.0 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
 [16] -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -0.5 -0.5 -0.5 -0.5
 [31] -0.5 -0.5 -0.5 -0.5 -0.5 -0.5 -0.5 -0.5 -0.5 -0.5 -0.5 -0.5 -0.5 -0.5  0.0
 [46]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 [61]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.5  0.5  0.5  0.5  0.5  0.5
 [76]  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 [91]  1.0  1.0  1.0  1.0  1.0  1.5  1.5  1.5  1.5  1.5  1.5  2.0  2.0  2.0  2.0
[106]  3.0  3.0
> n1
[1] 1
> diff_means
[1] -3.5
> n1/109
[1] 0.009174312

Chapter 3.4

Assumptions:
Ratings are both approximately normally distributed. Two samples, A and B, are
independent. Ratings in each brand are i.i.d.

Summary:
To test the hypothesis that η_A = η_B against the alternative η_A ≠ η_B, I used a t-test to check whether the difference in means is different from zero. The p-value obtained was 0.3316. Therefore there is not sufficient evidence to reject the null hypothesis. The assumptions above were needed to conduct this test.

Using a randomization distribution, I also tested the above hypothesis. Here, I did not make assumptions about the distributions of the ratings. The mean of brand A is 3.875 and the mean of brand B is 5.285714, giving a difference of 1.4107.

There exist 6435 possible ways to assign the 15 ratings to 8 ratings of brand A and 7 ratings of brand B.
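The count of 6435 is the binomial coefficient "15 choose 8", and with so few relabellings the whole randomization distribution can be enumerated instead of sampled. The following is a minimal sketch of that idea, not the method used in this write-up; the ratings are the ones listed in the program output, while the variable names and the tolerance `1e-9` are assumptions.

```r
brandA <- c(2, 4, 2, 1, 9, 9, 2, 2)
brandB <- c(8, 3, 5, 3, 7, 7, 4)
pooled <- c(brandA, brandB)
observed <- mean(brandB) - mean(brandA)

n_splits <- choose(15, 8)  # ways to choose which 8 of the 15 ratings are labelled A
splits   <- combn(15, 8)   # one column of positions per relabelling

# Difference in means (B minus A) for every possible relabelling
diffs <- apply(splits, 2, function(idx) mean(pooled[-idx]) - mean(pooled[idx]))

# Exact two-sided p-value; the tolerance keeps exact ties from being lost to rounding
exact_p <- mean(abs(diffs) >= abs(observed) - 1e-9)
n_splits
exact_p
```

A Monte Carlo estimate based on 10000 random relabellings should land close to `exact_p`.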
Under the null hypothesis there is no difference between the ratings of brand A and brand B, so the brand labels can be rearranged at random. For each rearrangement, calculate the difference in means and check whether its absolute value is at least 1.4107. Count the number of such occurrences; dividing by the number of rearrangements gives the p-value.

The p-value obtained from a large number of rearrangements should be approximately equal to the p-value obtained from the t-test above. I obtained the p-value 0.3551. This also
leads to the conclusion that one cannot reject the null hypothesis.

CODE

brandA = c(2,4,2,1,9,9,2,2)
brandB = c(8,3,5,3,7,7,4)
y = t.test(brandA, brandB)
y
n1 = 0
h1 = NULL                          # collects the differences for the histogram
y1 = c(2,4,2,1,9,9,2,2,8,3,5,3,7,7,4)
c1 = c(rep("A", 8), rep("B", 7))
d1 = c(rep(0,10000))
diff = 5.285714-3.875
for(i in 1:10000){
  c2 = sample(c1)                  # randomly reassign the brand labels
  x1 = y1[c2=="A"]
  x2 = y1[c2=="B"]
  m1 = mean(x1)
  m2 = mean(x2)
  d1[i] = m2-m1
  h1 = c(h1,d1[i])
  if(abs(d1[i]) >= 1.4107) n1 = n1+1   # two-sided count
}
n1
hist(h1, main="Randomization Distribution")
pvalue = n1/10000
pvalue

OUTPUT

> brandA = c(2,4,2,1,9,9,2,2)
> brandB = c(8,3,5,3,7,7,4)
>
> y = t.test(brandA, brandB)
> y

        Welch Two Sample t-test

data:  brandA and brandB
t = -1.0122, df = 11.923, p-value = 0.3316
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.449587  1.628159
sample estimates:
mean of x mean of y
 3.875000  5.285714

> n1=0
> h1=0
> y1= c(2,4,2,1,9,9,2,2,8,3,5,3,7,7,4)
> c1=c(rep("A", 8), rep("B", 7))
> d1 = c(rep(0,10000))
> diff = 5.285714-3.875
> for(i in 1:10000){
+ c2=sample(c1);
+ x1=y1[c2=="A"];
+ x2=y1[c2=="B"];
+ m1 = mean(x1);
+ m2 = mean(x2);
+ d1[i] = m2-m1;
+ h1=c(h1,d1[i]);
+ if(abs(d1[i]) >= 1.4107)n1=n1+1
+ }
> n1
[1] 3551
> hist(h1, main="Randomization Distribution")
> pvalue= n1/10000
> pvalue
[1] 0.3551

[Figure: histogram titled "Randomization Distribution"; vertical axis: Frequency.]

Chapter 3.7

Assumptions:
Results are both approximately normally distributed. Two samples, designs A and B, are independent. Results in each design are i.i.d.

Summary:
We test whether there is a significant difference between the mean values of the power attainable for the two designs. The null hypothesis is that there is no difference in the mean values. I used a t-test to check whether the difference in means is different from zero. The p-value obtained was 0.4343. Therefore there is not sufficient evidence to reject the null hypothesis. The assumptions above were needed to conduct this test.

Using a randomization distribution, I also tested the above hypothesis. Here, I did not make assumptions about the distributions of the results. The mean of design A is 1.55 and the mean of design B is 1.75, giving a difference of 0.2.

The p-value obtained from a large number of rearrangements should be approximately equal to the p-value obtained from the t-test above. I obtained the p-value 0.4454. This also
leads to the conclusion that one cannot reject the null hypothesis.

CODE

designA = c(1.8, 1.9, 1.1, 1.4)
designB = c(1.9, 2.1, 1.5, 1.5)
y = t.test(designA, designB)
y
n1 = 0
h1 = NULL                          # collects the differences for the histogram
y1 = c(1.8, 1.9, 1.1, 1.4, 1.9, 2.1, 1.5, 1.5)
c1 = c(rep("A", 4), rep("B", 4))
diff = 1.75-1.55
d1 = rep(0, 10000)
for(i in 1:10000){
  c2 = sample(c1)                  # randomly reassign the design labels
  x1 = y1[c2=="A"]
  x2 = y1[c2=="B"]
  m1 = mean(x1)
  m2 = mean(x2)
  d1[i] = m2-m1
  h1 = c(h1,d1[i])
  if(abs(d1[i]) >= diff) n1 = n1+1     # two-sided count
}
n1
hist(h1, main="Randomization Distribution")
pvalue = n1/10000
pvalue
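With only eight values, several relabellings give a difference of exactly 0.2 in absolute value, so the comparison `abs(d1[i]) >= diff` can depend on floating-point rounding of the means. A hedged variant of the loop that counts such ties with a small tolerance is sketched here; the seed, the tolerance `tol`, and the names `n_strict`, `n_tol`, and `pvalue_tol` are assumptions added for illustration.

```r
set.seed(1)  # assumed seed, only for repeatability
designA <- c(1.8, 1.9, 1.1, 1.4)
designB <- c(1.9, 2.1, 1.5, 1.5)
y1 <- c(designA, designB)
labels <- rep(c("A", "B"), each = 4)
observed <- mean(designB) - mean(designA)
tol <- 1e-9  # assumed tolerance, far below the 0.05 spacing of the possible differences

# 10000 random relabellings, stored in one preallocated result vector
d1 <- replicate(10000, {
  c2 <- sample(labels)
  mean(y1[c2 == "B"]) - mean(y1[c2 == "A"])
})

n_strict   <- sum(abs(d1) >= abs(observed))        # exact floating-point comparison
n_tol      <- sum(abs(d1) >= abs(observed) - tol)  # ties counted regardless of rounding
pvalue_tol <- n_tol / length(d1)
c(n_strict = n_strict, n_tol = n_tol)
pvalue_tol
```

If `n_strict` comes out smaller than `n_tol`, some tied relabellings were dropped by the strict comparison.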
OUTPUT

> designA = c(1.8, 1.9, 1.1, 1.4)
> designB = c(1.9, 2.1, 1.5, 1.5)
>
> y = t.test(designA , designB)
> y

        Welch Two Sample t-test

data:  designA and designB
t = -0.8402, df = 5.756, p-value = 0.4343
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.7885191  0.3885191
sample estimates:
mean of x mean of y
     1.55      1.75

> n1=0
> h1=NULL
> y1= c(1.8, 1.9, 1.1, 1.4, 1.9, 2.1, 1.5, 1.5)
> c1=c(rep("A", 4), rep("B", 4))
> diff=1.75-1.55
> d1 = rep(0, 10000)
> for(i in 1:10000){
+ c2=sample(c1);
+ x1=y1[c2=="A"]; x2=y1[c2=="B"]
+ m1 = mean(x1); m2 = mean(x2);
+ d1[i] = m2-m1;
+ h1=c(h1,d1[i]);
+ if(abs(d1[i]) >= diff)n1=n1+1
+ }
> n1
[1] 4454
> hist(h1, main="Randomization Distribution")
> pvalue= n1/10000
> pvalue
[1] 0.4454

[Figure: histogram titled "Randomization Distribution"; vertical axis: Frequency; horizontal axis from -0.6 to 0.6.]

Chapter 3.13

Assumptions:
Production results for each diet are approximately normally distributed. The two samples, diets A and B, are independent. Results within each diet are i.i.d.

Summary:
We test whether there is a significant difference between the mean production values for the two diets. The null hypothesis is that there is no difference in the mean values. I used a t-test to check whether the difference in means is different from zero. The p-value obtained was 0.07842. Therefore there is not sufficient evidence to reject the null hypothesis at the 5% level. The assumptions above were needed to conduct this test.

Using a randomization distribution, I also tested the above hypothesis. Here, I did not make assumptions about the distributions of the production values. The mean of diet A is 166.5 and the mean of diet B is 156.6667, giving an absolute difference of 9.83.

The p-value obtained from a large number of rearrangements should be approximately equal to the p-value obtained from the t-test above. I obtained the p-value 0.0913. This also
leads to the conclusion that one cannot reject the null hypothesis.

A 95% confidence interval for the mean difference, from the Welch t-test: [-1.3596, 21.0263]. This is the 95% confidence interval for the difference in mean hen production between diet A and diet B. The interval includes zero, in agreement with the test result. Thus, not only do we estimate the difference to be 9.83 mg/dl, but we are 95% confident that it is no less than the lower bound and no greater than the upper
bound.

CODE

dietA = c(166,174,150,166,165,178)
dietB = c(158,159,142,163,161,157)
y = t.test(dietA, dietB)
y
n1 = 0
h1 = NULL                          # collects the differences for the histogram
y1 = c(166,174,150,166,165,178,158,159,142,163,161,157)
c1 = c(rep("A", 6), rep("B", 6))
diff = 166.5 - 156.6667
d1 = rep(0,10000)
for(i in 1:10000){
  c2 = sample(c1)                  # randomly reassign the diet labels
  x1 = y1[c2=="A"]
  x2 = y1[c2=="B"]
  m1 = mean(x1)
  m2 = mean(x2)
  d1[i] = m2-m1
  h1 = c(h1,d1[i])
  if(abs(d1[i]) >= 9.83) n1 = n1+1     # two-sided count
}
n1
hist(h1, main="Randomization Distribution")
pvalue = n1/10000
pvalue

OUTPUT

> dietA = c(166,174,150,166,165,178)
> dietB = c(158,159,142,163,161,157)
>
> y = t.test(dietA , dietB)
> y

        Welch Two Sample t-test

data:  dietA and dietB
t = 1.9735, df = 9.436, p-value = 0.07842
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.359600 21.026267
sample estimates:
mean of x mean of y
 166.5000  156.6667

> n1=0
> h1=NULL
> y1= c(166,174,150,166,165,178,158,159,142,163,161,157)
> c1=c(rep("A", 6), rep("B", 6))
> diff=166.5 - 156.6667
> d1 = rep(0,10000)
> for(i in 1:10000){
+ c2=sample(c1);
+ x1=y1[c2=="A"];
+ x2=y1[c2=="B"];
+ m1 = mean(x1);
+ m2 = mean(x2);
+ d1[i] = m2-m1;
+ h1=c(h1,d1[i]);
+ if(abs(d1[i])>=9.83)n1=n1+1
+ }
> n1
[1] 913
> hist(h1, main="Randomization Distribution")
> pvalue= n1/10000
> pvalue
[1] 0.0913

[Figure: histogram titled "Randomization Distribution"; vertical axis: Frequency.]
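The confidence interval and the two-sided p-value reported by `t.test` for the diet data are two views of the same Welch test, which can be checked directly on the result object. The lines below are a small sketch of that check; the variable names `fit`, `ci`, `est`, and `contains_zero` are assumptions introduced here.

```r
dietA <- c(166, 174, 150, 166, 165, 178)
dietB <- c(158, 159, 142, 163, 161, 157)

fit <- t.test(dietA, dietB)   # Welch two-sample t-test (R default, var.equal = FALSE)
ci  <- fit$conf.int           # 95% interval for mean(dietA) - mean(dietB)
est <- unname(fit$estimate[1] - fit$estimate[2])

# A two-sided p-value above 0.05 goes together with a 95% interval that contains 0
contains_zero <- ci[1] <= 0 && ci[2] >= 0
est
ci
fit$p.value
contains_zero
```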