CHAPTER 10
INFERENCE FROM SMALL SAMPLES

10.2 Student's t Distribution

From the discussion of the sampling distribution of $\bar{x}$ in the preceding chapters, a few points have been established:

* When the original sampled population is normal, the statistics $\bar{x}$ and $z = \dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}}$ are both normally distributed regardless of the sample size.
* When the original sampled population is not normal, the statistics $\bar{x}$, $z \approx \dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}}$, and $z \approx \dfrac{\bar{x}-\mu}{s/\sqrt{n}}$ all have approximately normal distributions as long as the sample size is large.
* When the sample size $n$ is small, the statistic $\dfrac{\bar{x}-\mu}{s/\sqrt{n}}$ does not have a normal distribution.

In 1908, W. S. Gosset derived a complicated formula for the density function of
$$t = \frac{\bar{x}-\mu}{s/\sqrt{n}}$$
for random samples of size $n$ from a normal population, and he published his results under the pen name "Student".

Definition 1. The density function of the statistic $t = \dfrac{\bar{x}-\mu}{s/\sqrt{n}}$ is called the Student's t distribution with $n - 1$ degrees of freedom (df).

The Student's t distribution has the following characteristics:

1. It is mound-shaped and symmetric about $t = 0$, just like the standard normal $z$.
2. It is more variable than $z$, with "heavier tails"; that is, the t curve does not approach the horizontal axis as quickly as $z$ does. This is because the t statistic involves two random quantities, $\bar{x}$ and $s$, whereas the $z$ statistic involves only the sample mean $\bar{x}$.
3. The shape of the t distribution depends on the sample size $n$. As $n$ increases, the variability of t decreases because the estimate $s$ of $\sigma$ is based on more and more information. Eventually, when $n$ is infinitely large, the t and $z$ distributions are identical.

[Figure: the normal distribution and the t distribution, both centered at 0.]
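The three characteristics can be checked numerically. The sketch below assumes the standard closed form of the Student's t density, which these notes do not write out; the function names `t_pdf` and `z_pdf` are illustrative only.

```python
import math


def t_pdf(t: float, df: float) -> float:
    """Student's t density with df degrees of freedom (standard closed form, assumed)."""
    # Log of the normalizing constant; lgamma avoids overflow for large df.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def z_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


if __name__ == "__main__":
    # Tail height at 3 shrinks toward the normal value as df grows.
    for df in (2, 5, 30, 1000):
        print(f"df={df:5d}  f(0)={t_pdf(0.0, df):.4f}  f(3)={t_pdf(3.0, df):.5f}")
    print(f"normal    f(0)={z_pdf(0.0):.4f}  f(3)={z_pdf(3.0):.5f}")
```

Comparing the values at 3 for small and large df shows the heavier tails of t and the convergence toward $z$ as df increases.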
The table of probabilities for the standard normal $z$ distribution is no longer useful in calculating critical values or p-values for the t statistic. Instead, we will use Table 4 in Appendix I. When we index a particular number of degrees of freedom, the table records $t_\alpha$, a value of t that has tail area $\alpha$ to its right, as shown in the figure below.

[Figure: t density f(t) with tail area $\alpha$ to the right of $t_\alpha$.]

Example 1. For a t distribution with 5 degrees of freedom, the value of t that has area 0.05 to its right is found in row 5 in the column marked $t_{0.05}$. For this particular t distribution, the area to the right of $t = 2.015$ is 0.05; only 5% of all values of the t statistic will exceed this value.

Example 2. Suppose we have a sample of size $n = 10$ from a normal distribution. Find a value of t such that only 1% of all values of t will be smaller.

Solution. The df is $n - 1 = 10 - 1 = 9$. The necessary t value must be in the lower portion of the t distribution with area 0.01 to its left, as shown in the figure below. Since the t distribution is symmetric about 0, this value is simply the negative of the value on the right-hand side with area 0.01 to its right, or $-t_{0.01} = -2.821$.

[Figure: t density f(t) with area 0.01 in the lower tail.]

Requirements for Student's t Distribution
* The sample must be selected at random from the population.
* The population from which the sample is drawn must be normally distributed.
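The table lookups in Examples 1 and 2 can be mimicked with a small dictionary. This is only a sketch: `T_TABLE` holds just the handful of Table 4 entries quoted in this chapter, not the full table, and the helper names are hypothetical.

```python
# Upper-tail critical values t_alpha quoted in this chapter from Table 4,
# keyed by degrees of freedom and then by right-tail area alpha.
T_TABLE = {
    5: {0.10: 1.476, 0.05: 2.015},
    9: {0.025: 2.262, 0.01: 2.821},
    16: {0.10: 1.337, 0.05: 1.746, 0.025: 2.120, 0.01: 2.583, 0.005: 2.921},
}


def upper_critical(df: int, alpha: float) -> float:
    """Value of t with area alpha to its right."""
    return T_TABLE[df][alpha]


def lower_critical(df: int, alpha: float) -> float:
    """Value of t with area alpha to its left, using symmetry about 0."""
    return -upper_critical(df, alpha)


if __name__ == "__main__":
    print(upper_critical(5, 0.05))   # Example 1
    print(lower_critical(9, 0.01))   # Example 2
```

A lookup for a (df, alpha) pair that is not stored raises `KeyError`, which is the honest behavior for a partial table.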
10.3 Small-Sample Inferences Concerning a Population Mean

Just as the (standard) normal distribution was the underlying probability distribution for making large-sample inferences, the Student's t distribution plays a central role in small-sample inferences.

Small-Sample Hypothesis Test for $\mu$

1. Null hypothesis: $H_0 : \mu = \mu_0$
2. Alternative hypothesis:
   $H_a : \mu > \mu_0$ (Upper-Tailed Test)
   $H_a : \mu < \mu_0$ (Lower-Tailed Test)
   $H_a : \mu \neq \mu_0$ (Two-Tailed Test)
3. Test statistic:
   $$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
4. Rejection region: Reject $H_0$ when
   $t > t_\alpha$ for $H_a : \mu > \mu_0$ (Upper-Tailed Test)
   $t < -t_\alpha$ for $H_a : \mu < \mu_0$ (Lower-Tailed Test)
   $t > t_{\alpha/2}$ or $t < -t_{\alpha/2}$ for $H_a : \mu \neq \mu_0$ (Two-Tailed Test)
   or when p-value $< \alpha$. Here the critical values of t, $t_\alpha$ and $t_{\alpha/2}$, are based on $n - 1$ degrees of freedom from the Student's t probability table.

Assumptions: The sample of size $n$ is selected at random from a normally distributed population.

Small-Sample $(1 - \alpha)100\%$ Confidence Interval for $\mu$

$$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

where $s/\sqrt{n}$ is the estimated standard error of $\bar{x}$, often referred to as the standard error of the mean.

Example 3. A new process for producing synthetic diamonds can be operated at a
profitable level only if the average weight of the diamonds is greater than 0.5 karat. To evaluate the profitability of the process, six diamonds are generated, with recorded weights 0.46, 0.61, 0.52, 0.48, 0.57, and 0.54 karat. Do the six measurements present sufficient evidence to indicate that the average weight of the diamonds produced by the process is in excess of 0.5 karat?

Solution.
1. We want to test the null hypothesis $H_0 : \mu = 0.5$.
2. An appropriate one-tailed alternative hypothesis is $H_a : \mu > 0.5$.
3. The test statistic is
   $$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{0.53 - 0.5}{0.0559/\sqrt{6}} = 1.32.$$
4. Rejection region: If we choose a 5% level of significance ($\alpha = 0.05$), the right-tailed rejection region is found using the critical value of t from Table 4 of Appendix I. With df $= n - 1 = 5$, we can reject $H_0$ if $t > t_{0.05} = 2.015$, as shown in the figure below.

[Figure: t density f(t) with the observed value 1.32 and the rejection region to the right of 2.015.]

5. Since $1.32 < 2.015$, we cannot reject $H_0$. The data do not present sufficient evidence to indicate that the mean diamond weight exceeds 0.5 karat.
6. From Table 4 of Appendix I with df $= 5$, we read $t_{0.10} = 1.476$; that is, $P\{t > 1.476\} = 0.10$. Since $1.32 < 1.476$, we have
   $$\text{p-value} = P\{t > 1.32\} > P\{t > 1.476\} = 0.10.$$
   The results are not significant.

Example 4. Construct a 90% confidence interval for $\mu$ using the data in Example 3.
Solution. From Table 4 of Appendix I with df $= 5$, we read $t_{0.05} = 2.015$. A 90% confidence interval for $\mu$ is
$$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$
$$0.53 \pm 2.015\,\frac{0.0559}{\sqrt{6}}$$
$$0.53 \pm 0.046$$
or
$$0.484 < \mu < 0.576.$$
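The arithmetic of Examples 3 and 4 can be reproduced with the Python standard library. This is a minimal sketch under two assumptions: the weight 0.57 is the value implied by the reported mean (0.53) and standard deviation (0.0559) for six diamonds, and the critical value 2.015 is taken from the text rather than computed. The helper name `one_sample_t` is illustrative.

```python
import math
import statistics

# Diamond weights in karat. The value 0.57 is an assumption: it is the weight
# implied by the reported mean 0.53 and standard deviation 0.0559 with n = 6.
weights = [0.46, 0.61, 0.52, 0.48, 0.57, 0.54]
MU_0 = 0.5          # hypothesized mean under H0
T_05_DF5 = 2.015    # t_{0.05} with df = 5, as quoted from Table 4


def one_sample_t(data, mu0):
    """Return (mean, sample sd, standard error, t statistic)."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)      # divides by n - 1
    se = s / math.sqrt(n)           # estimated standard error of the mean
    return xbar, s, se, (xbar - mu0) / se


xbar, s, se, t = one_sample_t(weights, MU_0)

# Upper-tailed test at alpha = 0.05.
reject = t > T_05_DF5

# 90% confidence interval uses t_{alpha/2} = t_{0.05}.
half_width = T_05_DF5 * se
ci = (xbar - half_width, xbar + half_width)

if __name__ == "__main__":
    print(f"xbar={xbar:.3f} s={s:.4f} t={t:.2f} reject={reject}")
    print(f"90% CI: {ci[0]:.3f} < mu < {ci[1]:.3f}")
```

The same critical value serves both the one-tailed 5% test and the 90% interval because each leaves area 0.05 in a single tail.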
Example 5. Labels on 1-gallon cans of paint usually indicate the drying time and the area that can be covered in one coat. Most brands of paint indicate that, in one coat, a gallon will cover between 250 and 500 square feet, depending on the texture of the surface to be painted. One manufacturer, however, claims that a gallon of its paint will cover 400 square feet of surface area. To test this claim, a random sample of ten 1-gallon cans of white paint was used to paint ten identical areas using the same kind of equipment. The actual areas (in square feet) covered by these 10 gallons of paint are given here:

310, 311, 412, 368, 447, 376, 303, 410, 365, 350

Do the data present sufficient evidence to indicate that the average coverage differs from 400 square feet? Find the p-value for the test, and use it to evaluate the statistical significance of the results.

Solution.
1. We want to test the null hypothesis $H_0 : \mu = 400$.
2. An appropriate two-tailed alternative hypothesis is $H_a : \mu \neq 400$.
3. The test statistic is
   $$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{365.2 - 400}{48.417/\sqrt{10}} = -2.27.$$
4. Rejection region: We will reject $H_0$ if the test statistic t is either greater than $t_{\alpha/2}$ or smaller than $-t_{\alpha/2}$. The p-value is
   $$\text{p-value} = P\{t > 2.27\} + P\{t < -2.27\} = 2P\{t > 2.27\}.$$
   Thus,
   $$\tfrac{1}{2}(\text{p-value}) = P\{t > 2.27\}.$$
   From Table 4 of Appendix I with df $= n - 1 = 9$, we read $t_{0.025} = 2.262$ and $t_{0.01} = 2.821$; that is,
   $$P\{t > 2.262\} = 0.025 \quad\text{and}\quad P\{t > 2.821\} = 0.01.$$
   Thus,
   $$0.01 < P\{t > 2.27\} < 0.025.$$
   Hence,
   $$0.01 < \tfrac{1}{2}(\text{p-value}) < 0.025 \quad\text{or}\quad 0.02 < \text{p-value} < 0.05.$$

[Figure: t density f(t) for df = 9 with the tabled values 1.383, 1.833, 2.262, 2.821, and 3.250 marked, together with the observed value 2.27.]

5. Because the p-value is less than 0.05, the data present sufficient evidence to indicate that the average coverage differs from 400 square feet.

Example 6. Construct a 95% confidence interval for $\mu$ using the data in Example 5.

Solution. From Table 4 of Appendix I with df $= 9$, we read $t_{0.025} = 2.262$. A 95% confidence interval for $\mu$ is
$$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$
$$365.2 \pm 2.262\,\frac{48.417}{\sqrt{10}}$$
$$365.2 \pm 34.63$$
or
$$330.6 < \mu < 399.8.$$
Notice that the entire confidence interval is to the left of 400. This reconfirms that we must reject $H_0 : \mu = 400$ in Example 5.

HOMEWORK: pp. 397-399
10.1, 10.2, 10.3, 10.5, 10.7, 10.13

10.4 Small-Sample Inferences for the Difference Between Two Population Means: Independent Random Samples

The mean and standard error of $\bar{x}_1 - \bar{x}_2$ are
$$\mu_1 - \mu_2 \quad\text{and}\quad \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},$$
respectively. We can use the standard error of $\bar{x}_1 - \bar{x}_2$,
$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \quad\text{estimated by}\quad \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
Unfortunately, when the sample sizes are small, the statistic (the standardization of $\bar{x}_1 - \bar{x}_2$)
$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
does not have an approximately normal distribution, nor does it have a Student's t distribution. In order to form a statistic with a sampling distribution that can be derived theoretically, we must make one more assumption. Assume that $\sigma_1^2 = \sigma_2^2 = \sigma^2$. Then the standard error of the difference in the two sample means is
$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$
where $\sigma^2$ is estimated by the pooled sample variance
$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$
The resulting test statistic
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
has a Student's t distribution with $(n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$ degrees of freedom.

Test of Hypothesis Concerning the Difference $\mu_1 - \mu_2$ Between Two Means:
Independent Random Samples

1. Null hypothesis: $H_0 : \mu_1 - \mu_2 = D_0$, where $D_0$ is some specified difference that we wish to test. (Often, we will hypothesize that there is no difference between $\mu_1$ and $\mu_2$; that is, $D_0 = 0$.)
2. Alternative hypothesis:
   $H_a : \mu_1 - \mu_2 > D_0$ (Upper-Tailed Test)
   $H_a : \mu_1 - \mu_2 < D_0$ (Lower-Tailed Test)
   $H_a : \mu_1 - \mu_2 \neq D_0$ (Two-Tailed Test)
3. Test statistic:
   $$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \quad\text{where}\quad s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
4. Rejection region: Reject $H_0$ when
   $t > t_\alpha$ for $H_a : \mu_1 - \mu_2 > D_0$ (Upper-Tailed Test)
   $t < -t_\alpha$ for $H_a : \mu_1 - \mu_2 < D_0$ (Lower-Tailed Test)
   $t > t_{\alpha/2}$ or $t < -t_{\alpha/2}$ for $H_a : \mu_1 - \mu_2 \neq D_0$ (Two-Tailed Test)
   or when p-value $< \alpha$. Here the critical values of t, $t_\alpha$ and $t_{\alpha/2}$, are based on $n_1 + n_2 - 2$ degrees of freedom from the Student's t probability table.

[Figure: t density with tail areas $\alpha/2$ on each side.]

Assumptions:
1. The samples are randomly and independently selected from normally distributed populations.
2. The variances of the populations $\sigma_1^2$ and $\sigma_2^2$ are equal; that is, $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

Small-Sample $(1 - \alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$ Based on Independent Random Samples

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where $s^2$ is the pooled estimate of $\sigma^2$; that is,
$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$

Example 7. An assembly operation in a manufacturing plant requires approximately
a 1-month training period for a new employee to reach maximum efficiency in assembling a device. A new method of training is suggested, and a test was conducted to compare the new method with the standard procedure. Two groups of nine new employees were trained for a period of 3 weeks, one group using the new method and the other following the standard training procedure. The length of time (in minutes) required for each employee to assemble the device was recorded at the end of the 3-week period. These measurements appear in the following table. Do the data present sufficient evidence to indicate that the mean time to assemble at the end of a 3-week training period is less for the new training procedure?

Standard Procedure    New Procedure
        32                 35
        37                 31
        35                 29
        28                 25
        41                 34
        44                 40
        35                 27
        31                 32
        34                 31

Solution.
1. We want to test the null hypothesis $H_0 : \mu_1 = \mu_2$ or $H_0 : \mu_1 - \mu_2 = 0$.
2. An appropriate right-tailed alternative hypothesis is $H_a : \mu_1 > \mu_2$ or $H_a : \mu_1 - \mu_2 > 0$.
3. The pooled sample variance is
   $$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = 22.2361,$$
   and the test statistic is
   $$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{35.22 - 31.56}{\sqrt{22.2361\left(\dfrac{1}{9} + \dfrac{1}{9}\right)}} = 1.65,$$
   which has a Student's t distribution with $n_1 + n_2 - 2 = 16$ degrees of freedom.
4. Rejection region: We will reject $H_0$ if the test statistic t is greater than $t_\alpha$.
5. The critical value approach: If we choose $\alpha = 0.05$, we read from Table 4 of Appendix I with df $= 16$ that $t_{0.05} = 1.746$. The null hypothesis $H_0$ will be rejected if $t > 1.746$. Since $1.65 < 1.746$, we cannot reject $H_0$. Hence, there is insufficient evidence to indicate that the new training procedure is superior at the 5% level of significance.

[Figure: t density f(t).]

6. The p-value approach: The p-value is
   $$\text{p-value} = P\{t > 1.65\}.$$
   From Table 4 of Appendix I with df $= 16$, we read $t_{0.05} = 1.746$ and $t_{0.10} = 1.337$; that is,
   $$P\{t > 1.746\} = 0.05 \quad\text{and}\quad P\{t > 1.337\} = 0.10.$$
   Thus,
   $$0.05 < P\{t > 1.65\} < 0.10.$$
   Hence, $0.05 <$ p-value $< 0.10$. Because the p-value is greater than 0.05, the results are not significant.

Example 8. Construct a 95% confidence interval for $\mu_1 - \mu_2$ in Example 7. Using the
confidence interval, can we conclude that there is a difference in the population means for the two groups of employees?

Solution. The point estimate of $\mu_1 - \mu_2$ is $\bar{x}_1 - \bar{x}_2 = 35.22 - 31.56 = 3.66$ and the standard error is
$$\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{22.2361\left(\frac{1}{9} + \frac{1}{9}\right)} = 2.223.$$
Since $1 - \alpha = 0.95$ or $\alpha = 0.05$, we have $t_{\alpha/2} = t_{0.025} = 2.120$ with 16 degrees of freedom. Thus, the 95% confidence interval is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
$$3.66 \pm (2.120)(2.223)$$
$$3.66 \pm 4.71$$
or
$$-1.05 < \mu_1 - \mu_2 < 8.37.$$
This interval gives us a range of possible values of the difference in the population means.
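As a check on Examples 7 and 8, the pooled variance, test statistic, and interval can be recomputed from the assembly-time table. This is a sketch, not the textbook's procedure: the helper name `pooled_t` is hypothetical, and the critical values 1.746 and 2.120 are copied from the text rather than computed. Working from the raw data, the standard error comes out near 2.22, and 2.120 times that value gives the half-width 4.71 used in the interval.

```python
import math
import statistics

# Assembly times in minutes from the table in Example 7.
standard = [32, 37, 35, 28, 41, 44, 35, 31, 34]
new = [35, 31, 29, 25, 34, 40, 27, 32, 31]

T_05_DF16 = 1.746    # t_{0.05}, df = 16, as quoted from Table 4
T_025_DF16 = 2.120   # t_{0.025}, df = 16, as quoted from Table 4


def pooled_t(x1, x2, d0=0.0):
    """Return (difference in means, pooled variance, standard error, t, df)."""
    n1, n2 = len(x1), len(x2)
    # Pooled variance: weighted average of the two sample variances.
    s2 = ((n1 - 1) * statistics.variance(x1)
          + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    se = math.sqrt(s2 * (1 / n1 + 1 / n2))
    diff = statistics.mean(x1) - statistics.mean(x2)
    return diff, s2, se, (diff - d0) / se, n1 + n2 - 2


diff, s2, se, t, df = pooled_t(standard, new)

# Upper-tailed test at alpha = 0.05 (Example 7).
reject = t > T_05_DF16

# 95% confidence interval for mu1 - mu2 (Example 8).
lo, hi = diff - T_025_DF16 * se, diff + T_025_DF16 * se

if __name__ == "__main__":
    print(f"s^2={s2:.4f} se={se:.3f} t={t:.2f} df={df} reject={reject}")
    print(f"95% CI: {lo:.2f} < mu1 - mu2 < {hi:.2f}")
```

Small differences in the last digit relative to the worked example come from rounding the sample means to two decimals in the text.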
Since the hypothesized difference, $\mu_1 - \mu_2 = 0$, is contained in the confidence interval, we should not reject $H_0$.

Unequal Population Variances

If the population variances are far from equal, there is an alternative procedure for estimation and testing that has an approximate t distribution in repeated sampling. As a rule of thumb, we should use this procedure if the ratio of the two sample variances satisfies
$$\frac{\text{Larger } s^2}{\text{Smaller } s^2} > 3.$$
For example, the ratio of the two sample variances in Example 7 is
$$\frac{(4.94)^2}{(4.48)^2} = 1.22 < 3,$$
which makes the pooled method appropriate.

When the population variances are not equal, the pooled estimator $s^2$ is no longer appropriate, and each population variance must be estimated by its corresponding sample variance. The resulting test statistic is
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}.$$
When the sample sizes are small, critical values for this statistic are found using degrees of freedom approximated by the formula
$$df \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}.$$
The degrees of freedom are taken to be the integer part of this result.

Example 9. A study was conducted to estimate the difference in the amount of the
chemical orthophosphorus measured at two different stations on a major river. Orthophosphorus is measured in milligrams per liter. Fifteen samples were collected from station 1 and 12 samples were obtained from station 2. The 15 samples from station 1 had a mean orthophosphorus content of 3.84 milligrams per liter and a standard deviation of 3.07 milligrams per liter, while the 12 samples from station 2 had a mean orthophosphorus content of 1.49 milligrams per liter and a standard deviation of 0.80 milligrams per liter. Test the null hypothesis of equal population means against the alternative that the mean of the first population is greater than that of the second population; that is,
$$H_0 : \mu_1 = \mu_2 \quad\text{versus}\quad H_a : \mu_1 > \mu_2.$$

Solution. Based on the sample information, we have
$$n_1 = 15,\quad \bar{x}_1 = 3.84,\quad s_1 = 3.07,\qquad n_2 = 12,\quad \bar{x}_2 = 1.49,\quad s_2 = 0.80.$$
Since the ratio of the two sample variances is
$$\frac{(3.07)^2}{(0.80)^2} = 14.726 > 3,$$
the pooled method is not appropriate. To use the alternative method, we need the degrees of freedom approximated by
$$df \approx \frac{\left(\dfrac{(3.07)^2}{15} + \dfrac{(0.80)^2}{12}\right)^2}{\dfrac{\left[(3.07)^2/15\right]^2}{15 - 1} + \dfrac{\left[(0.80)^2/12\right]^2}{12 - 1}} = 16.3,$$
so we take df $= 16$. The test statistic is
$$t = \frac{3.84 - 1.49}{\sqrt{\dfrac{(3.07)^2}{15} + \dfrac{(0.80)^2}{12}}} = 2.847.$$
Since we reject $H_0$ if $t > t_\alpha$, the p-value is
$$\text{p-value} = P\{t > 2.847\}.$$
From Table 4 of Appendix I with df $= 16$, we read $t_{0.01} = 2.583$ and $t_{0.005} = 2.921$; that is,
$$P\{t > 2.583\} = 0.01 \quad\text{and}\quad P\{t > 2.921\} = 0.005.$$
Thus,
$$0.005 < P\{t > 2.847\} < 0.01$$
or $0.005 <$ p-value $< 0.01$. Hence, the results are highly significant and we reject $H_0$ in favor of the alternative hypothesis.

HOMEWORK: pp. 406-409
10.19, 10.21, 10.23, 10.25, 10.27, 10.29

10.5 Small-Sample Inferences for the Difference Between Two Population
Means: A Paired-Difference Test

In many situations, the observations in the two samples come in pairs so that the two observations are related. For instance, if we run a test on a new diet using 15 individuals, the weights before and after completion of the test form our two samples. Observations in the two samples made on the same individual are related and hence form a pair. To determine if the diet is effective, we must consider the differences, $d_1, d_2, \ldots, d_n$, of paired observations. The sample mean of the differences $d_1, d_2, \ldots, d_n$ is calculated as
$$\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i.$$
Let $\mu_d = \mu_1 - \mu_2$.

Paired-Difference Test of Hypothesis for $\mu_d = \mu_1 - \mu_2$: Dependent Samples
1. Null hypothesis: $H_0 : \mu_d = 0$
...
 Spring '08
 Cheng
 Statistics, Normal Distribution, Standard Deviation, Variance, Null hypothesis
