CHAPTER 8
LARGE-SAMPLE ESTIMATION

8.3 Types of Estimators

To estimate the value of a population parameter, we can use information from the sample in the form of an estimator. For example, we may use the sample mean x̄ to estimate the population mean μ, and the sample mean can be obtained by the formula

x̄ = (x1 + x2 + ... + xn)/n,

where x1, x2, ..., xn are the observations in the sample. Estimators are calculated using information from the sample observations, and hence, by definition, they are also statistics.

Definition 1. An estimator is a rule, usually expressed as a formula, that tells us how to calculate an estimate based on information in the sample.

Estimators are used in two different ways:

* Point estimation: Based on sample data, a single number is calculated to estimate the population parameter. The rule or formula that describes this calculation is called the point estimator, and the resulting number is called a point estimate.

* Interval estimation: Based on sample data, two numbers are calculated to form an interval within which the population parameter is expected to lie. The rule or formula that describes this calculation is called the interval estimator, and the resulting pair of numbers is called an interval estimate or confidence interval.

Example 1. A veterinarian wants to estimate the average weight gain per month of 4-month-old golden retriever pups that have been placed on a lamb and rice diet. The population consists of the weight gains per month of all 4-month-old golden retriever pups that are given this particular diet. The veterinarian wants to estimate the unknown parameter μ, the average monthly weight gain for this hypothetical population. One possible estimator based on sample data is the sample mean

x̄ = (x1 + x2 + ... + xn)/n = (1/n) Σ xi.

It could be used in the form of a single number or point estimate, for instance, 3.8 pounds, or we could use an interval estimate and estimate that the average weight gain will be between 2.7 and 4.9 pounds.

8.4 Point Estimation

In a practical situation, there may be several statistics that could be used as point estimators for a population parameter. To decide which of several choices is best, we need to know how the estimator behaves in repeated sampling, described by its sampling distribution. Sampling distributions provide information that can be used to select the best estimator.
First, the sampling distribution of the point estimator should be centered over the true value of the parameter to be estimated. That is, the estimator should not consistently underestimate or overestimate the parameter of interest. Such an estimator is said to be unbiased.

Definition 2. An estimator of a parameter is said to be unbiased if the mean of its sampling distribution is equal to the true value of the parameter. Otherwise, the estimator is said to be biased.

The sampling distribution for a biased estimator is shown in the figure below: it is shifted to the right of the true value of the parameter, so this biased estimator is more likely than an unbiased one to overestimate the true value of the parameter.

[Figure: sampling distributions of an unbiased estimator and a biased estimator; the biased distribution is shifted to the right of the true value of the parameter.]

The second desirable characteristic of an estimator is that the spread (as measured by the
variance) of the sampling distribution should be as small as possible. This ensures that, with
a high probability, an individual estimate will fall close to the true value of the parameter.
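These two properties, centering and small spread, can be illustrated with a small simulation of our own (the population parameters below are arbitrary choices). For a normal population, both the sample mean and the sample median are essentially unbiased estimators of the population mean, but the sample mean has the smaller variance:

```python
import random
from statistics import mean, median, pstdev

random.seed(2)  # fixed seed so the run is reproducible
mu, sigma, n, trials = 10.0, 4.0, 25, 4000

means, medians = [], []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(mean(sample))
    medians.append(median(sample))

# Both sampling distributions are centered near the true mean mu = 10 ...
print(round(mean(means), 1), round(mean(medians), 1))
# ... but the sample mean is the less variable (hence preferred) estimator.
print(pstdev(means) < pstdev(medians))   # True
```

For normal data the median's standard error is roughly 1.25 times that of the mean, which is why the second printed comparison comes out in the mean's favor.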
The sampling distributions for two unbiased estimators, one with a small variance and the other with a large variance, are shown in the figure below.

[Figure: two unbiased sampling distributions centered at the true value of the parameter; the estimator with the smaller variance is more tightly concentrated than the one with the larger variance.]

Naturally, we would prefer the estimator with the smaller variance because the estimates tend to lie closer to the true value of the parameter than in the distribution with the larger variance. Of course, we would like to know how close the estimate is to the true value of the parameter.

Definition 3. The distance between the estimate and the true value of the parameter
is called the error of estimation.

When the sample sizes are large, the sampling distributions of unbiased estimators can be approximated by a normal distribution (because of the Central Limit Theorem). For any point estimator with a normal distribution, approximately 95% of all the point estimates will lie within two (or, more exactly, 1.96) standard deviations of the mean of that distribution. For unbiased estimators, this implies that the difference between the point estimator and the true value of the parameter will be less than 1.96 standard deviations, or 1.96 standard errors (SE).

[Figure: the normal sampling distribution of the estimator; a particular estimate lies within 1.96 SE on either side of the true value, the 95% margin of error.]

Definition 4. The quantity 1.96 SE is called the 95% margin of error (or simply the margin of error).

This quantity provides a practical upper bound for the error of estimation. It is possible that the error of estimation will exceed this margin of error, but that is very unlikely.

Point Estimation of a Population Parameter
* Point Estimator: A statistic calculated using sample measurements.
* 95% Margin of Error: 1.96 × (standard error of the estimator).

It can be shown that both estimators x̄ and p̂, discussed in Chapter 7, have the minimum variability of all unbiased estimators and are thus the best estimators we can find in each situation.

How To Estimate a Population Mean μ?
* To estimate the population mean μ for a quantitative population, the point estimator x̄ is unbiased with standard error estimated as

SE = s/√n,

where s is the sample standard deviation.
* The 95% margin of error when n ≥ 30 is estimated as

±1.96 (s/√n).

Example 2. An investigator is interested in the possibility of merging the capability of television and the Internet. A random sample of n = 50 Internet users who were polled about the time they spend watching television produced an average of 11.5 hours per week, with a standard deviation of 3.5 hours. Use this information to estimate the population mean time Internet users spend watching television.

Solution. The random variable measured is the time spent watching television per week. This is a quantitative random variable best described by its mean μ. The point estimate of μ, the average time Internet users spend watching television, is x̄ = 11.5 hours. The margin of error is estimated as

1.96 SE = 1.96 (s/√n) = 1.96 (3.5/√50) = 0.97 ≈ 1.

We feel fairly confident that the sample estimate of 11.5 hours of television watching for Internet users is within ±1 hour of the population mean.

How To Estimate a Population Proportion p?
* To estimate the population proportion p for a binomial population, the point estimator p̂ = x/n is unbiased with standard error estimated as

SE = √(p̂q̂/n).

* The 95% margin of error is estimated as

±1.96 √(p̂q̂/n),

valid when np̂ > 5 and nq̂ > 5.
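As a quick numerical check of the two margin-of-error formulas above, here is a minimal Python sketch; the function names are our own, and the inputs are the sample summaries used in Examples 2 and 3:

```python
from math import sqrt

def mean_margin_of_error(s, n, z=1.96):
    """95% margin of error for a sample mean: z * s / sqrt(n)."""
    return z * s / sqrt(n)

def proportion_margin_of_error(p_hat, n, z=1.96):
    """95% margin of error for a sample proportion: z * sqrt(p_hat * q_hat / n)."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

# Example 2's summaries (n = 50, s = 3.5): about 0.97 hours
print(round(mean_margin_of_error(3.5, 50), 2))
# Example 3's summaries (n = 100, p_hat = 0.45): about 0.10
print(round(proportion_margin_of_error(0.45, 100), 2))
```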
Example 3. In addition to the average time Internet users spend watching television, the investigator is also interested in estimating the proportion of individuals in the population at large who want to purchase a television that also acts as a computer. In a random sample of n = 100 adults, 45% in the sample indicated that they might buy one. Estimate the true population proportion of adults who are interested in buying a television that also acts as a computer, and find the margin of error for the estimate.

Solution. The parameter of interest is now p, the proportion of individuals in the population who want to purchase a television that also acts as a computer. The best estimator of p is the sample proportion p̂, which for this sample is p̂ = 0.45. In order to find the margin of error, we can approximate the value of p with its estimate p̂ = 0.45:

1.96 SE = 1.96 √(p̂q̂/n) = 1.96 √((0.45)(0.55)/100) ≈ 0.10.

With this margin of error, we can be fairly confident that the sample estimate of 0.45 is within ±0.10 of the true value of p. Hence, we can conclude that the true value of p could be as small as 0.35 or as large as 0.55. This margin of error is quite large when compared to the estimate itself and reflects the fact that large samples are required to achieve a small margin of error when estimating p.

The variability of an estimator is measured using its standard error. However, we might have noticed that the standard error usually depends on unknown parameters such as σ or p. These parameters must be estimated using sample statistics such as s (the sample standard deviation) and p̂ (the sample proportion). Although not exactly correct, experimenters generally refer to the estimated standard error simply as the standard error.

In reporting research results, investigators often attach either the sample standard deviation s (sometimes called SD) or the standard error s/√n, as follows:

x̄ ± SD or x̄ ± SE.

HOMEWORK: pp. 305-307
8.3, 8.5, 8.7, 8.9, 8.11, 8.12, 8.13, 8.15, 8.17, 8.21

8.5 Interval Estimation

An interval estimator is a rule for calculating two numbers, say a and b, to create an interval that we are fairly certain contains the parameter of interest. The concept of "fairly certain" means "with high probability". We measure this probability using the confidence coefficient, denoted by 1 − α.

Definition 5. Interval estimators are commonly referred to as confidence intervals. The upper and lower endpoints of a confidence interval are called the upper and lower confidence limits. The probability that a confidence interval will contain the estimated parameter is called the confidence coefficient.

For example, experimenters often construct 95% confidence intervals. This means that the confidence coefficient, or the probability that the interval will contain the estimated parameter, is 0.95. We can increase or decrease our amount of certainty by changing the confidence coefficient. Some values typically used by experimenters are 0.90, 0.95, 0.98, and 0.99.
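Both ingredients of a confidence interval can be checked numerically: the z-value that captures a central area 1 − α under the standard normal curve, and the long-run coverage that the confidence coefficient promises. The Python sketch below is our own illustration; the population parameters are arbitrary choices:

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_value(confidence):
    """z-value with area alpha/2 in each tail, so the central area is 1 - alpha."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for conf in (0.90, 0.95, 0.98, 0.99):
    print(conf, round(z_value(conf), 3))   # 1.645, 1.96, 2.326, 2.576

# Coverage check: build many 95% intervals x_bar +/- 1.96 s/sqrt(n) from samples
# of a normal population with known mean mu, and count how often they contain mu.
random.seed(1)
mu, sigma, n, trials = 50.0, 10.0, 50, 2000
covered = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar, s = mean(sample), stdev(sample)
    moe = 1.96 * s / sqrt(n)
    if x_bar - moe <= mu <= x_bar + moe:
        covered += 1
print(covered / trials)   # close to 0.95
```

The commonly tabulated values 1.645, 1.96, 2.33, and 2.575 are the usual roundings of the computed z-values above.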
Constructing a Confidence Interval

When the sampling distribution of a point estimator is approximately normal, an interval estimator or confidence interval can be constructed using the following reasoning. For simplicity, assume that the confidence coefficient is 0.95 and refer to the figure below.

[Figure: the normal sampling distribution of the estimator, centered at the parameter, with the interval Parameter ± 1.96 SE marked.]

* We know that, of all possible values of the estimator that we might select, 95% of them will be in the interval

Parameter ± 1.96 SE

shown in the figure above.

* Since the value of the parameter is unknown, consider constructing the interval

Estimator ± 1.96 SE,

which has the same width as the first interval, but has a variable center.

* How often will this interval work properly and enclose the parameter of interest? Refer to the figure below. The first two intervals work properly: the parameter (marked with a dotted line) is contained within both intervals. The third interval does not work, since it fails to enclose the parameter. This happens because the value of the estimator at the center of the interval was too far away from the parameter. Fortunately, values of the estimator fall this far away only 5% of the time, so our procedure will work properly 95% of the time.

[Figure: three intervals of equal width centered at different estimates; Intervals 1 and 2 contain the parameter, Interval 3 does not.]

We may want to change the confidence coefficient from 1 − α = 0.95 to another confidence level 1 − α. To accomplish this, we need to change the value z = 1.96, which locates an area 0.95 in the center of the standard normal curve, to a value of z that locates the area 1 − α in the center of the curve, as shown in the figure below.

[Figure: standard normal density f(z) with central area 1 − α and tail area α/2 in each tail.]

Since the total area under the curve is 1, the remaining area in the two tails is α, and each tail contains area α/2. The value of z that has "tail area" α/2 to its right is denoted by z_{α/2}, and the area between −z_{α/2} and z_{α/2} is the confidence coefficient 1 − α. Values of z_{α/2} that are typically used by experimenters will become familiar to us as we begin to construct confidence intervals for different practical situations. Some of these values are given in the following table.

Confidence Coefficient 1 − α    α      α/2     z_{α/2}
0.90                            0.10   0.05    1.645
0.95                            0.05   0.025   1.96
0.98                            0.02   0.01    2.33
0.99                            0.01   0.005   2.575

A (1 − α)100% Large-Sample Confidence Interval
(Point estimator) ± z_{α/2} × SE,

where z_{α/2} is the z-value with an area α/2 in the right tail of a standard normal distribution. This formula generates two values: the lower confidence limit (LCL) and the upper confidence limit (UCL).

Large-Sample Confidence Interval for a Population Mean μ

When the sample size n is large, the sample mean x̄ is the best point estimator for the population mean μ. Since its sampling distribution is approximately normal, it can be used to construct a confidence interval according to the general approach given earlier.

A (1 − α)100% Large-Sample Confidence Interval for a Population Mean μ

x̄ ± z_{α/2} (σ/√n),

where z_{α/2} is the z-value with an area α/2 in the right tail of a standard normal distribution, n is the sample size, and σ is the standard deviation of the sampled population. If σ is unknown, it can be approximated by the sample standard deviation s when the sample size is large (n ≥ 30), and the approximate confidence interval is

x̄ ± z_{α/2} (s/√n).

Example 4. A scientist monitoring chemical contaminants in food, and thereby the
accumulation of contaminants in human diets, selected a random sample of n = 50 male adults. It was found that the average daily intake of dairy products was x̄ = 756 grams per day with a standard deviation of s = 35 grams per day. Use this sample information to construct a 95% confidence interval for the mean daily intake of dairy products for men.

Solution. Since the sample size of n = 50 is large, the distribution of the sample mean x̄ is approximately normal with mean μ and standard error estimated by s/√n. The approximate 95% confidence interval is

x̄ ± 1.96 (s/√n) = 756 ± 1.96 (35/√50) = 756 ± 9.70.

Hence, the 95% confidence interval for μ is from 746.30 to 765.70 grams per day.

Interpreting the Confidence Interval

What does it mean to say we are "95% confident" that the true value of the population
mean μ is within a given interval? If we were to construct 20 such intervals, each using different sample information, our intervals might look like those shown in Figure 8.9. Of the 20 intervals, we might expect that 95% of them, or 19 out of 20, will perform as planned and contain μ within their upper and lower bounds.

[Figure 8.9: twenty confidence intervals plotted side by side; nearly all of them cross the horizontal line at the true mean μ.]

We cannot be absolutely sure that any one particular interval contains the mean μ. We will never know whether our particular interval is one of the 19 that "worked", or whether it is the one interval that "missed". Our confidence in the estimated interval follows from the fact that when repeated intervals are calculated, 95% of these intervals will contain μ.

Example 5. Construct a 99% confidence interval for the mean daily intake of dairy
products for adult men in Example 4.

Solution. We must find the appropriate value of z_{α/2} when 1 − α = 0.99. This value z_{0.005}, with tail area α/2 = 0.005 to its right, is found to be 2.58 (more exactly, 2.576). The 99% confidence interval is then

x̄ ± 2.58 (s/√n) = 756 ± 2.58 (35/√50) = 756 ± 12.77.

Hence, the 99% confidence interval for μ is from 743.23 to 768.77 grams per day.

Large-Sample Confidence Interval for a Population Proportion p

When the sample size n is large, the sample proportion

p̂ = x/n = (total number of successes)/(total number of trials)
is approximately normal, with mean p and standard error SE=,/m
n 15 can be used to construct a conﬁdence interval according to the general approach given in
this section. A (1 — a) 100% LargeSample Conﬁdence Interval for a Population Proportion P iii 204/2 E!
TL where Za/g is the z—value with an area (1/2 in the right tail of a standard normal distribution.
Since p and q are unknown, they are estimated using the best estimators: if and if. The
sample size n is considered large when the normal approximation to binomial distribution is
adequate — namely, when nﬁ > 5 and nff > 5. Example 6. A random sample of 985 “likely” voters — those who are likely to vote in the
upcoming election — were polled during a phone—athon conducted by the Republican party.
Of those survey, 592 indicated that they intended to vote for the Republican candidate in
the upcoming election. Construct a 90% conﬁdence interval for p, the proportion of likely
voters in the population who intend to vote for the Republican candidate. Based on this information, can we conclude that the candidate will win the election? Solution. The point estimator for p is A m 592
p n 985 60 pg (0.601) (0.399)'
,/— % ‘/—————— 20.16.
n 985 0 Since 1 — a = 0.90 or oz/2 = 0.05, we obtain Zia/2 = 1.645. The approximate 90% conﬁdence
interval for p is thus and the standard error is iii— 1.645 3‘1 TL
0.601i(1.645)(0.016) 0.601 2!: 0.026 01'
0575 < p < 0627. Hence, we estimate that the percentage of likely voters who intend to vote for the Republican
candidate is between 57.5% and 62.7%. Will the candidate win the election? Assuming that
she needs more than 50% of the vote to win, and since both the upper and lower conﬁdence
limits exceed this minimum value, we can say with 90% conﬁdence that the candidate will win. HOMEWORK: pp.316 — 317 8.23, 8.25, 8.27, 8.31, 8.33, 8.35 8.6 Estimating the Difference between Two Population Means 10 A problem equally as important as the estimation of a single population mean it for a
quantitative population is the comparison of two population means. Let the ﬁrst population have mean ,ul and variance 0%, and let the second population have
mean M2 and variance 0%. A random sample of 711 measurements is drawn from population 1,
which produces mean El and variance 5%, A second random sample of size 712 is independently drawn from population 2 , which produces mean T152 and variance 3%. Population 1 Population 2
Mean #1 H2
Variance a? 03 Sample 1 Sample 2 Mean El E2
Variance 8% 3%
Sample size ml 712 Properties of the Sampling Distribution of the Differences between two Sam
ple Means When independent random samples of m and 712 observations have been selected from
populations with means #1 and M2 and variances of and 03, respectively, the sampling dis—
tribution of the difference 51 — 52 has the following properties: 1.. The mean of El — 52 is ,ul — p2 and the standard error is n1 n2 2 2
51 82 SE% —+—
n1 n2 which can be estimated as when the sample sizes m and 712 are large. 2., If the sampled populations are normally distributed, then the sampling distribution
of El — 552 is exactly normally distributed, regardless of the sample size. 3. If the sampled populations are not normally distributed, then the sampling distribu—
tion of E1 —— 3’52 is approximately normally distributed,when m and 722 are both 30 or more,
according to the Central Limit Theorem. Since #1 — M2 is the mean of the sampling distribution, it follows that 51 — E2 is an
unbiased estimator of ,ul — #2 with an approximately normal distribution when 721 and 712
are large. That is, the statistic Z Z (51 —E2) — (M1  #2) has an approximately standard normal 2 distribution LargeSample Point Estimation of 01 — 02 Point Estimator: E1 — f2.
95% Margin of error: :l:1.96SE 2 ::1.96\/:—%1 + A (1 — a) 100% LargeSample Conﬁdence Interval for 01 — M2 8% 8%
(x̄1 − x̄2) ± z_{α/2} √(s1²/n1 + s2²/n2)

Example 7. The wearing qualities of two types of automobile tires were compared by road-testing samples of n1 = n2 = 100 tires for each type. The number of miles until wear-out was recorded, where wear-out was defined as a specific amount of tire wear. The test results are given as follows:

             Type 1             Type 2
n1 = 100                        n2 = 100
x̄1 = 26,400                    x̄2 = 25,100
s1² = 1,440,000                 s2² = 1,960,000

Estimate μ1 − μ2, the difference in mean miles to wear-out, using a 99% confidence interval. Is there a difference in the average wearing quality for the two types of tires?

Solution. The point estimate of μ1 − μ2 is

x̄1 − x̄2 = 26,400 − 25,100 = 1300,

and the standard error is

√(s1²/n1 + s2²/n2) = √(1,440,000/100 + 1,960,000/100) = 184.4.

Since 1 − α = 0.99, α/2 = 0.005 and we have z_{α/2} = 2.58. Thus, the 99% confidence interval is

1300 ± (2.58)(184.4)
1300 ± 475.8,

or

824.2 < μ1 − μ2 < 1775.8.

The difference in the average miles to wear-out for the two types of tires is estimated to lie between LCL = 824.2 and UCL = 1775.8 miles of wear. Since 0 is not included in this interval, it is not likely that the means are the same. Therefore, we can conclude that there is a difference in average miles to wear-out for the two
types of tires.

Example 8. The scientist in Example 4 wondered whether there was a difference in the average daily intakes of dairy products between men and women. He took a sample of n = 50 adult women and recorded their daily intakes of dairy products in grams per day. He did the same for adult men. A summary of his sample results is listed as follows:

             Men         Women
Sample size  n1 = 50     n2 = 50
Mean         x̄1 = 756   x̄2 = 762
Std. dev.    s1 = 35     s2 = 30

Construct a 95% confidence interval for the difference in the average daily intakes of dairy products for men and women. Can we conclude that there is a difference in average daily intakes for men and women?

Solution. The point estimate of μ1 − μ2 is

x̄1 − x̄2 = 756 − 762 = −6,

and the standard error is

√(s1²/n1 + s2²/n2) = √(35²/50 + 30²/50) = 6.52.

Since 1 − α = 0.95, α/2 = 0.025 and we have z_{α/2} = 1.96. Thus, the 95% confidence interval is

−6 ± (1.96)(6.52)
−6 ± 12.78,

or

−18.78 < μ1 − μ2 < 6.78.

The difference μ1 − μ2 could be negative (indicating that the average for women exceeds the average for men), it could be positive (indicating that men have the higher average), or it could be 0 (indicating no difference between the averages). Based on this information, we hesitate to conclude that there is a difference in average daily intakes of dairy products for men and women.

Note. (1) When both sampled populations are normal, the sampling distribution of

z = [(x̄1 − x̄2) − (μ1 − μ2)] / √(σ1²/n1 + σ2²/n2)

has exactly a standard normal distribution for all sample sizes (small or large).

(2) When the sampled populations are not normal but the sample sizes are large (≥ 30), the sampling distribution of

z = [(x̄1 − x̄2) − (μ1 − μ2)] / √(σ1²/n1 + σ2²/n2)

is approximately standard normal.

(3) If the population variances σ1² and σ2² are unknown and are estimated by the sample variances s1² and s2², the statistic

z = [(x̄1 − x̄2) − (μ1 − μ2)] / √(s1²/n1 + s2²/n2)

will still have an approximately standard normal distribution when the sample sizes are large (≥ 30).

HOMEWORK: pp. 321-323
8.39, 8.41, 8.45, 8.47, 8.49

8.7 Estimating the Difference between Two Binomial Proportions

In order to estimate the difference between two binomial proportions, independent samples consisting of n1 and n2 trials are drawn from populations 1 and 2, respectively, and the sample estimates p̂1 and p̂2 are calculated. The unbiased estimator of p1 − p2 is the sample difference p̂1 − p̂2.

Properties of the Sampling Distribution of the Difference between Two Sample Proportions

Assume that independent random samples of n1 and n2 observations have been selected from populations with parameters p1 and p2, respectively. The sampling distribution of the difference between sample proportions

p̂1 − p̂2 = x1/n1 − x2/n2

has the following properties:

1. The mean of p̂1 − p̂2 is p1 − p2 and the standard error is

SE = √(p1q1/n1 + p2q2/n2),

which can be estimated as

SE ≈ √(p̂1q̂1/n1 + p̂2q̂2/n2)

when the sample sizes n1 and n2 are large.

2. The sampling distribution of p̂1 − p̂2 can be approximated by a normal distribution when n1 and n2 are large, according to the Central Limit Theorem. To use a normal distribution to approximate the distribution of p̂1 − p̂2, both p̂1 and p̂2 should be approximately normal; that is, n1p̂1 > 5, n1q̂1 > 5, and n2p̂2 > 5, n2q̂2 > 5.

Since p̂1 − p̂2 is an unbiased estimator of p1 − p2 with an approximately normal distribution when n1 and n2 are large, the statistic

z = [(p̂1 − p̂2) − (p1 − p2)] / √(p̂1q̂1/n1 + p̂2q̂2/n2)

has an approximately standard normal z distribution.

Large-Sample Point Estimation of p1 − p2

Point Estimator: p̂1 − p̂2.
95% Margin of error: ±1.96 SE = ±1.96 √(p̂1q̂1/n1 + p̂2q̂2/n2).

A (1 − α)100% Large-Sample Confidence Interval for p1 − p2

(p̂1 − p̂2) ± z_{α/2} √(p̂1q̂1/n1 + p̂2q̂2/n2)

Assumptions: Sample sizes n1 and n2 must be sufficiently large so that the sampling distribution of p̂1 − p̂2 can be approximated by a normal distribution, namely, if n1p̂1 > 5, n1q̂1 > 5, and n2p̂2 > 5, n2q̂2 > 5.

Example 9. A bond proposal for school construction will be submitted to the voters at
the next municipal election. A major portion of the money derived from this bond issue will be used to build schools in a rapidly developing section of the city, and the remainder will be used to renovate and update school buildings in the rest of the city. To assess the viability of the bond proposal, a random sample of n1 = 50 residents in the developing section and n2 = 100 residents from the other parts of the city were asked whether they plan to vote for the proposal. The results are tabulated as follows:

n1 = 50                   n2 = 100
x1 = 38                   x2 = 65
p̂1 = 38/50 = 0.76        p̂2 = 65/100 = 0.65

(1) Estimate the difference in the true proportions favoring the bond proposal with a 99% confidence interval.

(2) If both samples were pooled into one sample of size n = 150, with 103 in favor of the proposal, provide a point estimate of the proportion of city residents who will vote for the bond proposal. What is the margin of error?

Solution. (1) The best point estimate of p1 − p2 is

p̂1 − p̂2 = 0.76 − 0.65 = 0.11,

and the standard error of p̂1 − p̂2 is estimated as

√(p̂1q̂1/n1 + p̂2q̂2/n2) = √((0.76)(0.24)/50 + (0.65)(0.35)/100) = 0.0770.

Since 1 − α = 0.99, α/2 = 0.005 and we have z_{α/2} = 2.58. Thus, the 99% confidence interval is

0.11 ± (2.58)(0.0770)
0.11 ± 0.199,

or

−0.089 < p1 − p2 < 0.309.

Since this interval contains the value p1 − p2 = 0, it is possible that p1 = p2, which implies that there may be no difference in the proportions favoring the bond issue in the two sections of the city.

(2) If there is no difference in the two proportions, then the two samples are not really different and might as well be combined to obtain an overall estimate of the proportion of the city residents who will vote for the bond issue. If both samples are pooled, then n = 150 and

p̂ = 103/150 = 0.69.

Therefore, the point estimate of the overall value of p is 0.69, with a margin of error given by

±1.96 √((0.69)(0.31)/150) = ±1.96 (0.0378) = ±0.074.

Note that 0.69 ± 0.074 produces the interval 0.62 to 0.76, which includes only proportions greater than 0.5. Therefore, if voter attitudes do not change adversely prior to the election, the bond proposal should pass by a reasonable majority.

HOMEWORK: pp. 326-328
8.51, 8.53, 8.55, 8.57, 8.59

8.8 One-Sided Confidence Bounds

The confidence intervals with both a lower confidence limit (LCL) and an upper confidence
limit (UCL) are in general referred to as two-sided confidence intervals. However, we sometimes need only one of these limits; that is, we need only an upper bound (or possibly only a lower bound) for the parameter of interest, such as μ, p, μ1 − μ2, or p1 − p2. In a situation like this, we can construct a one-sided confidence bound for the parameter of interest.

When the sample size is large, the sampling distribution of a point estimator is approximately normal according to the Central Limit Theorem. By an argument similar to that of the preceding sections, one-sided confidence bounds can be constructed so that they will contain the true value of the parameter of interest (1 − α)100% of the time in repeated sampling.

A (1 − α)100% Large-Sample Lower Confidence Bound (LCB)

(Point estimator) − z_α × (standard error of the estimator).

A (1 − α)100% Large-Sample Upper Confidence Bound (UCB)

(Point estimator) + z_α × (standard error of the estimator).

The z-value used for a (1 − α)100% one-sided confidence bound, z_α, locates an area α in a single tail of the standard normal curve.

Example 10. A random sample of 100 recorded deaths in the United States during the past year showed an average life span of 71.8 years with a standard deviation of 8.9 years. Construct a 99% lower confidence bound for the population average life span today.

Solution. In this case, 1 − α = 0.99. The value z_α, with tail area α = 0.01 to its right, is found to be 2.33. The 99% lower confidence bound for μ, the population average life span today, is then

LCB = 71.8 − (2.33)(8.9/√100) = 71.8 − 2.0737 = 69.7263.

Hence, we are approximately 99% confident that the population average life span today
will be greater than 69.7263 years.

Example 11. The heights of a random sample of 36 male students in a college showed a mean of 174.5 centimeters and a standard deviation of 6.9 centimeters. Construct a 97.5% upper confidence bound for the mean height of all male students of the college.

Solution. Note that 1 − α = 0.975. The value z_α, with tail area α = 0.025 to its right, is found to be 1.96. The 97.5% upper confidence bound for the mean height of all male students of the college is

UCB = 174.5 + (1.96)(6.9/√36) = 174.5 + 2.254 = 176.754.

Hence, we are approximately 97.5% confident that the mean height of male students in the college is less than 176.754 centimeters.

8.9 Choosing the Sample Size

How many measurements should be included in the sample? How much information does
the researcher want to buy? The total amount of information in the sample will affect the
reliability or goodness of the inferences made by the researcher, and it is this reliability that
the researcher must specify. In a statistical estimation problem, the accuracy of the estimate
is measured by the margin of error or the width of the conﬁdence interval. Since both of
these measures are a function of the sample size, specifying the accuracy determines the necessary sample size.

Example 12. Suppose we want to estimate the average daily yield μ of a chemical process and we need the margin of error to be less than 4 tons. This means that, approximately 95% of the time in repeated sampling, the distance between the sample mean x̄ and the population mean μ will be less than 1.96 SE. We want this quantity to be less than 4. That is,

1.96 SE < 4 or 1.96 (σ/√n) < 4.

Solving for n, we obtain

n > (1.96/4)² σ² or n > 0.24 σ².

If we know σ, the population standard deviation, we can substitute its value into the formula and solve for n. If σ is unknown, which is usually the case, we can use the best approximation available:

* An estimate s obtained from a previous sample.
* A range estimate based on knowledge of the largest and smallest possible measurements: σ ≈ Range/4, or 4σ ≈ Range.

For this example, suppose that a prior study of the chemical process produced a sample standard deviation of s = 21 tons. Then

n > 0.24 σ² ≈ 0.24 (21)² = 105.8.

Using a sample size n = 106 or larger, we could be reasonably certain (with probability
approximately equal to 0.95) that our estimate of the average yield will be within ±4 tons of the actual average yield.

Sometimes researchers request a different confidence level than the 95% confidence specified by the margin of error. In this case, the half-width of the confidence interval provides the accuracy measure for our estimate; that is, the bound B on the error of our estimate satisfies

z_{α/2} × SE < B.

This method for choosing the sample size can be used for all estimation procedures presented in this chapter. The formula requires the standard error of the point estimator of the parameter of interest. The table below summarizes the formulas used to determine the sample sizes required for estimation with a given bound B on the error of the estimate or confidence interval width W (W = 2B).

Parameter   Estimator    Sample Size                                                   Assumptions
μ           x̄           n ≥ z_{α/2}² σ² / B²
μ1 − μ2     x̄1 − x̄2    n ≥ z_{α/2}² (σ1² + σ2²) / B²                                 n1 = n2 = n
p           p̂           n ≥ z_{α/2}² pq / B², or n ≥ z_{α/2}² (0.25) / B²             p = 0.5
p1 − p2     p̂1 − p̂2    n ≥ z_{α/2}² (p1q1 + p2q2) / B², or n ≥ 2(0.25) z_{α/2}²/B²   n1 = n2 = n and p1 = p2 = 0.5
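Each row of the table amounts to "solve z_{α/2} × SE ≤ B for n and round up". Here is a small sketch of ours for the single-proportion row, using the conservative choice p = 0.5; it is checked against Example 13 below:

```python
from math import ceil

def sample_size_for_proportion(B, z, p=0.5):
    """Smallest n with z * sqrt(p*(1-p)/n) <= B; p = 0.5 is the conservative choice."""
    return ceil(z * z * p * (1 - p) / (B * B))

# Example 13 below: bound B = 0.04 at 90% confidence (z = 1.645)
print(sample_size_for_proportion(0.04, 1.645))   # 423
```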
Example 13. Producers of polyvinyl plastic pipe want to have a supply of pipe sufficient to meet marketing needs. They wish to survey wholesalers who buy polyvinyl pipe in order to estimate the proportion who plan to increase their purchases next year. What sample size is required if they want their estimate to be within 0.04 of the actual proportion with probability equal to 0.90?

Solution. Since B = 0.04 and 1 − α = 0.90 (so that z_{α/2} = 1.645), we have

1.645 SE = 1.645 √(pq/n) < 0.04.

We must substitute an approximate value of p into this inequality. Usually, we try p = 0.5. Then

1.645 √((0.5)(0.5)/n) < 0.04,

or

√n > (1.645)(0.5)/0.04 = 20.56,

or

n > (20.56)² = 422.7.

Therefore, the producers must include at least 423 wholesalers in their survey if they want to estimate the proportion p correct to within 0.04.

Example 14. A personnel director wishes to compare the effectiveness of two methods
of training industrial employees to perform a certain assembly operation. A number of employees are to be divided into two equal groups: the first receiving training method 1 and the second training method 2. Each will perform the assembly operation, and the length of assembly time will be recorded. It is expected that the measurements for both groups will have a range of approximately 8 minutes. For the estimate of the difference in mean times to assemble to be correct to within 1 minute with a probability equal to 0.95, how many workers must be included in each training group?

Solution. In this case, B = 1 and 1 − α = 0.95 (so that z_{α/2} = 1.96). Then we have

1.96 SE = 1.96 √(σ1²/n1 + σ2²/n2) < 1.

We assume n1 = n2 = n and obtain the inequality

1.96 √(σ1²/n + σ2²/n) < 1.

As noted above, the variability (range) of each method of assembly is approximately the same, and hence σ1² = σ2² = σ². Since the range, equal to 8 minutes, is approximately equal to 4σ, we have

4σ ≈ 8 or σ ≈ 2.

Substituting this value for σ1 and σ2 in the earlier inequality, we get

1.96 √(2²/n + 2²/n) < 1,

or

1.96 √(8/n) < 1,

or

√n > 1.96 √8,

or

n > (1.96)² (8) = 30.73.

Therefore, each group should contain at least n = 31 workers.

HOMEWORK: pp. 334-335
8.68, 8.69, 8.71, 8.73, 8.75, 8.77
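The two-means row of the sample-size table can be checked the same way. This sketch of ours reproduces Example 14's arithmetic (σ1 = σ2 = 2, bound B = 1 minute, 95% confidence):

```python
from math import ceil

def sample_size_for_two_means(B, z, var1, var2):
    """Smallest common n = n1 = n2 with z * sqrt(var1/n + var2/n) <= B."""
    return ceil(z * z * (var1 + var2) / (B * B))

# Example 14: range ~ 8 minutes, so sigma ~ 2 and var1 = var2 = 4
print(sample_size_for_two_means(1, 1.96, 4, 4))   # 31
```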