stats 363 Chapter-8 notes

CHAPTER 8 LARGE-SAMPLE ESTIMATION

8.3 Types of Estimators

To estimate the value of a population parameter, we can use information from the sample in the form of an estimator. For example, we may use the sample mean $\bar{x}$ to estimate the population mean $\mu$, and the sample mean can be obtained by the formula

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n},$$

where $x_1, x_2, \ldots, x_n$ are the observations in the sample. Estimators are calculated using information from the sample observations, and hence, by definition, they are also statistics.

Definition 1. An estimator is a rule, usually expressed as a formula, that tells us how to calculate an estimate based on information in the sample.

Estimators are used in two different ways:

* Point estimation: Based on sample data, a single number is calculated to estimate the population parameter. The rule or formula that describes this calculation is called the point estimator, and the resulting number is called a point estimate.
* Interval estimation: Based on sample data, two numbers are calculated to form an interval within which the population parameter is expected to lie. The rule or formula that describes this calculation is called the interval estimator, and the resulting pair of numbers is called an interval estimate or confidence interval.

Example 1. A veterinarian wants to estimate the average weight gain per month of 4-month-old golden retriever pups that have been placed on a lamb and rice diet. The population consists of the weight gains per month of all 4-month-old golden retriever pups that are given this particular diet. The veterinarian wants to estimate the unknown parameter $\mu$, the average monthly weight gain for this hypothetical population. One possible estimator based on sample data is the sample mean

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

It could be used in the form of a single number or point estimate (for instance, 3.8 pounds), or we could use an interval estimate and estimate that the average weight gain will be between 2.7 and 4.9 pounds.

8.4 Point Estimation

In a practical situation, there may be several statistics that could be used as point estimators for a population parameter. To decide which of several choices is best, we need to know how the estimator behaves in repeated sampling, as described by its sampling distribution. Sampling distributions provide information that can be used to select the best estimator.

First, the sampling distribution of the point estimator should be centered over the true value of the parameter to be estimated. That is, the estimator should not consistently underestimate or overestimate the parameter of interest. Such an estimator is said to be unbiased.

Definition 2. An estimator of a parameter is said to be unbiased if the mean of its sampling distribution is equal to the true value of the parameter. Otherwise, the estimator is said to be biased.

The sampling distribution for a biased estimator is shown as follows. It is shifted to the right of the true value of the parameter, so this biased estimator is more likely than an unbiased one to overestimate the true value of the parameter.

[Figure: sampling distributions of an unbiased estimator and a biased estimator, with the true value of the parameter marked]

The second desirable characteristic of an estimator is that the spread (as measured by the variance) of the sampling distribution should be as small as possible. This ensures that, with a high probability, an individual estimate will fall close to the true value of the parameter. The sampling distributions for two unbiased estimators, one with a small variance and the other with a large variance, are shown as follows.
[Figure: sampling distributions of two unbiased estimators, one with smaller variance and one with larger variance, both centered at the true value of the parameter]

Naturally, we would prefer the estimator with the smaller variance because its estimates tend to lie closer to the true value of the parameter than those from the distribution with the larger variance.

Of course, we would like to know how close the estimate is to the true value of the parameter.

Definition 3. The distance between the estimate and the true value of the parameter is called the error of estimation.

When the sample sizes are large, the sampling distributions of unbiased estimators can be approximated by a normal distribution (because of the Central Limit Theorem). For any point estimator with a normal distribution, approximately 95% of all the point estimates will lie within two (or more exactly, 1.96) standard deviations of the mean of that distribution. For unbiased estimators, this implies that the difference between the point estimator and the true value of the parameter will be less than 1.96 standard deviations, or 1.96 standard errors (SE), about 95% of the time.

[Figure: sampling distribution of the sample estimator centered at the true value, with the margin of error 1.96 SE marked on either side and a particular estimate shown]

Definition 4. The quantity 1.96 SE is called the 95% margin of error (or simply the margin of error). This quantity provides a practical upper bound for the error of estimation. It is possible that the error of estimation will exceed this margin of error, but that is very unlikely.

Point Estimation of a Population Parameter

* Point estimator: a statistic calculated using sample measurements
* 95% margin of error: 1.96 × standard error of the estimator

It can be shown that both estimators $\bar{x}$ and $\hat{p}$, discussed in Chapter 7, have the minimum variability of all unbiased estimators and are thus the best estimators we can find in each situation.

How to Estimate a Population Mean $\mu$
* To estimate the population mean $\mu$ for a quantitative population, the point estimator $\bar{x}$ is unbiased with standard error estimated as $SE = s/\sqrt{n}$, where $s$ is the sample standard deviation.
* The 95% margin of error when $n \ge 30$ is estimated as $\pm 1.96\, s/\sqrt{n}$.

Example 2. An investigator is interested in the possibility of merging the capabilities of television and the Internet. A random sample of $n = 50$ Internet users who were polled about the time they spend watching television produced an average of 11.5 hours per week, with a standard deviation of 3.5 hours. Use this information to estimate the population mean time Internet users spend watching television.

Solution. The random variable measured is the time spent watching television per week. This is a quantitative random variable best described by its mean $\mu$. The point estimate of $\mu$, the average time Internet users spend watching television, is $\bar{x} = 11.5$ hours. The margin of error is estimated as

$$1.96\,SE = 1.96\left(\frac{s}{\sqrt{n}}\right) = 1.96\left(\frac{3.5}{\sqrt{50}}\right) = 0.97 \approx 1.$$

We feel fairly confident that the sample estimate of 11.5 hours of television watching for Internet users is within $\pm 1$ hour of the population mean.

How to Estimate a Population Proportion $p$

* To estimate the population proportion $p$ for a binomial population, the point estimator $\hat{p} = x/n$ is unbiased with standard error estimated as $SE = \sqrt{\hat{p}\hat{q}/n}$, where $\hat{q} = 1 - \hat{p}$.
* The 95% margin of error is estimated as $\pm 1.96\sqrt{\hat{p}\hat{q}/n}$ when $n\hat{p} > 5$ and $n\hat{q} > 5$.

Example 3. In addition to the average time Internet users spend watching television, the investigator is also interested in estimating the proportion of individuals in the population at large who want to purchase a television that also acts as a computer. In a random sample of $n = 100$ adults, 45% in the sample indicated that they might buy one. Estimate the true population proportion of adults who are interested in buying a television that also acts as a computer, and find the margin of error for the estimate.

Solution.
The parameter of interest is now $p$, the proportion of individuals in the population who want to purchase a television that also acts as a computer. The best estimator of $p$ is the sample proportion $\hat{p}$, which for this sample is $\hat{p} = 0.45$. In order to find the margin of error, we can approximate the value of $p$ with its estimate $\hat{p} = 0.45$:

$$1.96\,SE = 1.96\sqrt{\frac{\hat{p}\hat{q}}{n}} = 1.96\sqrt{\frac{(0.45)(0.55)}{100}} \approx 0.10.$$

With this margin of error, we can be fairly confident that the sample estimate of 0.45 is within $\pm 0.10$ of the true value of $p$. Hence, we can conclude that the true value of $p$ could be as small as 0.35 or as large as 0.55. This margin of error is quite large when compared to the estimate itself and reflects the fact that large samples are required to achieve a small margin of error when estimating $p$.

The variability of an estimator is measured using its standard error. However, the standard error usually depends on unknown parameters such as $\sigma$ or $p$. These parameters must be estimated using sample statistics such as $s$ (the sample standard deviation) and $\hat{p}$ (the sample proportion). Although not exactly correct, experimenters generally refer to the estimated standard error as the standard error.

In reporting research results, investigators often attach either the sample standard deviation $s$ (sometimes called SD) or the standard error $s/\sqrt{n}$ as follows:

$$\bar{x} \pm SD \quad \text{or} \quad \bar{x} \pm SE$$

HOMEWORK: pp. 305-307: 8.3, 8.5, 8.7, 8.9, 8.11, 8.12, 8.13, 8.15, 8.17, 8.21

8.5 Interval Estimation

An interval estimator is a rule for calculating two numbers, say $a$ and $b$, to create an interval that we are fairly certain contains the parameter of interest. The concept of "fairly certain" means "with high probability". We measure this probability using the confidence coefficient, denoted by $1 - \alpha$.

Definition 5. Interval estimators are commonly referred to as confidence intervals. The upper and lower endpoints of a confidence interval are called the upper and lower confidence limits.
The probability that a confidence interval will contain the estimated parameter is called the confidence coefficient.

For example, experimenters often construct 95% confidence intervals. This means that the confidence coefficient, or the probability that the interval will contain the estimated parameter, is 0.95. We can increase or decrease our amount of certainty by changing the confidence coefficient. Some values typically used by experimenters are 0.90, 0.95, 0.98, and 0.99.

Constructing a Confidence Interval

When the sampling distribution of a point estimator is approximately normal, an interval estimator or confidence interval can be constructed using the following reasoning. For simplicity, assume that the confidence coefficient is 0.95 and refer to the following figure.

[Figure: sampling distribution of the estimator centered at the parameter, with the interval Parameter ± 1.96 SE marked]

* We know that, of all possible values of the estimator that we might select, 95% of them will be in the interval Parameter $\pm\, 1.96\,SE$ shown in the figure above.
* Since the value of the parameter is unknown, consider constructing the interval Estimator $\pm\, 1.96\,SE$, which has the same width as the first interval but has a variable center.
* How often will this interval work properly and enclose the parameter of interest? Refer to the figure below. The first two intervals work properly: the parameter (marked with a dotted line) is contained within both intervals. The third interval does not work, since it fails to enclose the parameter. This happens because the value of the estimator at the center of the interval is too far away from the parameter. Fortunately, values of the estimator fall this far away only 5% of the time, so our procedure will work properly 95% of the time.

[Figure: Interval 1, Interval 2, and Interval 3 drawn against the parameter value; only Interval 3 fails to enclose it]

We may want to change the confidence coefficient from $1 - \alpha = 0.95$ to another confidence level $1 - \alpha$.
To accomplish this, we need to change the value $z = 1.96$, which locates an area 0.95 in the center of the standard normal curve, to a value of $z$ that locates the area $1 - \alpha$ in the center of the curve, as shown in the figure below.

[Figure: standard normal curve $f(z)$ with central area $1 - \alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$ and area $\alpha/2$ in each tail]

Since the total area under the curve is 1, the remaining area in the two tails is $\alpha$, and each tail contains area $\alpha/2$. The value of $z$ that has "tail area" $\alpha/2$ to its right is denoted by $z_{\alpha/2}$, and the area between $-z_{\alpha/2}$ and $z_{\alpha/2}$ is the confidence coefficient $1 - \alpha$. Values of $z_{\alpha/2}$ that are typically used by experimenters will become familiar to us as we begin to construct confidence intervals for different practical situations. Some of these values are given in the following table.

Confidence coefficient (1 - α)    α       α/2      z_{α/2}
0.90                              0.10    0.05     1.645
0.95                              0.05    0.025    1.96
0.98                              0.02    0.01     2.33
0.99                              0.01    0.005    2.575

A $(1 - \alpha)100\%$ Large-Sample Confidence Interval

$$(\text{Point estimator}) \pm z_{\alpha/2} \times SE$$

where $z_{\alpha/2}$ is the $z$-value with an area $\alpha/2$ in the right tail of a standard normal distribution. This formula generates two values: the lower confidence limit (LCL) and the upper confidence limit (UCL).

Large-Sample Confidence Interval for a Population Mean $\mu$

When the sample size $n$ is large, the sample mean $\bar{x}$ is the best point estimator for the population mean $\mu$. Since its sampling distribution is approximately normal, it can be used to construct a confidence interval according to the general approach given earlier.

A $(1 - \alpha)100\%$ Large-Sample Confidence Interval for a Population Mean $\mu$

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

where $z_{\alpha/2}$ is the $z$-value with an area $\alpha/2$ in the right tail of a standard normal distribution, $n$ is the sample size, and $\sigma$ is the standard deviation of the sampled population. If $\sigma$ is unknown, it can be approximated by the sample standard deviation $s$ when the sample size is large ($n \ge 30$), and the approximate confidence interval is

$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}.$$

Example 4.
A scientist interested in monitoring chemical contaminants in food, and thereby the accumulation of contaminants in human diets, selected a random sample of $n = 50$ male adults. It was found that the average daily intake of dairy products was $\bar{x} = 756$ grams per day with a standard deviation of $s = 35$ grams per day. Use this sample information to construct a 95% confidence interval for the mean daily intake of dairy products for men.

Solution. Since the sample size of $n = 50$ is large, the sample mean $\bar{x}$ is approximately normally distributed with mean $\mu$ and standard error estimated by $s/\sqrt{n}$. The approximate 95% confidence interval is

$$\bar{x} \pm 1.96\frac{s}{\sqrt{n}} = 756 \pm 1.96\left(\frac{35}{\sqrt{50}}\right) = 756 \pm 9.70.$$

Hence, the 95% confidence interval for $\mu$ is from 746.30 to 765.70 grams per day.

Interpreting the Confidence Interval

What does it mean to say we are "95% confident" that the true value of the population mean $\mu$ is within a given interval? If we were to construct 20 such intervals, each using different sample information, our intervals might look like those shown in Figure 8.9. Of the 20 intervals, we might expect that 95% of them, or 19 out of 20, will perform as planned and contain $\mu$ within their upper and lower bounds.

[Figure 8.9: twenty confidence intervals for $\mu$, plotted by interval number]

We cannot be absolutely sure that any one particular interval contains the mean $\mu$. We will never know whether our particular interval is one of the 19 that "worked" or whether it is the one interval that "missed". Our confidence in the estimated interval follows from the fact that when repeated intervals are calculated, 95% of these intervals will contain $\mu$.

Example 5. Construct a 99% confidence interval for the mean daily intake of dairy products for adult men in Example 4.

Solution. We must find the appropriate value of $z_{\alpha/2}$ when $1 - \alpha = 0.99$. This value $z_{\alpha/2}$, with tail area $\alpha/2 = 0.005$ to its right, is found to be 2.58 or 2.575.
The 99% confidence interval, using the rounded value 2.58, is then

$$\bar{x} \pm 2.58\frac{s}{\sqrt{n}} = 756 \pm 2.58\left(\frac{35}{\sqrt{50}}\right) = 756 \pm 12.77.$$

Hence, the 99% confidence interval for $\mu$ is from 743.23 to 768.77 grams per day.

Large-Sample Confidence Interval for a Population Proportion $p$

When the sample size $n$ is large, the sample proportion,

$$\hat{p} = \frac{x}{n} = \frac{\text{Total number of successes}}{\text{Total number of trials}},$$

is the best point estimator for the population proportion $p$. Since its sampling distribution is approximately normal, with mean $p$ and standard error $SE = \sqrt{pq/n}$, $\hat{p}$ can be used to construct a confidence interval according to the general approach given in this section.

A $(1 - \alpha)100\%$ Large-Sample Confidence Interval for a Population Proportion $p$

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{pq}{n}}$$

where $z_{\alpha/2}$ is the $z$-value with an area $\alpha/2$ in the right tail of a standard normal distribution. Since $p$ and $q$ are unknown, they are estimated using the best point estimators, $\hat{p}$ and $\hat{q}$. The sample size $n$ is considered large when the normal approximation to the binomial distribution is adequate, namely, when $n\hat{p} > 5$ and $n\hat{q} > 5$.

Example 6. A random sample of 985 "likely" voters (those who are likely to vote in the upcoming election) were polled during a phone-athon conducted by the Republican party. Of those surveyed, 592 indicated that they intended to vote for the Republican candidate in the upcoming election. Construct a 90% confidence interval for $p$, the proportion of likely voters in the population who intend to vote for the Republican candidate. Based on this information, can we conclude that the candidate will win the election?

Solution. The point estimate for $p$ is

$$\hat{p} = \frac{x}{n} = \frac{592}{985} = 0.601$$

and the standard error is

$$\sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{(0.601)(0.399)}{985}} = 0.016.$$

Since $1 - \alpha = 0.90$ or $\alpha/2 = 0.05$, we obtain $z_{\alpha/2} = 1.645$. The approximate 90% confidence interval for $p$ is thus

$$\hat{p} \pm 1.645\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.601 \pm (1.645)(0.016) = 0.601 \pm 0.026,$$

or $0.575 < p < 0.627$.
Hence, we estimate that the percentage of likely voters who intend to vote for the Republican candidate is between 57.5% and 62.7%. Will the candidate win the election? Assuming that she needs more than 50% of the vote to win, and since both the upper and lower confidence limits exceed this minimum value, we can say with 90% confidence that the candidate will win.

HOMEWORK: pp. 316-317: 8.23, 8.25, 8.27, 8.31, 8.33, 8.35

8.6 Estimating the Difference between Two Population Means

A problem as important as the estimation of a single population mean $\mu$ for a quantitative population is the comparison of two population means. Let the first population have mean $\mu_1$ and variance $\sigma_1^2$, and let the second population have mean $\mu_2$ and variance $\sigma_2^2$. A random sample of $n_1$ measurements is drawn from population 1, which produces mean $\bar{x}_1$ and variance $s_1^2$. A second random sample of size $n_2$ is independently drawn from population 2, which produces mean $\bar{x}_2$ and variance $s_2^2$.

               Population 1     Population 2
Mean           $\mu_1$          $\mu_2$
Variance       $\sigma_1^2$     $\sigma_2^2$

               Sample 1         Sample 2
Mean           $\bar{x}_1$      $\bar{x}_2$
Variance       $s_1^2$          $s_2^2$
Sample size    $n_1$            $n_2$

Properties of the Sampling Distribution of the Difference between Two Sample Means

When independent random samples of $n_1$ and $n_2$ observations have been selected from populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, the sampling distribution of the difference $\bar{x}_1 - \bar{x}_2$ has the following properties:

1. The mean of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$ and the standard error is
   $$SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},$$
   which can be estimated as
   $$SE \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
   when the sample sizes $n_1$ and $n_2$ are large.
2. If the sampled populations are normally distributed, then the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is exactly normally distributed, regardless of the sample size.
3. If the sampled populations are not normally distributed, then the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is approximately normally distributed when $n_1$ and $n_2$ are both 30 or more, according to the Central Limit Theorem.
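The properties listed above can be checked informally by simulation. The following is a minimal sketch, not part of the original notes: the exponential populations, the sample sizes, the number of replications, and the function name `simulate_mean_differences` are all assumptions chosen for illustration. It draws many pairs of independent samples from two clearly non-normal populations and compares the mean and standard deviation of the simulated differences with the values given in property 1.

```python
import math
import random
import statistics


def simulate_mean_differences(n1, n2, mean1, mean2, reps=2000, seed=363):
    """Draw `reps` pairs of independent samples and return the differences
    of their sample means (xbar1 - xbar2).

    Both populations are exponential (an illustrative assumption), so they
    are non-normal; an exponential population with mean m has standard
    deviation m as well.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        sample1 = [rng.expovariate(1.0 / mean1) for _ in range(n1)]
        sample2 = [rng.expovariate(1.0 / mean2) for _ in range(n2)]
        diffs.append(statistics.mean(sample1) - statistics.mean(sample2))
    return diffs


# Hypothetical settings: both sample sizes are at least 30.
n1, n2 = 40, 40
mean1, mean2 = 2.0, 3.0

diffs = simulate_mean_differences(n1, n2, mean1, mean2)

# Property 1: theoretical mean and standard error of xbar1 - xbar2.
theoretical_mean = mean1 - mean2
theoretical_se = math.sqrt(mean1**2 / n1 + mean2**2 / n2)

print("mean of differences:", round(statistics.mean(diffs), 3),
      "theory:", theoretical_mean)
print("sd of differences:  ", round(statistics.stdev(diffs), 3),
      "theory:", round(theoretical_se, 3))
```

With a few thousand replications, the simulated mean and standard deviation should land close to the theoretical values, and a histogram of `diffs` would look roughly bell-shaped, as property 3 suggests.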
Since $\mu_1 - \mu_2$ is the mean of the sampling distribution, it follows that $\bar{x}_1 - \bar{x}_2$ is an unbiased estimator of $\mu_1 - \mu_2$ with an approximately normal distribution when $n_1$ and $n_2$ are large. That is, the statistic

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

has an approximately standard normal $z$ distribution.

Large-Sample Point Estimation of $\mu_1 - \mu_2$

Point estimator: $\bar{x}_1 - \bar{x}_2$.

95% margin of error: $\pm 1.96\,SE = \pm 1.96\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$.

A $(1 - \alpha)100\%$ Large-Sample Confidence Interval for $\mu_1 - \mu_2$

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Example 7. The wearing qualities of two types of automobile tires were compared by road-testing samples of $n_1 = n_2 = 100$ tires for each type. The number of miles until wear-out was defined as a specific amount of tire wear. The test results are given as follows:

               Type 1                   Type 2
Sample size    $n_1$ = 100              $n_2$ = 100
Mean           $\bar{x}_1$ = 26,400     $\bar{x}_2$ = 25,100
Variance       $s_1^2$ = 1,440,000      $s_2^2$ = 1,960,000

Estimate $\mu_1 - \mu_2$, the difference in mean miles to wear-out, using a 99% confidence interval. Is there a difference in average wearing quality for the two types of tires?

Solution. The point estimate of $\mu_1 - \mu_2$ is

$$\bar{x}_1 - \bar{x}_2 = 26{,}400 - 25{,}100 = 1300$$

and the standard error is

$$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{1{,}440{,}000}{100} + \frac{1{,}960{,}000}{100}} = 184.4.$$

Since $1 - \alpha = 0.99$ or $\alpha = 0.01$, we have $z_{\alpha/2} = 2.575 \approx 2.58$. Thus, the 99% confidence interval is

$$1300 \pm (2.58)(184.4) = 1300 \pm 475.8,$$

or $824.2 < \mu_1 - \mu_2 < 1775.8$. The difference in the average miles to wear-out for the two types of tires is estimated to lie between LCL = 824.2 and UCL = 1775.8 miles of wear. Since 0 is not included in this interval, it is not likely that the means are the same. Therefore, we can conclude that there is a difference in average miles to wear-out for the two types of tires.

Example 8. The scientist in Example 4 wondered whether there was a difference in the average daily intakes of dairy products between men and women.
He took a sample of $n = 50$ adult women and recorded their daily intakes of dairy products in grams per day. He did the same for adult men. A summary of his sample results is listed as follows:

                      Men                  Women
Sample size           $n_1$ = 50           $n_2$ = 50
Mean                  $\bar{x}_1$ = 756    $\bar{x}_2$ = 762
Standard deviation    $s_1$ = 35           $s_2$ = 30

Construct a 95% confidence interval for the difference in the average daily intakes of dairy products for men and women. Can we conclude that there is a difference in average daily intakes for men and women?

Solution. The point estimate of $\mu_1 - \mu_2$ is

$$\bar{x}_1 - \bar{x}_2 = 756 - 762 = -6$$

and the standard error is

$$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{35^2}{50} + \frac{30^2}{50}} = 6.52.$$

Since $1 - \alpha = 0.95$ or $\alpha = 0.05$, we have $z_{\alpha/2} = 1.96$. Thus, the 95% confidence interval is

$$-6 \pm (1.96)(6.52) = -6 \pm 12.78,$$

or $-18.78 < \mu_1 - \mu_2 < 6.78$. The difference $\mu_1 - \mu_2$ could be negative (indicating that the average for women exceeds the average for men), it could be positive (indicating that men have the higher average), or it could be 0 (indicating no difference between the averages). Based on this information, we hesitate to conclude that there is a difference in average daily intakes of dairy products for men and women.

Note.

(1) When both sampled populations are normal, the statistic

$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

has exactly a standard normal distribution for all sample sizes (small or large).

(2) When the sampled populations are not normal but the sample sizes are large ($\ge 30$), the statistic

$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

is approximately standard normally distributed.

(3) If the population variances $\sigma_1^2$ and $\sigma_2^2$ are unknown and are estimated by the sample variances $s_1^2$ and $s_2^2$, the statistic

$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

will still have an approximately standard normal distribution when the sample sizes are large ($\ge 30$).

HOMEWORK: pp. 321-323: 8.39, 8.41, 8.45, 8.47, 8.49

8.7 Estimating the Difference between Two Binomial Proportions

In order to estimate the difference between two binomial proportions, independent samples consisting of $n_1$ and $n_2$ trials are drawn from populations 1 and 2, respectively, and the sample estimates $\hat{p}_1$ and $\hat{p}_2$ are calculated. The unbiased estimator of $p_1 - p_2$ is the sample difference $\hat{p}_1 - \hat{p}_2$.

Properties of the Sampling Distribution of the Difference between Two Sample Proportions

Assume that independent random samples of $n_1$ and $n_2$ observations have been selected from populations with parameters $p_1$ and $p_2$, respectively. The sampling distribution of the difference between sample proportions

$$\hat{p}_1 - \hat{p}_2 = \frac{x_1}{n_1} - \frac{x_2}{n_2}$$

has the following properties:

1. The mean of $\hat{p}_1 - \hat{p}_2$ is $p_1 - p_2$ and the standard error is
   $$SE = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}},$$
   which can be estimated as
   $$SE \approx \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$
   when the sample sizes $n_1$ and $n_2$ are large.
2. The sampling distribution of $\hat{p}_1 - \hat{p}_2$ can be approximated by a normal distribution when $n_1$ and $n_2$ are large, according to the Central Limit Theorem.

To use a normal distribution to approximate the distribution of $\hat{p}_1 - \hat{p}_2$, both $\hat{p}_1$ and $\hat{p}_2$ should be approximately normal; that is, $n_1\hat{p}_1 > 5$, $n_1\hat{q}_1 > 5$, $n_2\hat{p}_2 > 5$, and $n_2\hat{q}_2 > 5$.

Since $\hat{p}_1 - \hat{p}_2$ is an unbiased estimator of $p_1 - p_2$ with an approximately normal distribution when $n_1$ and $n_2$ are large, the statistic

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}}$$

has an approximately standard normal $z$ distribution.

Large-Sample Point Estimation of $p_1 - p_2$

Point estimator: $\hat{p}_1 - \hat{p}_2$.

95% margin of error: $\pm 1.96\,SE = \pm 1.96\sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}$.

A $(1 - \alpha)100\%$ Large-Sample Confidence Interval for $p_1 - p_2$

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$

Assumptions: Sample sizes $n_1$ and $n_2$ must be sufficiently large so that the sampling distribution of $\hat{p}_1 - \hat{p}_2$ can be approximated by a normal distribution, namely, $n_1\hat{p}_1 > 5$, $n_1\hat{q}_1 > 5$, $n_2\hat{p}_2 > 5$, and $n_2\hat{q}_2 > 5$.

Example 9. A bond proposal for school construction will be submitted to the voters at the next municipal election. A major portion of the money derived from this bond issue will be used to build schools in a rapidly developing section of the city, and the remainder will be used to renovate and update school buildings in the rest of the city. To assess the viability of the bond proposal, a random sample of $n_1 = 50$ residents in the developing section and $n_2 = 100$ residents from the other parts of the city were asked whether they plan to vote for the proposal. The results are tabulated as follows:

                         Developing section           Rest of the city
Sample size              $n_1$ = 50                   $n_2$ = 100
Number in favor          $x_1$ = 38                   $x_2$ = 65
Proportion in favor      $\hat{p}_1$ = 38/50 = 0.76   $\hat{p}_2$ = 65/100 = 0.65

(1) Estimate the difference in the true proportions favoring the bond proposal with a 99% confidence interval.

(2) If both samples were pooled into one sample of size $n = 150$, with 103 in favor of the proposal, provide a point estimate of the proportion of city residents who will vote for the bond proposal. What is the margin of error?

Solution. (1) The best point estimate of $p_1 - p_2$ is

$$\hat{p}_1 - \hat{p}_2 = 0.76 - 0.65 = 0.11$$

and the standard error of $\hat{p}_1 - \hat{p}_2$ is estimated as

$$\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}} = \sqrt{\frac{(0.76)(0.24)}{50} + \frac{(0.65)(0.35)}{100}} = 0.0770.$$

Since $1 - \alpha = 0.99$ or $\alpha = 0.01$, we have $z_{\alpha/2} = 2.575 \approx 2.58$. Thus, the 99% confidence interval is

$$0.11 \pm (2.58)(0.0770) = 0.11 \pm 0.199,$$

or $-0.089 < p_1 - p_2 < 0.309$. Since this interval contains the value $p_1 - p_2 = 0$, it is possible that $p_1 = p_2$, which implies that there may be no difference in the proportions favoring the bond issue in the two sections of the city.
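The interval in part (1) can be reproduced with a few lines of code. This is an illustrative sketch rather than part of the original notes: the function name `two_proportion_ci` is hypothetical, the counts 38 of 50 and 65 of 100 are inferred from the reported proportions 0.76 and 0.65, and the 99% critical value is taken as the rounded 2.58.

```python
import math


def two_proportion_ci(x1, n1, x2, n2, z):
    """Large-sample confidence interval for p1 - p2.

    Returns (estimate, standard_error, lower_limit, upper_limit).
    `z` is the critical value z_{alpha/2} supplied by the caller.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # The normal approximation is only reasonable when all four
    # expected counts exceed 5.
    counts = [n1 * p1_hat, n1 * (1 - p1_hat), n2 * p2_hat, n2 * (1 - p2_hat)]
    if min(counts) <= 5:
        raise ValueError("sample sizes too small for the normal approximation")
    estimate = p1_hat - p2_hat
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    margin = z * se
    return estimate, se, estimate - margin, estimate + margin


# Example 9, part (1): counts inferred from 0.76 * 50 and 0.65 * 100.
estimate, se, lower, upper = two_proportion_ci(38, 50, 65, 100, z=2.58)
print(f"estimate = {estimate:.2f}, SE = {se:.4f}")
print(f"99% CI: {lower:.3f} < p1 - p2 < {upper:.3f}")
```

Passing the critical value in as an argument keeps the sketch free of any statistics library; the same function gives a 95% interval with `z=1.96`.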
(2) If there is no difference in the two proportions, then the two samples are not really different and might as well be combined to obtain an overall estimate of the proportion of the city residents who will vote for the bond issue. If both samples are pooled, then $n = 150$ and

$$\hat{p} = \frac{103}{150} = 0.69.$$

Therefore, the point estimate of the overall value of $p$ is 0.69, with a margin of error given by

$$\pm 1.96\sqrt{\frac{(0.69)(0.31)}{150}} = \pm 1.96(0.0378) = \pm 0.074.$$

Note that $0.69 \pm 0.074$ produces the interval 0.62 to 0.76, which includes only proportions greater than 0.5. Therefore, if voter attitudes do not change adversely prior to the election, the bond proposal should pass by a reasonable majority.

HOMEWORK: pp. 326-328: 8.51, 8.53, 8.55, 8.57, 8.59

8.8 One-Sided Confidence Bounds

Confidence intervals with both a lower confidence limit (LCL) and an upper confidence limit (UCL) are in general referred to as two-sided confidence intervals. However, we sometimes need only one of these limits; that is, we need only an upper bound (or possibly only a lower bound) for the parameter of interest, such as $\mu$, $p$, $\mu_1 - \mu_2$, or $p_1 - p_2$. In a situation like this, we can construct a one-sided confidence bound for the parameter of interest.

When the sample size is large, the sampling distribution of a point estimator is approximately normal according to the Central Limit Theorem. By an argument similar to that of the preceding sections, one-sided confidence bounds can be constructed so that they hold for the true value of the parameter of interest $(1 - \alpha)100\%$ of the time in repeated sampling.

A $(1 - \alpha)100\%$ Large-Sample Lower Confidence Bound (LCB)

$$(\text{Point estimator}) - z_{\alpha} \times (\text{Standard error of the estimator})$$

A $(1 - \alpha)100\%$ Large-Sample Upper Confidence Bound (UCB)

$$(\text{Point estimator}) + z_{\alpha} \times (\text{Standard error of the estimator})$$

The $z$-value used for a $(1 - \alpha)100\%$ one-sided confidence bound, $z_{\alpha}$, locates an area $\alpha$ in a single tail of the standard normal curve, as shown below.
[Figure: standard normal curve $f(z)$ with area $\alpha$ in a single tail beyond $z_{\alpha}$]

Example 10. A random sample of 100 recorded deaths in the United States during the past year showed an average life span of 71.8 years with a standard deviation of 8.9 years. Construct a 99% lower confidence bound for the population average life span today.

Solution. In this case, $1 - \alpha = 0.99$. The value $z_{\alpha}$, with tail area $\alpha = 0.01$ to its right, is found to be 2.33. The 99% lower confidence bound for $\mu$, the population average life span today, is then

$$LCB = 71.8 - (2.33)\left(\frac{8.9}{\sqrt{100}}\right) = 71.8 - 2.0737 = 69.7263.$$

Hence, we are approximately 99% confident that the population average life span today is greater than 69.7263 years.

Example 11. The heights of a random sample of 36 male students in a college showed a mean of 174.5 centimeters and a standard deviation of 6.9 centimeters. Construct a 97.5% upper confidence bound for the mean height of all male students of the college.

Solution. Note that $1 - \alpha = 0.975$. The value $z_{\alpha}$, with tail area $\alpha = 0.025$ to its right, is found to be 1.96. The 97.5% upper confidence bound for the mean height of all male students of the college is

$$UCB = 174.5 + (1.96)\left(\frac{6.9}{\sqrt{36}}\right) = 174.5 + 2.254 = 176.754.$$

Hence, we are approximately 97.5% confident that the mean height of male students in the college is less than 176.754 centimeters.

8.9 Choosing the Sample Size

How many measurements should be included in the sample? How much information does the researcher want to buy? The total amount of information in the sample will affect the reliability or goodness of the inferences made by the researcher, and it is this reliability that the researcher must specify. In a statistical estimation problem, the accuracy of the estimate is measured by the margin of error or the width of the confidence interval. Since both of these measures are functions of the sample size, specifying the accuracy determines the necessary sample size.

Example 12.
Suppose we want to estimate the average daily yield $\mu$ of a chemical process and we need the margin of error to be less than 4 tons. This means that, approximately 95% of the time in repeated sampling, the distance between the sample mean $\bar{x}$ and the population mean $\mu$ will be less than $1.96\,\text{SE}$. We want this quantity to be less than 4. That is,

$$1.96\,\text{SE} < 4 \quad\text{or}\quad 1.96\,\frac{\sigma}{\sqrt{n}} < 4.$$

Solving for $n$, we obtain

$$n > \left(\frac{1.96}{4}\right)^2 \sigma^2 \quad\text{or}\quad n > 0.24\,\sigma^2.$$

If we know $\sigma$, the population standard deviation, we can substitute its value into the formula and solve for $n$. If $\sigma$ is unknown, which is usually the case, we can use the best approximation available:

* An estimate $s$ obtained from a previous sample.
* A range estimate based on knowledge of the largest and smallest possible measurements: $\sigma \approx \text{Range}/4$, or $4\sigma \approx \text{Range}$.

For this example, suppose that a prior study of the chemical process produced a sample standard deviation of $s = 21$ tons. Then

$$n > 0.24\,\sigma^2 = 0.24\,(21)^2 = 105.8.$$

Using a sample size $n = 106$ or larger, we could be reasonably certain (with probability approximately equal to 0.95) that our estimate of the average yield will be within $\pm 4$ tons of the actual average yield.

Sometimes researchers request a different confidence level than the 95% confidence specified by the margin of error. In this case, the half-width of the confidence interval provides the accuracy measure for our estimate; that is, the bound $B$ on the error of our estimate satisfies

$$z_{\alpha/2} \times (\text{Standard error of the estimator}) < B.$$

This method for choosing the sample size can be used for all estimation procedures presented in this chapter. The formula requires the standard error of the point estimator of the parameter of interest.
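The rule $z_{\alpha/2}\,\sigma/\sqrt{n} < B$ for a single mean can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `sample_size_for_mean` and its parameter names are hypothetical, and the exact normal quantile from the standard library stands in for the rounded table value 1.96.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, bound, conf=0.95):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= bound.

    Hypothetical helper: sigma is the (estimated) population standard
    deviation, bound is B, conf is the confidence level 1 - alpha.
    """
    # Two-sided level: alpha/2 lies in each tail of the standard normal.
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)
    return ceil((z * sigma / bound) ** 2)


# Example 12: s = 21 tons from a prior study, bound B = 4 tons.
n_yield = sample_size_for_mean(sigma=21, bound=4)
print(n_yield)  # 106
```

Passing a different `conf` value changes only the quantile $z_{\alpha/2}$; the rest of the calculation is unchanged, which is the point of the general bound $B$.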
The table below summarizes the formulas used to determine the sample sizes required for estimation with a given bound $B$ on the error of the estimate or confidence interval width $W$ ($W = 2B$).

Parameter          Estimator                  Sample size                                                         Assumptions
-----------------  -------------------------  ------------------------------------------------------------------  ----------------------------------------
$\mu$              $\bar{x}$                  $n \ge z_{\alpha/2}^2\,\sigma^2 / B^2$
$\mu_1 - \mu_2$    $\bar{x}_1 - \bar{x}_2$    $n \ge z_{\alpha/2}^2\,(\sigma_1^2 + \sigma_2^2) / B^2$             $n_1 = n_2 = n$
$p$                $\hat{p}$                  $n \ge z_{\alpha/2}^2\,pq / B^2$
                                              or $n \ge (0.25)\,z_{\alpha/2}^2 / B^2$                             $p = 0.5$
$p_1 - p_2$        $\hat{p}_1 - \hat{p}_2$    $n \ge z_{\alpha/2}^2\,(p_1 q_1 + p_2 q_2) / B^2$                   $n_1 = n_2 = n$
                                              or $n \ge 2\,(0.25)\,z_{\alpha/2}^2 / B^2$                          $n_1 = n_2 = n$ and $p_1 = p_2 = 0.5$

Example 13. Producers of polyvinyl plastic pipe want to have a supply of pipes sufficient to meet marketing needs. They wish to survey wholesalers who buy polyvinyl pipe in order to estimate the proportion who plan to increase their purchases next year. What sample size is required if they want their estimate to be within 0.04 of the actual proportion with probability equal to 0.90?

Solution. Since $B = 0.04$ and $1 - \alpha = 0.90$ (so that $z_{\alpha/2} = 1.645$), we have

$$1.645\,\text{SE} = 1.645 \sqrt{\frac{pq}{n}} < 0.04.$$

We must substitute an approximate value of $p$ into this inequality. Usually, we try $p = 0.5$, which maximizes $pq$ and therefore gives a conservative (largest) sample size. Then

$$1.645 \sqrt{\frac{(0.5)(0.5)}{n}} < 0.04 \quad\text{or}\quad \sqrt{n} > \frac{(1.645)(0.5)}{0.04} = 20.56$$

or

$$n > (20.56)^2 = 422.7.$$

Therefore, the producers must include at least 423 wholesalers in their survey if they want to estimate the proportion $p$ correct to within 0.04.

Example 14. A personnel director wishes to compare the effectiveness of two methods of training industrial employees to perform a certain assembly operation. A number of employees are to be divided into two equal groups: the first receiving training method 1 and the second training method 2. Each will perform the assembly operation, and the length of assembly time will be recorded. It is expected that the measurements for both groups will have a range of approximately 8 minutes.
For the estimate of the difference in mean times to assemble to be correct within 1 minute with a probability equal to 0.95, how many workers must be included in each training group?

Solution. In this case, $B = 1$ and $1 - \alpha = 0.95$ (so that $z_{\alpha/2} = 1.96$). Then we have

$$1.96\,\text{SE} = 1.96 \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < 1.$$

We assume $n_1 = n_2 = n$ and obtain the inequality

$$1.96 \sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{n}} < 1.$$

As noted above, the variability (range) of each method of assembly is approximately the same, and hence $\sigma_1^2 = \sigma_2^2 = \sigma^2$. Since the range, equal to 8 minutes, is approximately equal to $4\sigma$, we have

$$4\sigma \approx 8 \quad\text{or}\quad \sigma \approx 2.$$

Substituting this value for $\sigma_1$ and $\sigma_2$ in the earlier inequality, we get

$$1.96 \sqrt{\frac{2^2}{n} + \frac{2^2}{n}} < 1 \quad\text{or}\quad 1.96 \sqrt{\frac{8}{n}} < 1$$

or

$$\sqrt{n} > 1.96 \sqrt{8} \quad\text{or}\quad n > (1.96)^2\,(8) = 30.733.$$

Therefore, each group should contain at least $n = 31$ workers.

HOMEWORK: pp. 334-335: 8.68, 8.69, 8.71, 8.73, 8.75, 8.77
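As a check on Examples 13 and 14, the sample-size formulas for a proportion and for a difference of two means can be sketched in Python. The function names below are illustrative assumptions rather than anything defined in the notes, and the exact normal quantile replaces the rounded table values 1.645 and 1.96.

```python
from math import ceil
from statistics import NormalDist


def z_two_sided(conf):
    """z_{alpha/2} for a two-sided interval at confidence level conf."""
    return NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)


def sample_size_for_proportion(bound, conf, p=0.5):
    """n >= z^2 * p * q / B^2; p = 0.5 is the conservative default."""
    z = z_two_sided(conf)
    return ceil(z ** 2 * p * (1.0 - p) / bound ** 2)


def sample_size_for_mean_difference(sigma1, sigma2, bound, conf):
    """Per-group n >= z^2 * (sigma1^2 + sigma2^2) / B^2, with n1 = n2 = n."""
    z = z_two_sided(conf)
    return ceil(z ** 2 * (sigma1 ** 2 + sigma2 ** 2) / bound ** 2)


# Example 13: B = 0.04, 90% confidence, p taken as 0.5.
n_wholesalers = sample_size_for_proportion(bound=0.04, conf=0.90)

# Example 14: range of 8 minutes gives sigma of about 2 in each group, B = 1.
n_per_group = sample_size_for_mean_difference(2, 2, bound=1, conf=0.95)

print(n_wholesalers, n_per_group)  # 423 31
```

Rounding up with `ceil` reflects that the inequalities give a minimum value for $n$, so any fractional result must be raised to the next whole observation.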