M316 Chapter 18 Dr. Berg

Inference About a Population Mean

We now develop new procedures to replace the z procedures that we used to introduce confidence intervals and tests. We discard the unrealistic condition that we know the standard deviation of the population. The interpretation of the results remains the same as before.

Conditions for Inference

Here are the new "simple conditions".

Conditions for Inference About a Mean
1 We can regard our data as a simple random sample (SRS) from the population. This condition is very important.
2 Observations from the population have a Normal distribution with mean µ and standard deviation σ. In practice, it is enough that the distribution be symmetric and single-peaked unless the sample is very small. Both µ and σ are unknown.

Another condition that applies to all of the inference methods in this book is that the population be much larger than the sample, say at least 20 times as large. In practice, the sample is seldom a large part of the population.

When the conditions are satisfied, the sample mean x̄ has the Normal distribution with mean µ and standard deviation σ/√n. Because we don't know σ, we estimate it by the sample standard deviation s. We then estimate the standard deviation of x̄ by s/√n.

Standard Error

When the standard deviation of a statistic is estimated from data, the result is called the standard error of the statistic. The standard error of the sample mean x̄ is s/√n.

Exercise (18.1) Travel Time to Work
A study of commuting times reports the travel times to work of a random sample of 20 employed adults in New York State. The mean is x̄ = 31.25 minutes and the standard deviation is s = 21.88 minutes. What is the standard error of the mean?

The t Distributions

When we don't know σ, we replace σ/√n, the standard deviation of x̄, with the standard error s/√n. The new statistic does not have a Normal distribution.
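The standard error asked for in Exercise 18.1 is just s/√n. A minimal sketch in Python, using only the summary numbers given in the exercise:

```python
import math

# Summary statistics from Exercise 18.1
n = 20        # sample size
s = 21.88     # sample standard deviation (minutes)

# Standard error of the sample mean: s / sqrt(n)
se = s / math.sqrt(n)
print(round(se, 2))  # about 4.89 minutes
```

Any statistical software will report this same quantity, usually labeled "SE Mean" or "std. error."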
The One-Sample t Statistic and the t Distributions
Draw an SRS of size n from a large population that has the Normal distribution with mean µ and standard deviation σ. The one-sample t statistic
	t = (x̄ − µ) / (s/√n)
has the t distribution with n−1 degrees of freedom.

The t statistic has the same interpretation as any standardized statistic: it says how far x̄ is from its mean µ in standard deviation units. There is a different t distribution for each sample size. We specify a particular t distribution by giving its degrees of freedom n−1. This comes from s in the denominator of t: s has n−1 degrees of freedom. We will write the t distribution with n−1 degrees of freedom as t(n−1) for short.

Here is a comparison of two t distributions with the standard Normal. The figure illustrates these facts about the t distributions:
• The density curves of the t distributions are similar in shape to the standard Normal curve. They are symmetric about 0, single-peaked, and bell-shaped.
• The t distributions have more probability in the tails and less in the center than does the standard Normal. This is true because substituting s for the fixed parameter σ introduces more variation into the statistic.
• As the degrees of freedom increase, the t density curve approaches the N(0,1) curve ever more closely. This happens because s estimates σ more closely as the sample size increases. Thus, using s in place of σ causes little extra variation when the sample is large.

Table C in the back of the book gives critical values for the t distributions. Each row in the table contains critical values for the t distribution whose degrees of freedom appear in the left column. For convenience, we label the table entries both by the confidence level C (in percent) required for confidence intervals and by the one-sided and two-sided P-values for each critical value.
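The fact that t critical values approach the Normal values as the degrees of freedom grow can be checked directly in software. A sketch assuming scipy is available (Table C itself is not needed):

```python
from scipy import stats

# Two-sided 95% critical values t*: area 0.975 to the left of t*
for df in (9, 29, 99, 999):
    print(df, round(stats.t.ppf(0.975, df), 3))

# The standard Normal critical value z* in the bottom row of Table C
print("z*", round(stats.norm.ppf(0.975), 3))
```

The printed t* values shrink toward z* ≈ 1.96 as df increases, which is exactly the pattern you see reading down a column of Table C.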
You have already used the standard Normal critical values z* in the bottom row. You can see by looking down each column that the t critical values approach the Normal values as the degrees of freedom increase.

Example (18.1) t Critical Values
One of the t distributions shown in the previous illustration is for 9 degrees of freedom. For this distribution, what point has probability 0.05 to its right? In Table C, look in the df=9 row above the one-sided P-value 0.05, and you will find that the critical value is t* = 1.833. To use software, enter the degrees of freedom and the probability you want to the left, 0.95 in this case.

Exercise (18.3) Critical Values
Use Table C to find:
a) the critical value for a one-sided test with level α = 0.05 based on the t(5) distribution.
b) the critical value for a 98% confidence interval based on the t(21) distribution.

Exercise (18.4) More Critical Values
You have an SRS of size 25 and calculate the one-sample t statistic. What is the critical value t* such that:
a) t has probability 0.025 to the right of t*?
b) t has probability 0.75 to the left of t*?

The One-Sample t Confidence Interval

To analyze a sample from a Normal population with unknown σ, just replace the standard deviation σ/√n of x̄ by its standard error s/√n in the z procedures of Chapters 14 and 15. The confidence interval and test that result are one-sample t procedures. Critical values and P-values come from the t distribution with n−1 degrees of freedom.

The One-Sample t Confidence Interval
Draw an SRS of size n from a large population having unknown mean µ. A level C confidence interval for µ is
	x̄ ± t* s/√n
where t* is the critical value for the t(n−1) density curve with area C between −t* and t*. This interval is exact when the population distribution is Normal and is approximately correct for large n in other cases.

Example (18.2) Healing of Skin Wounds
We revisit the biological study of Example 14.3.
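The confidence interval recipe in the box above can be sketched as a small function working from summary statistics. This is an illustration, not part of the notes; the function name is ours, and scipy is assumed:

```python
from math import sqrt
from scipy import stats

def t_interval(xbar, s, n, conf=0.95):
    """Level-conf one-sample t confidence interval: xbar +/- t* s/sqrt(n)."""
    # t* puts area conf between -t* and t*, i.e. area (1+conf)/2 to its left
    tstar = stats.t.ppf((1 + conf) / 2, n - 1)
    margin = tstar * s / sqrt(n)
    return xbar - margin, xbar + margin
```

For instance, t_interval(31.25, 21.88, 20) gives a 95% interval for the mean commuting time of Exercise 18.1.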
We follow the four-step process for a confidence interval, outlined on page 350.

STATE: Biologists studying the healing of skin wounds measured the rate at which new cells closed a razor cut made in the skin of an anesthetized newt. Here are the data from 18 newts, measured in micrometers (millionths of a meter) per hour.
	29 27 34 40 22 28 14 35 26 35 12 30 23 18 11 22 23 33
This is one of several sets of measurements made under different conditions. We want to estimate the mean healing rate for comparison with rates under other conditions.

FORMULATE: We will estimate the mean rate µ for all newts of this species by giving a 95% confidence interval.

SOLVE: We must first check the conditions for inference.
• As in Chapter 14, we are willing to regard these newts as an SRS from their species.
• The stemplot in the next figure does not suggest any strong departures from Normality.
For these data, x̄ = 25.67 and s = 8.324. The degrees of freedom are 18 − 1 = 17. From Table C we find that for the 95% confidence interval, t* = 2.110. The confidence interval is
	x̄ ± t* s/√n = 25.67 ± 2.110 (8.324/√18) = 25.67 ± 4.14.

CONCLUDE: We are 95% confident that the mean healing rate for all newts of this species is between 21.53 and 29.81 micrometers per hour.

The one-sample t confidence interval has the form
	estimate ± t* SE_estimate
where "SE" stands for "standard error." In Example 18.2, the estimate is the sample mean x̄, and its standard error is
	SE_x̄ = s/√n = 8.324/√18 = 1.962.

Exercise (18.6) The Conductivity of Glass
How well materials conduct heat matters when designing houses, for example. Conductivity is measured in terms of watts of heat power transmitted per square meter of surface per degree Celsius of temperature difference on the two sides of the material. In these units, glass has a conductivity of about 1. The National Institute of Standards and Technology provides exact data on properties of materials.
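The arithmetic of Example 18.2 can be reproduced from the raw newt data. A sketch assuming scipy is available (the data are copied from the example):

```python
from math import sqrt
from statistics import mean, stdev
from scipy import stats

# Healing rates for 18 newts (micrometers per hour), Example 18.2
rates = [29, 27, 34, 40, 22, 28, 14, 35, 26, 35,
         12, 30, 23, 18, 11, 22, 23, 33]

n = len(rates)
xbar = mean(rates)                 # sample mean, about 25.67
s = stdev(rates)                   # sample standard deviation, about 8.324
se = s / sqrt(n)                   # standard error, about 1.962
tstar = stats.t.ppf(0.975, n - 1)  # t* for 95% confidence, df = 17

lo, hi = xbar - tstar * se, xbar + tstar * se
print(round(lo, 2), round(hi, 2))  # roughly (21.53, 29.81)
```

This matches the hand computation above and the software output shown later in the "Using Technology" section.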
Here are 11 measurements of the heat conductivity of a particular type of glass:
	1.11 1.07 1.11 1.07 1.12 1.08 1.08 1.18 1.18 1.18 1.12
a) Make a stemplot. Is there any sign of major deviation from Normality?
b) Give a 95% confidence interval for the mean conductivity.
c) Use the interval to do a test: is there significant evidence at the 5% level that the mean conductivity of this type of glass is not 1?

The One-Sample t Test

Like the confidence interval, the t test is very similar to the z test we used earlier.

The One-Sample t Test
Draw an SRS of size n from a large population having unknown mean µ. To test the hypothesis H0: µ = µ0, compute the one-sample t statistic
	t = (x̄ − µ0) / (s/√n).
In terms of a variable T having the t(n−1) distribution, the P-value for a test of H0 against
	Ha: µ > µ0 is P(T ≥ t)
	Ha: µ < µ0 is P(T ≤ t)
	Ha: µ ≠ µ0 is 2P(T ≥ |t|)
These P-values are exact if the population distribution is Normal and are approximately correct for large n in other cases.

Example (18.3) Sweetening Cola
Here is a more realistic analysis of the cola sweetening example from Chapter 15. We follow the four-step process for a significance test.

STATE: Cola makers test new recipes for loss of sweetness during storage. Trained tasters rate the sweetness before and after storage. Here are the sweetness losses found by 10 tasters for a new recipe.
	2.0 0.4 0.7 2.0 −0.4 2.2 −1.3 1.2 1.1 2.3
Are these data good evidence that the cola lost sweetness?

FORMULATE: Tasters vary in their perception of sweetness loss. So we ask the question in terms of the mean loss µ for a large population of tasters. The null hypothesis is "no loss," and the alternative hypothesis says "there is a loss." Thus H0: µ = 0 and Ha: µ > 0.

SOLVE: First check the conditions for inference. As before, we are willing to regard these 10 tasters as an SRS from a large population of trained tasters. We can't judge Normality from just 10 tasters.
There are no outliers, but the data are somewhat skewed. P-values for the t test may be only approximately accurate.
The basic statistics are x̄ = 1.02 and s = 1.196. The one-sample t statistic is
	t = (x̄ − µ0) / (s/√n) = (1.02 − 0) / (1.196/√10) = 2.696.
The P-value is the area to the right of 2.696 under the curve of the t distribution with df = 10 − 1 = 9. Software gives P = 0.0123. Without software, we can use Table C to place P between 0.02 and 0.01.

CONCLUDE: There is strong evidence for a loss of sweetness.

Exercise (18.8) Is It Significant?
The one-sample t statistic for testing H0: µ = 0 versus Ha: µ > 0 is t = 1.82 from a sample of 15 observations.
a) What are the degrees of freedom?
b) Give the two critical values t* from Table C that bracket t. What are the one-sided P-values for these two entries?
c) Is the result significant at the 5% level? At the 1% level?

Using Technology

Any software suitable for statistics will implement the one-sample t procedures. You can read and use almost any output now that you know what to look for. The next figures display the output for the 95% confidence interval of Example 18.2 from a graphing calculator, two statistical programs, and a spreadsheet program.

Matched Pairs t Procedures

The study of healing in Example 18.2 estimated the mean healing rate for newts under natural conditions, but the researchers then compared results under several conditions. The taste test in Example 18.3 was a matched pairs study in which the same 10 tasters rated before-and-after sweetness. Comparative studies are more convincing than single-sample investigations. For that reason, one-sample inference is less common than comparative inference. One common design to compare two treatments makes use of one-sample procedures. In a matched pairs design, subjects are matched in pairs and each treatment is given to one subject in each pair.
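The one-sided t test of Example 18.3 is easy to reproduce in software. A sketch assuming scipy is available (the sweetness losses are already before-minus-after differences for each taster, so the matched pairs procedure reduces to this one-sample test):

```python
from scipy import stats

# Sweetness losses for 10 tasters, from Example 18.3
losses = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]

# Test H0: mu = 0 against the one-sided alternative Ha: mu > 0
t, p = stats.ttest_1samp(losses, 0.0, alternative="greater")
print(round(t, 3), round(p, 4))  # t near 2.70, P near 0.0123
```

The P-value agrees with the software result quoted in the example and falls between the Table C bounds of 0.01 and 0.02.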
Another situation calling for matched pairs is before-and-after observations on the same subjects, as in the taste test of Example 18.3.

Matched Pairs t Procedures
To compare the responses to the two treatments in a matched pairs design, find the difference between the responses within each pair. Then apply the one-sample t procedures to these differences.

Robustness of t Procedures

The t confidence interval and significance test are exactly correct when the distribution of the population is exactly Normal. No real data are perfectly Normal. The usefulness of the t procedures therefore depends on how strongly they are affected by lack of Normality.

Robust Procedures
A confidence interval or significance test is called robust if the confidence interval or P-value does not change very much when the conditions for use of the procedure are violated.

The condition that the population be Normal rules out outliers, so the presence of outliers shows that this condition is not fulfilled. The t procedures are not robust against outliers unless the sample is very large, because x̄ and s are not resistant to outliers. Fortunately, the t procedures are quite robust against non-Normality of the population except when outliers or strong skewness are present. As the sample size increases, the central limit theorem ensures that the t procedures become more accurate.

Using the t Procedures
1 Except in the case of small samples, the condition that the data are an SRS from the population of interest is more important than the condition that the population distribution is Normal.
2 Sample size less than 15: use t procedures if the data appear close to Normal.
3 Sample size at least 15: the t procedures can be used except in the presence of outliers or strong skewness.
4 Large samples: the t procedures can be used even for skewed distributions when n ≥ 40.

Example (18.5) Can We Use t?
Can we use t procedures for these data?
a) This is a histogram of the percent of each state's adult residents who are college graduates (the entire population).
b) This is a stemplot of the force required to pull apart wood.
c) This is a stemplot of the lengths of 23 specimens of a tropical flower.
d) This is a histogram of the heights of students in a college class.

Answers:
a) No. This is the entire population, so no inference is needed.
b) No. The data are badly skewed and n is only 20.
c) Yes. The sample is large enough to overcome a little skewness.
d) Yes. The sample is close to Normal.
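The matched pairs recipe described earlier, differencing within each pair and then applying the one-sample t procedures, can be sketched in software. The before/after scores below are invented purely for illustration, and scipy is assumed:

```python
from scipy import stats

# Hypothetical before/after sweetness scores for 6 tasters (invented data)
before = [7.1, 6.8, 7.4, 7.0, 6.5, 7.2]
after  = [6.2, 6.6, 6.8, 6.5, 6.4, 6.3]

# Matched pairs: one-sample t test on the within-pair differences
diffs = [b - a for b, a in zip(before, after)]
t1, p1 = stats.ttest_1samp(diffs, 0.0, alternative="greater")

# scipy's paired test gives the identical result
t2, p2 = stats.ttest_rel(before, after, alternative="greater")
print(round(t1, 3), round(p1, 4))
```

The two calls agree exactly, which is the point of the matched pairs box: a paired t test is nothing more than a one-sample t test on the differences.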