M316 Chapter 16 Dr. Berg

Inference in Practice

We have looked at two procedures for statistical inferences, both about the mean of a population when the "simple conditions" are true: the data are a perfect SRS, the population has a Normal distribution, and we know the standard deviation $\sigma$ of the population. Under these conditions, a confidence interval for the mean $\mu$ is

$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}.$$

To test a hypothesis $H_0: \mu = \mu_0$ we use the one-sample z statistic:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}.$$

We call these z procedures because they both start with the one-sample z statistic and use the standard Normal distribution. Later we will see how to modify these procedures for use in more realistic settings. Meantime, we examine consequences of the fact that our statistical inference is based on probability theory, which is based on our simple assumptions.

Where Did the Data Come From

The most important requirement for any inference procedure is that the data come from a process to which the laws of probability apply. Inference is most reliable when the data come from a probability sample or a randomized comparative experiment.

Where the Data Come From Matters

When you use statistical inference, you are acting as if your data are a probability sample or come from a randomized experiment. Statistical confidence intervals and tests cannot remedy basic flaws in producing the data, such as voluntary response samples or uncontrolled experiments.

If your data don't come from a probability sample or a randomized comparative experiment, your conclusions may be challenged. To answer such a challenge, you must usually rely on subject-matter knowledge, not on statistics. It is common to apply statistics to data that are not produced by random selection.

Example (16.1) The Psychologist and the Sociologist

A psychologist is interested in how our visual perception can be fooled by optical illusions. Her subjects are students in Psychology 101 at her university.
Most psychologists would agree that it's safe to treat the students as an SRS of all people with normal vision. There is nothing special about being a student that changes visual perception.

A sociologist at the same university uses students in Sociology 101 to examine attitudes toward poor people and antipoverty programs. Students as a group are younger than the adult population as a whole. Even among young people, students as a group come from more prosperous and better-educated homes. Even among students, this university isn't typical of all campuses. Even on this campus, students in a sociology course may have opinions that are quite different from those of engineering students. The sociologist can't reasonably act as if these students are a random sample from any interesting population.

Example (16.2) Mammary Artery Ligation

Angina is the severe pain caused by inadequate blood supply to the heart. Perhaps we can relieve angina by tying off the mammary arteries to force the body to develop other routes to supply blood to the heart. Surgeons tried this procedure, called "mammary artery ligation." Patients reported a statistically significant reduction in angina pain.

Statistical significance says that something other than chance is at work, but it does not say what that something is. The mammary artery ligation experiment was uncontrolled, so that the reduction in pain might be nothing more than the placebo effect. Sure enough, a randomized comparative experiment showed that ligation was no more effective than a placebo. Surgeons abandoned the operation at once.

Cautions About the z Procedures

Any confidence interval or significance test can only be used under specific conditions. It's up to you to understand these conditions and judge whether they fit your problem. Let's look again at the "simple conditions" for the z confidence interval and test.

1 The data must be an SRS from the population.
In some cases an attempt to choose an SRS can be frustrated by nonresponse and other practical problems. There are many settings in which we don't have an actual random sample but the data can nonetheless be thought of as observations taken at random from a population. Biologists regard the 18 newts in Example 14.3 as if they were randomly chosen from all newts of the same variety.

The status of data as roughly an SRS from an interesting population is often not clear. Subjects in medical studies, for example, are most often patients at one or several medical centers. This is a kind of convenience sample. We may hesitate to regard these patients as an SRS from all patients everywhere with the same medical condition. Yet it isn't possible to actually choose an SRS, and a randomized clinical trial with real patients surely gives useful information. When an actual SRS is not possible, results are tentative. It is wise to wait until several studies produce similar results before coming to a conclusion.

2 Different methods are needed for different designs.
The z procedures are not correct for probability samples more complex than an SRS. Later chapters give methods for some other designs, but we won't discuss inference for really complex settings.

3 Outliers can distort the results.
Because $\bar{x}$ is strongly influenced by a few extreme observations, outliers can have a large effect on the z confidence interval and test. Always explore your data before doing inference. In particular, you should search for outliers and try to correct them or justify their removal before performing the z procedures.

4 The shape of the population distribution matters.
Our "simple conditions" state that the population distribution is Normal. Outliers or extreme skewness make the z procedures untrustworthy unless the sample is large. Other violations of Normality are often not critical in practice.
The z procedures use Normality of the sample mean $\bar{x}$, not the Normality of the individual observations. The central limit theorem tells us that the distribution of $\bar{x}$ is closer to Normal than the distribution of the individual observations.

5 You must know the standard deviation $\sigma$ of the population.
This condition is rarely satisfied in practice. We will see in Chapter 18 that simple changes give very useful procedures that do not require that $\sigma$ be known. When the sample is very large, the sample standard deviation s is very close to $\sigma$. Even in this situation, it is better to use the procedures of Chapter 18.

Exercise (16.2) Running Red Lights

A survey of licensed drivers inquired about running red lights. One question asked, "Of every ten motorists who run a red light, about how many do you think will be caught?" The mean result for 880 respondents was $\bar{x} = 1.92$ and the standard deviation was s = 1.83. For this large sample, s will be close to the population standard deviation $\sigma$, so suppose we know that $\sigma = 1.83$.

a) Give a 95% confidence interval for the mean opinion in the population of all licensed drivers.
b) The distribution of responses is skewed to the right rather than Normal. This will not strongly affect the z confidence interval for this sample. Why not?
c) The 880 respondents are an SRS from completed calls among 45,956 calls to randomly chosen residential telephone numbers listed in telephone directories. Only 5029 of the calls were completed. This information gives two reasons to suspect that the sample may not represent all licensed drivers. What are these reasons?

Cautions About Confidence Intervals

The most important caution about confidence intervals in general is a consequence of the use of a sampling distribution. A sampling distribution shows how a statistic such as $\bar{x}$ varies in repeated sampling. This variation causes "random sampling error" because the statistic misses the true parameter by a random amount.
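As a concrete reference for the interval whose margin of error is under discussion, the following is a minimal Python sketch of the z confidence interval $\bar{x} \pm z^* \sigma/\sqrt{n}$. The function name z_confidence_interval is our own choice, and the inputs are the summary values from Exercise 16.2, with $\sigma$ assumed equal to s = 1.83 as that exercise suggests. The sketch accounts only for random sampling error; it knows nothing about nonresponse or undercoverage.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, level=0.95):
    """Return (low, high) for the z interval xbar +/- z* sigma / sqrt(n).

    Assumes the "simple conditions": an SRS, a Normal sampling
    distribution for xbar, and a known population sigma.
    """
    # z* leaves (1 - level) / 2 in each tail of the standard Normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Summary values from Exercise 16.2 (sigma taken to be s, an assumption).
low, high = z_confidence_interval(xbar=1.92, sigma=1.83, n=880)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

With these inputs the interval runs from about 1.80 to 2.04. Its narrow width reflects only the large sample size, not the quality of the sample.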
The margin of error in a confidence interval ignores everything except the sample variation due to choosing the sample randomly.

The Margin of Error Doesn't Cover All Errors

The margin of error in a confidence interval covers only random sampling errors. Practical difficulties such as undercoverage and nonresponse are often more serious than random sampling error. The margin of error does not take such difficulties into account.

Exercise (16.4) Holiday Spending

"How much do you plan to spend for gifts this holiday season?" An interviewer asks this question of 250 customers at a large shopping mall. The sample mean and standard deviation of the responses are $\bar{x} = \$237$ and $s = \$65$.

a) The distribution of spending is skewed, but we can act as though $\bar{x}$ is Normal. Why?
b) For this large sample, we can act as if $\sigma = \$65$ because the sample s will be close to $\sigma$. Use this to give a 99% confidence interval for the mean gift spending of all adults.
c) This confidence interval can't be used to give information about the spending plans of all adults. Why?

Cautions About Significance Tests

Significance tests are widely used in reporting the results of research in many fields of applied science and in industry. New pharmaceuticals require significant evidence of effectiveness and safety. Marketers want to know whether a new ad campaign significantly outperforms the old one. Etc.

The reasoning of tests is less straightforward than the reasoning of confidence intervals, and the cautions needed are more elaborate. Here are some points to keep in mind.

HOW SMALL A P IS CONVINCING?

The purpose of a test of significance is to describe the degree of evidence provided by the sample against the null hypothesis. The P-value does this. But how small a P-value is convincing evidence against the null hypothesis? This depends mainly on two circumstances:

1 How plausible is $H_0$?
If $H_0$ represents an assumption that the people you must convince have believed for years, strong evidence (small P) will be needed to persuade them.

2 What are the consequences of rejecting $H_0$?
If rejecting $H_0$ in favor of $H_a$ means making an expensive changeover from one type of product packaging to another, you need strong evidence that the new packaging will boost sales.

These criteria are a bit subjective. Different people will often insist on different levels of significance. Giving the P-value allows each of us to decide individually if the evidence is sufficiently strong.

Exercise (16.5) Is It Significant?

In the absence of special preparation, SAT mathematics (SATM) scores in recent years have varied Normally with mean $\mu = 518$ and standard deviation $\sigma = 114$. Fifty students go through a rigorous training program designed to raise their SATM scores by improving their mathematical skills. Carry out a test of $H_0: \mu = 518$ versus $H_a: \mu > 518$ in each of these situations with $\alpha = 0.05$:

a) $\bar{x} = 544$.
b) $\bar{x} = 545$.

STATISTICAL SIGNIFICANCE AND PRACTICAL SIGNIFICANCE

When a null hypothesis ("no effect" or "no difference") can be rejected at the usual levels, $\alpha = 0.05$ or $\alpha = 0.01$, there is good evidence that an effect is present; but that effect may be very small. When large samples are available, even tiny deviations from the null hypothesis will be significant.

Example (16.3) It's Significant. So What?

We are testing the hypothesis of no correlation between two variables. With 1000 observations, an observed correlation of only r = 0.08 is significant at the 1% level. The small P-value does not mean that there is a strong association, only that there is strong evidence of some association. On the other hand, if we have only 10 observations, a correlation of r = 0.5 is not significantly greater than zero even at the 5% level.
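The two claims in Example 16.3 can be checked approximately. The sketch below uses the Fisher z transformation, under which atanh(r) * sqrt(n - 3) is roughly standard Normal when the true correlation is zero. This is a large-sample approximation chosen because it needs only the Python standard library; it is not necessarily the test the example had in mind, and for n = 10 it is only rough. The function name correlation_p_value is our own.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_p_value(r, n):
    """Approximate one-sided P-value for H0: no correlation vs Ha: positive.

    Uses the Fisher z transformation (an approximation, assumed here):
    atanh(r) * sqrt(n - 3) is about standard Normal under H0.
    """
    z = atanh(r) * sqrt(n - 3)
    return 1 - NormalDist().cdf(z)


# The two situations described in Example 16.3.
p_large_sample = correlation_p_value(r=0.08, n=1000)
p_small_sample = correlation_p_value(r=0.5, n=10)

print(f"n = 1000, r = 0.08: P is about {p_large_sample:.4f}")
print(f"n = 10,   r = 0.50: P is about {p_small_sample:.4f}")
```

Under this approximation the weak correlation in the large sample falls below 0.01, while the much stronger correlation in the small sample stays above 0.05, in line with the example.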
Sample Size Affects Statistical Significance

Because large random samples have small chance variation, very small population effects can be highly significant if the sample is large. Because small random samples have a lot of chance variation, even large population effects can fail to be significant if the sample is small.

Statistical significance does not tell us whether an effect is large enough to be important. Statistical significance is not the same as practical significance.

It is a good idea to give a confidence interval for the parameter in which you are interested. A confidence interval actually estimates the size of an effect.

Exercise (16.7) Acid Rain

Emissions of sulfur dioxide by industry set off chemical changes in the atmosphere that result in "acid rain." The acidity is measured in pH ranging from 0 to 14. Distilled water has pH 7.0, and lower pH indicates acidity. Acid rain is defined as rainfall with a pH below 5.0. Suppose that the acidity of rain is measured on different days in a Canadian forest, and that the distribution of values is Normal with $\sigma = 0.5$. Given that $\bar{x} = 4.8$, test the proposition that the rain is acidic and give a confidence interval for:

a) n = 5
b) n = 15
c) n = 40

BEWARE OF MULTIPLE ANALYSES

Statistical significance ought to mean that you have found an effect that you were looking for. The reasoning behind statistical significance works if you decide what effect you are seeking, design a study to search for it, and use a test of significance to weigh the evidence you get. In other settings, significance may have little meaning.

Example (16.4) Cell Phones and Brain Cancer

Might the radiation from cell phones be harmful to users? Many studies have found little or no connection between using cell phones and various illnesses.
Here is a part of a news account of one study:

"A hospital study that compared brain cancer patients and a similar group without brain cancer found no statistically significant association between cell phone use and a group of brain cancers known as gliomas. But when 20 types of glioma were considered separately an association was found between phone use and one rare form. Puzzlingly, however, this risk appeared to decrease rather than increase with greater mobile phone use."

Consider this: if the 20 null hypotheses for these 20 significance tests were all true, then with a significance level $\alpha = 0.05$ each test would have a 5% chance of falsely rejecting its null hypothesis, so across 20 tests we would expect about one false rejection ($20 \times 0.05 = 1$). Running one test and reaching the 5% level of significance is reasonably good evidence that you have found something. Running 20 tests and reaching that level only once is not.

Exercise (16.10) Searching for ESP

A researcher looking for evidence of extrasensory perception (ESP) tests 500 subjects. Four of these subjects do significantly better (P < 0.01) than random guessing.

a) Is it proper to conclude that these four people have ESP?
b) What should the researcher now do to test whether any of these four subjects have ESP?

The Power of a Test

The power of a test is the probability that a fixed-level significance test will reject the null hypothesis when a particular alternative value of the parameter is true. Power is useful for checking that a test is likely to detect a difference that is large enough to be important.

Type I and Type II Errors

A Type I error occurs when we reject the null hypothesis although it is in fact true. A Type II error occurs when we fail to reject the null hypothesis although the alternative hypothesis is in fact true.

The significance level of any fixed-level test is the probability of a Type I error. The power of a test against any particular alternative is 1 minus the probability of a Type II error for that alternative.
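The relationship between significance level, power, and the two error types can be made concrete with a short calculation. The sketch below computes the power of the one-sided z test of $H_0: \mu = \mu_0$ against $H_a: \mu > \mu_0$ under the "simple conditions." It borrows the SATM setting of Exercise 16.5 ($\mu_0 = 518$, $\sigma = 114$, n = 50, $\alpha = 0.05$); the alternative value 550 and the function name z_test_power are our own assumptions for illustration and do not come from the notes.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(mu0, mu_alt, sigma, n, alpha=0.05):
    """Power of the one-sided z test of H0: mu = mu0 against Ha: mu > mu0.

    Assumes an SRS, a Normal sampling distribution, and known sigma.
    """
    std_normal = NormalDist()
    standard_error = sigma / sqrt(n)
    # Reject H0 when the z statistic exceeds this critical value.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # If mu_alt is the true mean, z is Normal with this mean and sd 1.
    shift = (mu_alt - mu0) / standard_error
    return 1 - std_normal.cdf(z_crit - shift)


# Hypothetical alternative mu = 550 (an assumption, not from the notes).
power = z_test_power(mu0=518, mu_alt=550, sigma=114, n=50)
type_ii = 1 - power
print(f"power = {power:.3f}, P(Type II error) = {type_ii:.3f}")

# When the "alternative" equals mu0, the rejection probability is alpha,
# which is the probability of a Type I error.
print(f"P(Type I error) = {z_test_power(518, 518, 114, 50):.3f}")
```

Raising n or moving the alternative farther from $\mu_0$ increases the power, which is the same sample-size effect described earlier for statistical significance.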
 Fall '08
 BLOCKNACK
 Statistics, Normal Distribution, Statistical hypothesis testing, Dr. Berg
