M316 Chapter 11, Dr. Berg

Sampling Distributions

How much, on average, do American households earn? The government's Current Population Survey contacted a sample of 113,146 households in March 2005. Their mean income in 2004 was x̄ = $60,528. That $60,528 describes the sample, but we use it to estimate the mean income of all households. This is an example of statistical inference: we use information from the sample to infer something about a wider population. We cannot expect this number to be exactly correct. The question we address in this chapter is "How good can we expect this estimate to be?"

Parameters and Statistics

Since we are using samples to make inferences about a population, we must be careful to distinguish whether a number describes the sample or the population.

Definition
A parameter is a number that describes the population. In statistical practice, the value of a parameter is not known because we cannot examine the entire population.
A statistic is a number that can be computed from the sample data without making use of any unknown parameters. In practice, we often use a statistic to estimate an unknown parameter.

The mean of a population, written μ, is a parameter, and the sample mean x̄ is a statistic used to estimate μ. The sample mean will differ from one sample to another, but in most cases the difference will be small.

Exercise (11.2) Indianapolis Voters
Voter registration records show that 68% of all registered voters in Indianapolis are Republican. A random digit dialing device calls 150 residential telephones in Indianapolis. Of the registered voters contacted, 73% are Republican. Which of these numbers is a parameter and which is a statistic?

Statistical Estimation and the Law of Large Numbers

Because good samples are chosen randomly, statistics such as x̄ are random variables. We can describe the behavior of a sample statistic by a probability model. An important question is "How good is our estimate?"
This question is easier to answer in the form "How often would this method give us a reasonable estimate?" Before we answer this question, we answer the question "How can we make our estimate more accurate?"

The Law of Large Numbers
As the number of randomly drawn observations increases, the sample mean x̄ tends to get closer to the population mean μ.

In a nutshell, this means that bigger samples tend to be more accurate. Let's look at an example.

Example (11.2) Does This Wine Smell Bad?
Sulfur compounds such as dimethyl sulfide (DMS) are sometimes present in wine. DMS causes "off-odor" in wine, so winemakers want to know the odor threshold, the lowest concentration of DMS that the human nose can detect. Different people have different thresholds, so we start by asking about the mean threshold μ in the population of all adults. The number μ is a parameter that describes the population.

To estimate μ, we present tasters with both natural wine and the same wine spiked with DMS at different concentrations to find the lowest concentration at which they identify the spiked wine. Here are the odor thresholds (measured in micrograms of DMS per liter of wine) for 10 randomly chosen subjects:

28 40 28 33 20 31 29 27 17 21

The mean threshold for these subjects is x̄ = 27.4. It seems reasonable to use the sample result x̄ = 27.4 to estimate the unknown μ. An SRS should fairly represent the population, so the mean x̄ of the sample should be somewhere near the population mean μ. Of course, another SRS would likely have a different mean.

Example (11.3) The Law of Large Numbers in Action
The distribution of odor thresholds among all adults has mean 25. The mean μ = 25 is the true value of the parameter we seek to estimate. We present a graph of the mean threshold for DMS for increasingly large samples. The first subject in Example 11.2 had a threshold of 28, so the graph starts there. The mean for the first two subjects is

x̄ = (28 + 40)/2 = 34.

This is the second point on the graph. At first, the graph of the sample mean changes as we take more observations. Eventually, however, the mean of the observations gets close to the population mean μ = 25 and settles near that value. Were we to start over again choosing people at random from the population, we would get a different graph, but the long-term behavior would be the same. The law of large numbers predicts this.

Exercise
When rolling 6 fair dice, we expect about half to show odd numbers and the other half to show even numbers. Roll 6 dice ten times and record the number of odd outcomes each time. Calculate the proportion of odd outcomes for the first 6 dice, then the first 12, then the first 18, and so on.

Sampling Distributions

Since sampling is a random phenomenon, any statistic calculated from the sample has a probability distribution. We are interested in the shape, mean, and standard deviation of this distribution. We can use software to imitate the taking of many samples in what is called simulation.

Example (11.4) What Would Happen in Many Samples?
Extensive studies have found that the DMS odor threshold of adults follows roughly a Normal distribution with mean μ = 25 micrograms per liter and standard deviation σ = 7 micrograms per liter. With this information, we can simulate many repetitions of Example 11.2 with different subjects drawn at random from the population. The following figure illustrates a simulation of many samples. The histogram shows the results of 1000 samples.

Definition
The sampling distribution of a statistic is the distribution of values taken by the statistic in all the possible samples of the same size from the same population.

In the previous example, the histogram looks approximately Normal with mean 24.95 and standard deviation 2.217, which is much smaller than the standard deviation σ = 7 of the population.
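The simulation described in Example 11.4 can be imitated with a few lines of code. The following is only a minimal sketch, not the software used to produce the histogram in these notes: the seed, the variable names, and the choice of Python's standard library are all assumptions made for illustration. It draws 1000 samples of size 10 from a Normal population with μ = 25 and σ = 7 and summarizes the 1000 sample means.

```python
import random
import statistics

# Population values taken from Example 11.4; everything else is illustrative.
MU, SIGMA = 25.0, 7.0      # population mean and standard deviation
N, REPS = 10, 1000         # sample size and number of simulated samples

rng = random.Random(2005)  # arbitrary seed so the run is repeatable


def one_sample_mean():
    """Draw one simulated SRS of size N and return its sample mean."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    return statistics.fmean(sample)


sample_means = [one_sample_mean() for _ in range(REPS)]

center = statistics.fmean(sample_means)  # center of the simulated distribution
spread = statistics.stdev(sample_means)  # spread of the simulated distribution

print(f"mean of the {REPS} sample means: {center:.2f}")
print(f"standard deviation of the sample means: {spread:.3f}")
```

A run of this sketch should give a center close to 25 and a spread a little above 2, far below the population standard deviation of 7. The exact figures will not match the 24.95 and 2.217 quoted above, because each simulation uses different random samples.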
Exercise
Make a histogram of the ten samples from the exercise involving rolling 6 dice and recording the number of odd outcomes.

The Sampling Distribution of x̄

As we have seen, when we choose many SRSs from a population, the sampling distribution of the sample means is centered on the mean of the population, but has much less spread than the original population.

Mean and Standard Deviation of a Sample Mean
Suppose that x̄ is the mean of an SRS of size n drawn from a large population with mean μ and standard deviation σ. Then the sampling distribution of x̄ has mean μ and standard deviation σ/√n.

These facts hold for the sample mean regardless of the nature of the distribution of the population, not just for Normal distributions. Both of these facts have important implications for statistical inference.

1. Because the mean of the statistic x̄ is equal to the mean μ of the population, x̄ is what we call an unbiased estimator of the parameter μ.
2. Since the standard deviation of the sampling distribution is σ/√n, averages are less variable than individual observations. In particular, the results of large samples are less variable than the results of small samples.

Sampling Distribution of a Sample Mean
If individual observations have the N(μ, σ) distribution, then the sample mean x̄ of an SRS of size n has the N(μ, σ/√n) distribution.

Example (11.5) Population Versus Sampling Distribution
If we measure the DMS odor thresholds of individual adults, the values follow the Normal distribution with mean μ = 25 micrograms per liter and standard deviation σ = 7 micrograms per liter. We call this the population distribution because it shows how measurements vary within the population.

Take many SRSs of size 10 from this population and find the sample mean x̄ for each sample. The sampling distribution describes how values of x̄ vary among samples. The sampling distribution is also Normal with mean μ = 25 and standard deviation

σ/√n = 7/√10 = 2.2136.

The figure contrasts these distributions.

Note that to halve the standard deviation, the sample size n must increase fourfold, because σ/√(4n) = (σ/√n)/2.

Exercise (11.9) National Math Scores
The scores of 12th-grade students on the National Assessment of Educational Progress year 2000 mathematics test have a distribution that is approximately Normal with mean μ = 300 and standard deviation σ = 35.
a) Choose one 12th-grader at random. What is the probability that his or her score is higher than 300? Higher than 335?
b) Now choose an SRS of four 12th-graders and calculate their mean score x̄. If you did this many times, what would be the mean and standard deviation of all the x̄ values?
c) What is the probability that the mean score for your SRS is higher than 300? Higher than 335?

The Central Limit Theorem

It is a remarkable fact that as the sample size increases, the distribution of x̄ changes shape: it looks less like that of the population and more like a Normal distribution.

Central Limit Theorem
Draw an SRS of size n from any population with mean μ and standard deviation σ. Then for large enough n, the sampling distribution of x̄ is approximately N(μ, σ/√n).

This is why the Normal distribution is so important in statistics. More generally, any variable that is a sum of many small influences (even with different distributions) is approximately Normal.

Example (11.6) The Central Limit Theorem in Action
In March 2004, the Current Population Survey contacted 98,789 households. The distribution of household earnings is strongly skewed to the right. Figure (a) shows a histogram of the 62,101 households that had earned income greater than zero in 2003. The right tail is longer than shown because the bars are too short to show up in the histogram. The mean earnings of these 62,101 households was $57,085.

The distribution of the mean earnings x̄ for SRSs of 100 households is almost a Normal distribution. Figures (b) and (c) show the distribution of the sample means of 500 SRSs drawn from these 62,101 households. Figure (c) shows the distribution in greater detail.
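The behavior the central limit theorem describes can be imitated in a small simulation. The sketch below does not use the Current Population Survey earnings data. It assumes a synthetic exponential population with mean 1 and standard deviation 1, chosen only because it is strongly right-skewed. The helper names, the seed, and the use of a skewness measure (near 0 for a symmetric, Normal-looking distribution) are likewise illustrative assumptions.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: the mean of the cubed standardized values."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def simulate_sample_means(n, reps, rng):
    """Return `reps` sample means, each from a sample of size n.

    The population is exponential with mean 1 (an assumed stand-in for a
    right-skewed population, not the earnings data in Example 11.6).
    """
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


rng = random.Random(316)  # arbitrary seed so the run is repeatable
results = {}
for n in (1, 10, 100):
    means = simulate_sample_means(n, 2000, rng)
    results[n] = (
        statistics.fmean(means),   # should stay near the population mean 1
        statistics.stdev(means),   # should shrink roughly like 1 / sqrt(n)
        skewness(means),           # should move toward 0 as n grows
    )
    center, spread, skew = results[n]
    print(f"n = {n:3d}: mean {center:.3f}, sd {spread:.3f}, skewness {skew:.2f}")
```

In this sketch the center of the sample means stays near the population mean for every n, while the spread and the skewness both shrink as n grows. That is the pattern the theorem predicts: the distribution of x̄ looks less like the skewed population and more like a Normal distribution.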
The next graphic shows the sample mean distributions from an exponential distribution with sample sizes of 1, 2, 10, and 25 observations.

Exercise (11.8) Maintaining Air Conditioners
State: The time (in hours) that a technician requires to perform preventive maintenance on an air-conditioning unit is governed by the exponential distribution. The mean time is μ = 1 hour and the standard deviation is σ = 1 hour. Your company has a contract to maintain 70 of these units in an apartment building. You must schedule technicians' time for a visit to this building. Is it safe to budget 1.1 hours per unit, or should you budget 1.25 hours?
Formulate: We can treat these 70 air conditioners as an SRS of size 70 and find the probabilities of exceeding our budgeted time.
Solve: The average time x̄ will be approximately Normal with mean μ = 1 hour and standard deviation

σ/√n = 1/√70 = 0.12 hours.

Using this Normal distribution, the probabilities are
P(x̄ > 1.10 hours) = 0.2014
P(x̄ > 1.25 hours) = 0.0182.
Conclude: If you budget 1.1 hours, there is a 20% chance that the job will not be completed in time. If you budget 1.25 hours, this drops to 2%, so you budget 1.25 hours.

Here is a graphic showing the difference between the Normal approximation and the actual distribution for the previous example.
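The Normal calculation in the Solve step can be checked with a short sketch. It uses the values stated in the exercise (μ = 1, σ = 1, n = 70) and the unrounded standard deviation 1/√70. The second half is an extra illustration that is not part of the notes: it simulates the mean of 70 exponential service times directly, with an arbitrary seed and repetition count, to show roughly how far the Normal approximation is from the actual distribution.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

mu, sigma, n = 1.0, 1.0, 70  # values stated in the exercise

# Normal approximation for the mean time x-bar: N(mu, sigma / sqrt(n)).
xbar_dist = NormalDist(mu, sigma / sqrt(n))
p_110 = 1 - xbar_dist.cdf(1.10)  # P(x-bar > 1.10 hours)
p_125 = 1 - xbar_dist.cdf(1.25)  # P(x-bar > 1.25 hours)
print(f"Normal approximation: P(>1.10) = {p_110:.4f}, P(>1.25) = {p_125:.4f}")

# Illustrative check (an assumption of this sketch, not part of the notes):
# simulate the mean of 70 exponential times many times and count how often
# each budget is exceeded.
rng = random.Random(11)  # arbitrary seed so the run is repeatable
reps = 20000
means = [
    fmean([rng.expovariate(1 / mu) for _ in range(n)])
    for _ in range(reps)
]
sim_110 = sum(m > 1.10 for m in means) / reps
sim_125 = sum(m > 1.25 for m in means) / reps
print(f"Simulation:           P(>1.10) = {sim_110:.4f}, P(>1.25) = {sim_125:.4f}")
```

The first two values reproduce the 0.2014 and 0.0182 given in the Solve step. The simulated values differ slightly because, with n = 70, the distribution of x̄ still keeps some of the right skew of the exponential distribution.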