Chapter 7
Sample Variability

Finally getting to the exciting stuff
So far...
- collecting data
- simple descriptions of sample data
- basic concepts of probability
Now: the final steps to turn data into useful information
- making population statements based on sample data!

4 Key Concepts
- Random Sampling
- Sampling Error
- Sampling Distribution of Sample Means
- Central Limit Theorem

Key Concept #1: Random Sampling
Already discussed way back when...
Key point: when I say a sample was collected randomly, the implication is that
- all experimental units are equally likely to be selected
- the sample represents the entire population
- experimental units are selected independently
- experimental units are selected without bias

Key Concept #2: Sampling Error
In this statistical context, "error" is not a mistake.
The sampling error is an estimate of how much the sample value differs from the population value.
When you collect a sample from a population, do you expect $\mu = \bar{x}$?
- No! But that doesn't mean the sample is bad
- it just means chance influences which experimental units are actually selected

Sampling Error
Populations are large
- with a sample, you only look at a small subset of the population
Theoretically, an infinite number of samples could be collected
- think about it for a moment: taking sample after sample after sample
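The idea of taking sample after sample can be tried out on a computer. The following is only a minimal sketch: the population is simulated, its spread (4.0), the sample size (10), and the number of samples (6) are assumptions chosen for illustration, and the names `population` and `draw_sample_mean` are made up here.

```python
import random
import statistics

# Hypothetical population of 10,000 values centred near 23.4.
# The spread (4.0) and the population size are assumptions for illustration.
rng = random.Random(42)
population = [rng.gauss(23.4, 4.0) for _ in range(10_000)]
mu = statistics.fmean(population)  # population mean (unknown in real studies)

def draw_sample_mean(n):
    """Draw one random sample of size n (without replacement); return its mean."""
    return statistics.fmean(rng.sample(population, n))

# Take sample after sample and compare each sample mean with mu.
sample_means = [draw_sample_mean(10) for _ in range(6)]
for i, xbar in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {xbar:.1f}, sampling error = {xbar - mu:+.1f}")
```

Each run of the loop gives a slightly different sample mean, even though every sample comes from the same population.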
[Figure: a population with a single sample drawn from it]

Theoretically

[Figure: one population with samples 1 through 6 drawn from it, giving sample means $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_6$, plus many more samples]
Could take an infinite number of samples from a population.
Will the sample means be identical? NO!

Population: $\mu = 23.4$
Sample means: 22.6, 23.4, 19.9, 26.2, 23.5, 23.8, and many more samples with many more sample means

Sampling Error
So samples from the same population can have different sample means.
These differences are due to chance (since you sampled RANDOMLY).
Sampling error is the difference between a sample statistic and a population parameter due to chance.

Remember...
- You will probably take only 1 sample
- You never really know the true population value
- You don't know the exact sampling error for your study
BUT we can make a guess by thinking about all possible samples that could be taken from a population.

Key Concept #3: The Sampling Distribution of Sample Means
The Sampling Distribution of Sample Means is what you get when you DO take every possible sample from a population, calculate the mean of each sample, and plot all the means.
Remember: we are being theoretical here...

Theoretical Example
Example: Consider the data set {1, 2, 3, 4}:
1) Make a list of all samples of size 2 that can be drawn from this set (sample with replacement)
2) Construct the sampling distribution of sample means

Table of All Possible Samples

The table lists all possible samples of size 2, the mean of each sample, and the probability of each sample occurring (all equally likely).
- There is more than 1 way to get some mean values
- e.g. 3 different ways to get a value of 2.0

Sampling Distribution
If we combine the probabilities of the different means:
[Table: Sampling Distribution of the Sample Mean]
[Histogram: Sampling Distribution of the Sample Means]

Back to Sampling Error
The spread of the sampling distribution is an estimate of sampling error.
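The theoretical example with the data set {1, 2, 3, 4} is small enough to check by brute force. The sketch below lists every sample of size 2 (with replacement), combines the probabilities of equal means, and measures the spread of the resulting distribution. The variable names are illustrative, and exact fractions are used so that no rounding creeps in.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

data = [1, 2, 3, 4]

# Every ordered sample of size 2, drawn with replacement: 4 * 4 = 16 samples.
samples = list(product(data, repeat=2))
means = [Fraction(a + b, 2) for a, b in samples]

# Combine the probabilities of samples that share the same mean.
counts = Counter(means)
dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}
for m, p in dist.items():
    print(f"mean {float(m):.1f}: probability {p}")

# Centre and spread of the sampling distribution of sample means.
mean_of_means = sum(m * p for m, p in dist.items())
variance_of_means = sum((m - mean_of_means) ** 2 * p for m, p in dist.items())
print("mean of sample means:", float(mean_of_means))
print("variance of sample means:", float(variance_of_means))
```

Three of the 16 samples have a mean of 2.0, so that mean has probability 3/16. The mean of all the sample means is 2.5, which is the mean of the data set itself.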
[Histogram: Sampling Distribution of the Sample Means]

Empirical Example
http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/
Simulates random sampling of a population
- shows the population
- shows sample collection
- then calculates the sample mean and adds the value to the sampling distribution of sample means

Sampling Distribution of Sample Means
It's a mouthful:
- "sampling distribution" implies that it is the result of repeated sampling (an abstract concept, since this is not usually done)
- "distribution of sample means" just implies that it is a distribution of values (like any other we've done), and this one is of sample means

Now we can ask what that distribution might look like.
- For every sample, calculate a mean
- Due to sampling error, the sample mean is probably not exactly equal to the population mean
- Sometimes it will be a little larger, sometimes a little smaller

Let's go back to the empirical demonstration: http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/
If we take a lot of samples from a normally distributed population with a mean of 16... what do you notice about the sampling distribution of sample means?
Turns out statisticians know a lot about this relationship.
Note: the simulation uses N to signify n.

Sampling Distribution of Sample Means
The sampling distribution of sample means
- will have a mean, called the mean of the sampling distribution of sample means, or $\mu_{\bar{x}}$
- will have a standard deviation, called the standard deviation of the sampling distribution of sample means, or Standard Error $= \sigma_{\bar{x}}$

Sampling Distribution of Sample Means
http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/
What do you notice? With a lot of samples:
- $\mu_{\bar{x}} = \mu$
- $\sigma_{\bar{x}}$ is smaller than $\sigma$ (the distribution is less spread out): $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
- as n increases, the spread of the distribution of sample means ($\sigma_{\bar{x}}$) decreases
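These observations can also be reproduced without the applet. What follows is a rough sketch, not the applet's own method: it draws many samples from a normal population with a mean of 16, and the population standard deviation (5.0), the number of samples (20,000), and the name `sample_means` are all assumptions made for illustration.

```python
import math
import random
import statistics

MU = 16.0             # population mean used in the simulation exercise
SIGMA = 5.0           # assumed population standard deviation (not from the slides)
NUM_SAMPLES = 20_000  # number of repeated samples (assumed)

def sample_means(n, seed):
    """Means of NUM_SAMPLES random samples of size n from a normal population."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(n)])
            for _ in range(NUM_SAMPLES)]

centre, spread = {}, {}
for n in (5, 20):
    means = sample_means(n, seed=n)
    centre[n] = statistics.fmean(means)   # compare with MU
    spread[n] = statistics.stdev(means)   # compare with SIGMA / sqrt(n)
    print(f"n={n:2d}: mean of sample means = {centre[n]:.2f}, "
          f"SD of sample means = {spread[n]:.2f}, "
          f"sigma/sqrt(n) = {SIGMA / math.sqrt(n):.2f}")
```

The mean of the sample means stays near 16 for both sample sizes, while the spread for n = 20 is about half the spread for n = 5.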
Go back to the simulation and compare samples of size 5 to samples of size 20, and see how the spread of the SDSM changes.
Has the distribution of observations in the population changed with sample size? NO

In the example so far, the population is normally distributed.

What if the population is not normally distributed?
Let's go back to the simulation and see what happens to the SDSM when the population is nonnormal.

Key Concept #4: Central Limit Theorem
Central Limit Theorem: the sampling distribution of means will approach a normal distribution as sample size increases
- sample size is n, the size of each individual sample (not the number of samples)
- for most data sets, the SDSM will look normal if n is 30 or more

Question: IF the population is normally distributed, will the SDSM always be normally distributed? YES
- Normally distributed population: sampling distributions are also normally distributed
- Nonnormally distributed population: the sampling distribution approaches normality as n increases

What is the standard deviation of the distribution of sample means?
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$
- an estimate of the spread of the sampling distribution of means (and an estimate of sampling error)
What does it mean?

Standard Error
Think about it...
- The SE represents you going out and actually taking every possible sample, and then calculating the spread of the distribution of sample means
- In the REAL world, you only take 1 sample
- In a future lecture we will get to how we relate this 1 sample to all possible samples, to save time and make inferences

Example: Groundwater samples from Wye Island
[Figure: bar chart of # animals/liter for two locations, Near Shore and In Trees; the value shown is the sample mean + 1 standard error]
When presenting sample means, always show + and/or - 1 standard error on each bar or point.
What does the standard error

But let's think about the Standard Error
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$
It is influenced by sample size
- as n increases, the SE decreases

Why should this be?
You have more information, so your sample mean should fall closer to the population mean
- less impact of extreme values

Example
Let's say you have a population with a mean of 50 and values between 0 and 100.
Just for this example, assume all the elements you draw in a sample are equal to 50 EXCEPT for one value of 100
- this allows us to look at the impact of extreme values on the mean
As n increases, the impact of an extreme value decreases, and the sample mean tends to be nearer in value to the population mean.
As a rule: as n increases, it is more likely that your sample mean will be close to the true population mean.
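The arithmetic behind this example is easy to run. In the sketch below, every value in the sample is 50 except for a single 100, exactly as assumed above; the function name `mean_with_one_extreme` and the particular sample sizes tried are illustrative choices, not taken from the slides.

```python
def mean_with_one_extreme(n, typical=50.0, extreme=100.0):
    """Mean of a sample of size n in which one value is `extreme`
    and the remaining n - 1 values all equal `typical`."""
    return (typical * (n - 1) + extreme) / n

# Sample sizes are assumed for illustration.
for n in (2, 3, 4, 5, 10, 100):
    xbar = mean_with_one_extreme(n)
    print(f"n = {n:3d}: sample mean = {xbar:6.2f}, "
          f"distance from population mean = {xbar - 50.0:5.2f}")
```

With n = 2 the single extreme value pulls the sample mean to 75, a distance of 25 from the population mean; with n = 100 the same extreme value moves the mean only to 50.5.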
[Figure: range of possible values for the mean, plotted against sample size from 2 to 100]
Thus as n increases, the standard error decreases
Standard error: 25.0, 16.7, 12.5, 10.0, 5.0, 0.5

Remember...
- samples of small size aren't BAD samples
- they just have less information than large samples
- this implies that large samples are more likely to have a sample value close to the true population value

Important comments
- This material is important, but often difficult to understand because it is theoretical
- Students often tell me that they "get it" when, in fact, they don't
- The 4 key concepts (random sampling, sampling error, SDSM, and CLT) are fundamental to what we do next
- Spend the time you need to fully grasp this material

Application
Given that the sampling distribution of means is normally distributed, we can calculate the probability of a sample mean using Table 3 and the z score equation.
Remember, before we used the z score equation to find the probability of an observation:
$z = \frac{x - \mu}{\sigma}$
Now we are calculating the probability of a sample mean, so we have to use the sample mean and the standard deviation of the sampling distribution of means, $\sigma_{\bar{x}} = SE = \sigma / \sqrt{n}$:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

Example: Consider a normal population with $\mu = 50$ and $\sigma = 15$. Suppose a sample of size 9 is selected at random. Find $P(45 < \bar{x} < 60)$. ASKS FOR THE PROBABILITY OF A MEAN!
Solution: We can use the probability approach with normal distributions since the population is normally distributed.
1) $\mu_{\bar{x}} = \mu = 50$
2) $\sigma_{\bar{x}} = \sigma / \sqrt{n} = 15 / \sqrt{9} = 15 / 3 = 5$
[Figure: normal curve for $\bar{x}$ with 45, 50, and 60 marked, corresponding to z = -1.00, 0, and 2.00]
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
$P(45 < \bar{x} < 60) = P\left(\frac{45 - 50}{5} < z < \frac{60 - 50}{5}\right) = P(-1.00 < z < 2.00) = 0.3413 + 0.4772 = 0.8185$

Old Exam Questions
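Questions of this kind all follow the same recipe: compute the standard error, convert the limits on the sample mean to z scores, and look up the area. As a hedged sketch, Python's standard-library `statistics.NormalDist` can stand in for Table 3; the function name `prob_mean_between` is made up for illustration, and the code assumes the SDSM is normal. It is shown here on the $P(45 < \bar{x} < 60)$ example worked above.

```python
import math
from statistics import NormalDist

def prob_mean_between(low, high, mu, sigma, n):
    """P(low < sample mean < high) for samples of size n,
    assuming the sampling distribution of sample means is normal."""
    se = sigma / math.sqrt(n)      # standard error of the mean
    z_low = (low - mu) / se        # z score of the lower limit
    z_high = (high - mu) / se      # z score of the upper limit
    standard_normal = NormalDist() # mean 0, standard deviation 1
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

p = prob_mean_between(45, 60, mu=50, sigma=15, n=9)
print(round(p, 4))
```

The computed value is about 0.8186; the table-based answer of 0.8185 differs only because the tabled areas are rounded to four decimal places.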
Suppose for the U.S. population, the average size of an infant at birth is 96 ounces, with a standard deviation of 16 ounces. Assume birthweights are normally distributed. You go to an inner city hospital and examine the records for 16 infants born on the previous day.
$\mu = 96$, $\sigma = 16$
What is the probability that the mean of all 16 infants will fall below 90 ounces?
$P(\bar{x} < 90)$?
$z = \frac{90 - 96}{16 / \sqrt{16}} = \frac{-6}{4} = -1.5$
The area for z of 1.5 is 0.4332
$P(\text{mean} < 90) = 0.5 - 0.4332 = 0.0668$

What if Population is NOT normally distributed?
To calculate the probability of a mean, you can still use the z equation and the areas associated with the standard normal curve IF the SDSM is normally distributed.
When does that occur?

What should you do?
- Skillbuilder Applets: 7.17, 7.18, 7.68
- Simulation at http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/
- Chapter Practice Test Parts I and II ...
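If the applet is not available, the nonnormal case can be explored with a short script. This is only a sketch under stated assumptions: the population is a strongly right-skewed exponential distribution with mean 1 (chosen for illustration, not taken from the slides), skewness is used as a rough measure of how far the SDSM is from a symmetric, normal-looking shape, and the names `skewness` and `sample_means` are made up here.

```python
import random
import statistics

def skewness(values):
    """Average cubed z score: 0 for a symmetric distribution,
    positive for a distribution with a long right tail."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def sample_means(n, num_samples=20_000, seed=0):
    """Means of repeated samples of size n from a skewed (exponential) population."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(num_samples)]

skew = {}
for n in (2, 5, 30):
    means = sample_means(n)
    skew[n] = skewness(means)
    print(f"n = {n:2d}: mean of sample means = {statistics.fmean(means):.2f}, "
          f"skewness of SDSM = {skew[n]:.2f}")
```

The skewness of the sample means shrinks toward 0 as n grows, which is the Central Limit Theorem at work: the SDSM looks more and more normal even though the population does not.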
This note was uploaded on 04/28/2009 for the course BIOM 301 taught by Professor Staff during the Spring '08 term at Maryland.