1/31/11

PADP 8120: Data Analysis and Statistical Modeling
Statistical Inference 1: Confidence Intervals
Spring 2011
Angela Fertig, Ph.D.

Plan
Last time we covered probability distributions.
Today we will start our first foray into statistical inference: confidence intervals.

Recall standard errors
If we have lots of samples, then:
Mean of all the sample means = population mean.
Shape of the distribution of sample means = normal.
Standard error tells us how dispersed the distribution of sample means is.
Thus, we can work out how likely it is that our sample mean is near the population mean.
We can calculate what's called a confidence interval.

Confidence Interval
A confidence interval for an estimate is a range of numbers within which the parameter is likely to fall.
The parameter describes the population; the estimate comes from the sample.
We can use the standard error to produce such a range:
$\text{estimate} \pm (z \times \text{standard error})$
z is the confidence coefficient and is chosen to determine what is meant by "likely" to contain the actual value of the parameter (the confidence level is usually close to 1, like 0.95 or 0.99).
Since the sampling distribution is normal, we know the value of z that corresponds to any such probability. For example, about 95% of confidence intervals that extend 2 standard errors on either side of the sample mean will include the population mean.

Example Calculation
Sample mean = 8.5
Standard error = 2
z = 2 (95% confidence)
95% confidence interval: $8.5 \pm (2 \times 2) = 8.5 \pm 4$, i.e. 4.5 to 12.5

[Figure: Graph of confidence interval. Axis: sick days. Population mean = 6; sample mean = 8.5 with its 95% confidence interval (z = 2).]

More samples
That's just one sample. Let's imagine that we took many samples. Then, we calculated 95% confidence intervals for all of the sample means.

[Figure: Graph of confidence interval. Axis: sick days. Population mean = 6.]

Interpretation
Of the 7 samples, all of the confidence intervals around the sample mean included the actual true population mean except for one.
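As a rough illustration of the example calculation (sample mean 8.5, standard error 2, z = 2) and of the many-samples picture, here is a minimal Python sketch. The function names, the normal population with mean 6 and standard deviation 20, the sample size of 100, and the fixed random seed are all assumptions made for illustration; they are not from the slides.

```python
import random
import statistics


def confidence_interval(estimate, standard_error, z):
    """Return (lower, upper) = estimate -/+ z * standard_error."""
    margin = z * standard_error
    return estimate - margin, estimate + margin


def coverage(pop_mean, pop_sd, n, z, n_samples, seed=0):
    """Fraction of simulated samples whose interval contains pop_mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        # Estimated standard error from the sample standard deviation.
        se = statistics.stdev(sample) / n ** 0.5
        lower, upper = confidence_interval(mean, se, z)
        hits += lower <= pop_mean <= upper
    return hits / n_samples


# The slide's numbers: sample mean 8.5, standard error 2, z = 2.
print(confidence_interval(8.5, 2, 2))  # (4.5, 12.5)

# Hypothetical population (an assumption): normal, mean 6, sd 20, n = 100.
print(coverage(pop_mean=6, pop_sd=20, n=100, z=1.96, n_samples=2000))
```

The second printed value is the share of simulated intervals that contain the population mean; under these assumptions it should land near 0.95, though any single run will differ slightly.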
If we took more samples, we would expect 95% of the confidence intervals to include the actual population mean: 95% because that's the confidence coefficient we picked.

Exact confidence coefficients
I have been rounding the numbers. The exact figures for z are:

Confidence   68%    95%    99%    99.9%
z            1.00   1.96   2.58   3.29

Exact confidence coefficient for small sample sizes
Because we don't know the population standard deviation and must use the sample standard deviation to get the estimated standard error, there is additional error, especially when the sample size is small.
To account for this error, for small n, we should use the t-distribution, not the normal distribution, to estimate the confidence interval.
The t-distribution has fatter tails than the normal distribution.
There are tables that give these scores for different confidence levels and different degrees of freedom (df = n - 1).
The t-distribution looks almost exactly like the normal distribution for large df.

[Figure: t-distribution graph]

t-distribution table

Confidence   90%    95%     99%
t(df=1)      6.31   12.71   63.66
t(df=10)     1.81   2.23    3.17
t(df=30)     1.70   2.04    2.75
t(df=100)    1.66   1.98    2.63
z            1.65   1.96    2.58

Controlling the confidence interval
Choose a different confidence level. If we picked 99% confidence instead, the interval would be wider. If we picked 90% confidence, the interval would be narrower, but we would be wrong more often.
Change the sample size. The bigger the sample size, the lower the standard error and therefore the narrower the confidence interval for a given probability.

Confidence intervals for proportions
Since calculating the standard error is similar for proportions, so is producing confidence intervals.
We need binary data coded 0/1 (yes/no, men/women, etc.).
95% confidence interval for proportions is:
$P \pm 1.96 \sqrt{\frac{P(1-P)}{n}}$

Proportions example
Let's call 1000 people and ask them if they are currently working. We want to know the proportion working. 90% say they are working (P).
$P \pm 1.96 \sqrt{\frac{P(1-P)}{n}} = 0.90 \pm 1.96 \sqrt{\frac{0.90(1-0.90)}{1000}} = 0.90 \pm 1.96 \times 0.00949 = 0.90 \pm 0.02$
i.e. Actual proportion is 90% ± 2%, so the interval is 88% to 92%.
Note that this confidence interval ignores non-sampling error, such as bias due to non-response, badly worded questions, etc.
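The exact z values and the proportion example (P = 0.90, n = 1000) can be checked with a short Python sketch using only the standard library. The helper names `z_for_confidence` and `proportion_ci` are hypothetical, introduced here for illustration; the calculation assumes the normal approximation used in the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided z for a confidence level such as 0.95."""
    # A central area of `level` leaves (1 - level) / 2 in each tail.
    return NormalDist().inv_cdf(0.5 + level / 2)


def proportion_ci(p, n, level=0.95):
    """Return (lower, upper) for a sample proportion p from n observations."""
    se = sqrt(p * (1 - p) / n)  # standard error of a proportion
    margin = z_for_confidence(level) * se
    return p - margin, p + margin


# z for the confidence levels listed in the slides.
for level in (0.68, 0.95, 0.99, 0.999):
    print(level, round(z_for_confidence(level), 2))

# The slide's example: 90% of 1000 respondents say they are working.
lower, upper = proportion_ci(0.90, 1000)
print(round(lower, 2), round(upper, 2))
```

For a level of exactly 0.68 the computed z comes out slightly below 1, because z = 1 corresponds to roughly 68.3% rather than exactly 68%.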