2500M_Seury_L23_6pg qr



[7] Inference for Distributions
Lecture Notes. Simin Seury, Department of Economics, York University, Canada. Date: Mar 30 2010

Standard deviation s and standard error s/√n

For a sample of size n, the sample standard deviation s is:

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

where n − 1 is the "degrees of freedom." The value s/√n is called the standard error of the mean (SEM). Scientists often present sample results as mean ± SEM.

Example: A study examined the effect of a new medication on seated systolic blood pressure. The results, presented as mean ± SEM for 25 patients, are 113.5 ± 8.9. What is the standard deviation s of the sample data?

$$\mathrm{SEM} = \frac{s}{\sqrt{n}} \iff s = \mathrm{SEM}\cdot\sqrt{n} = 8.9 \times \sqrt{25} = 44.5$$

The t distributions

The t distribution is a family of similar probability distributions. A specific t distribution depends on a parameter known as the degrees of freedom, which is the number of independent pieces of information that go into the computation of s.

Suppose that an SRS of size n is drawn from an N(µ, σ) population. When σ is known, the sampling distribution of x̄ is N(µ, σ/√n). When σ is estimated by the sample standard deviation s, the sampling distribution follows a t distribution t(µ, s/√n) with n − 1 degrees of freedom, and

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

is the one-sample t statistic.

When σ is unknown

The sample standard deviation s provides an estimate of the population standard deviation σ. When the sample size is large, the sample is likely to contain elements representative of the whole population, so s is a good estimate of σ. When the sample size is small, the sample contains only a few individuals, so s is a mediocre estimate of σ.

When n is very large, s is a very good estimate of σ, and the corresponding t distributions are very close to the normal distribution. The t distributions become wider for smaller sample sizes, reflecting the lack of precision in estimating σ from s.
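The formulas for s and the SEM can be checked with a short Python sketch using only the standard library. The helper names `sample_sd`, `standard_error` and `sd_from_sem` are hypothetical, and `data` is a small made-up data set used only to cross-check against `statistics.stdev`, which also divides by n − 1.

```python
import math
import statistics


def sample_sd(xs):
    """Sample standard deviation s, dividing by n - 1 (the degrees of freedom)."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


def standard_error(xs):
    """Standard error of the mean, SEM = s / sqrt(n)."""
    return sample_sd(xs) / math.sqrt(len(xs))


def sd_from_sem(sem, n):
    """Recover s from a reported SEM, using s = SEM * sqrt(n)."""
    return sem * math.sqrt(n)


# Blood-pressure example: mean ± SEM = 113.5 ± 8.9 for n = 25 patients.
print(round(sd_from_sem(8.9, 25), 1))  # 44.5

# Cross-check on made-up data: statistics.stdev also divides by n - 1.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(math.isclose(sample_sd(data), statistics.stdev(data)))  # True
print(round(standard_error(data), 3))  # 0.756
```

Dividing by n − 1 rather than n is what makes s match the "degrees of freedom" definition above; using n would slightly understate the spread for small samples.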
[Figure: population distribution compared with the sampling distribution of x̄ for a large sample and for a small sample]

Standardizing the data before using Table D

As with the normal distribution, the first step is to standardize the data. Then we can use Table D to obtain the area under the curve. Standardizing with

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

turns the sampling distribution t(µ, s/√n), df = n − 1, into the standard t(0, 1) distribution with the same df = n − 1. Here µ is the mean (center) of the sampling distribution, and the standard error of the mean s/√n is its standard deviation (width). You obtain s, the standard deviation of the sample, with your calculator.

The one-sample t confidence interval

The level C confidence interval is an interval computed by a method that has probability C of containing the true population parameter.

We have a data set from a population with both µ and σ unknown. We use x̄ to estimate µ and s to estimate σ, using a t distribution with df = n − 1.

Practical use of t: C is the area between −t* and t*. We find t* in the line of Table D for df = n − 1 and confidence level C. The margin of error m is:

$$m = t^{*}\,\frac{s}{\sqrt{n}}$$

and the confidence interval is x̄ ± m.

Table D

When σ is known, we use the normal distribution and the standardized z-value. When σ is unknown, we use a t distribution with n − 1 degrees of freedom (df). Table D shows the z-values and t-values corresponding to landmark P-values/confidence levels.

Green Tea and Heart Attack

It is sometimes claimed that drinking green tea may protect against heart attacks. Green tea contains a certain component, Gx, which may act on blood cholesterol, likely helping to prevent heart attacks.

To see if drinking green tea increases the average blood level of Gx, a group of nine randomly selected healthy men were assigned to drink green tea daily in the morning and in the afternoon for two weeks. The levels of Gx in their blood were assessed before and after the study, and the percent change is presented here:

0.7 3.5 4 4.9 5.5 7 7.4 8.1 8.4

First: are the data approximately normal?

[Figure: histogram of the percentage change in levels of Gx in blood (frequency by percent-change bin)]
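A normal quantile plot, like the one that follows the histogram, pairs each sorted observation with a standard normal quantile; if the points lie close to a straight line, the data are approximately normal. The sketch below computes those pairs for the green tea data with `statistics.NormalDist`. The (i − 0.5)/n plotting position is one common convention and is an assumption here, and `normal_quantile_pairs` is a hypothetical helper name.

```python
from statistics import NormalDist

percent_change = [0.7, 3.5, 4, 4.9, 5.5, 7, 7.4, 8.1, 8.4]


def normal_quantile_pairs(xs):
    """Pair each sorted observation with the standard normal quantile at (i - 0.5)/n.

    The (i - 0.5)/n plotting position is an assumed convention; other software
    may use a slightly different one.
    """
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(sorted(xs), start=1)]


for q, x in normal_quantile_pairs(percent_change):
    print(f"{q:6.2f}  {x:4.1f}")
```

The smallest value, 0.7, sits well below the line traced by the other eight points, which is the single low value visible in the plot.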
[Figure: normal quantile plot of the percent changes (normal quantiles from −2 to 2 on the horizontal axis)]

There is one low value, but overall the data can be considered reasonably normal.

What is the 95% confidence interval for the average percent change?

Sample average x̄ = 5.5; s = 2.517; df = n − 1 = 8. The sampling distribution is a t distribution with n − 1 degrees of freedom. For df = 8 and C = 95%, t* = 2.306. The margin of error m is:

$$m = t^{*}\,\frac{s}{\sqrt{n}} = 2.306 \times \frac{2.517}{\sqrt{9}} \approx 1.93$$

With 95% confidence, the population average percent increase of Gx in the blood of healthy men drinking two cups of green tea daily is between 3.6% and 7.4% (5.5 ± 1.93).

Important: The confidence interval shows how large the increase is, but not whether it has an impact on men's health.

Table A vs. Table D

Table A gives the area to the LEFT of hundreds of z-values. (…) It should only be used for Normal distributions. (…)

Table D gives the area to the RIGHT of a dozen t or z-values. (…) It can be used for t distributions of a given df and for the Normal distribution. Table D also gives the middle area under a t or normal distribution between the negative and positive values of t or z.

Excel output (Menu: Tools/Data Analysis, select "Descriptive statistics"):

PercentChange
Mean                       5.5
Standard Error             0.838981    (s/√n)
Median                     5.5
Mode                       #N/A
Standard Deviation         2.516943
Sample Variance            6.335
Kurtosis                   0.010884
Skewness                   -0.7054
Range                      7.7
Minimum                    0.7
Maximum                    8.4
Sum                        49.5
Count                      9
Confidence Level(95.0%)    1.934695    (m)

!!! Warning: do not use the function =CONFIDENCE(alpha, stdev, size). It assumes a normal sampling distribution (stdev here refers to σ) and uses z* instead of t*. !!!

Table D

For df = 9 we only look in the corresponding row. The calculated value of t is 2.7. We find the two closest t values:

2.398 < t = 2.7 < 2.821, thus 0.02 > upper-tail p > 0.01.

For a one-sided Ha, this is the P-value (between 0.01 and 0.02); for a two-sided Ha, the P-value is doubled (between 0.02 and 0.04).
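Both Table D calculations on this page can be sketched in Python with the standard library. The `statistics` module has no t quantile function, so this sketch assumes t* and the Table D row are typed in by hand. The names `t_interval`, `bracket_p_value` and `TABLE_D_DF9` are hypothetical, and the df = 9 row is only an excerpt: it contains the two values quoted above (2.398 and 2.821) plus neighbouring standard entries.

```python
import math
import statistics


def t_interval(xs, t_star):
    """One-sample t confidence interval x̄ ± t* s/√n.

    t_star is read from Table D for df = len(xs) - 1 and the chosen
    confidence level C; it is not computed here.
    """
    n = len(xs)
    mean = statistics.mean(xs)
    margin = t_star * statistics.stdev(xs) / math.sqrt(n)  # m = t* s/√n
    return mean - margin, mean + margin


# Excerpt of the Table D row for df = 9: (upper-tail probability p, t*).
TABLE_D_DF9 = [(0.05, 1.833), (0.025, 2.262), (0.02, 2.398), (0.01, 2.821), (0.005, 3.250)]


def bracket_p_value(t, row, two_sided=False):
    """Return (p_low, p_high) with p_low < P < p_high, using one Table D row.

    Assumes the row is sorted by increasing t* and that t falls strictly
    between two adjacent t* values; a two-sided Ha doubles both bounds.
    """
    factor = 2 if two_sided else 1
    for (p_high, t_low), (p_low, t_high) in zip(row, row[1:]):
        if t_low < t < t_high:
            return p_low * factor, p_high * factor
    raise ValueError("t is not bracketed by this row excerpt")


percent_change = [0.7, 3.5, 4, 4.9, 5.5, 7, 7.4, 8.1, 8.4]
low, high = t_interval(percent_change, t_star=2.306)      # df = 8, C = 95%
print(f"95% CI: {low:.1f}% to {high:.1f}%")               # 95% CI: 3.6% to 7.4%
print(bracket_p_value(2.7, TABLE_D_DF9))                  # (0.01, 0.02)
print(bracket_p_value(2.7, TABLE_D_DF9, two_sided=True))  # (0.02, 0.04)
```

With SciPy installed, t* could instead be computed as `scipy.stats.t.ppf(0.975, 8)` (about 2.306). Replacing t* with the normal z* ≈ 1.96, as =CONFIDENCE does, would shrink the margin from about 1.93 to about 1.64, an interval that is too narrow for a sample of nine.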
The one-sample t-test

As in the previous chapter, a test of hypotheses requires a few steps:
1. Stating the null and alternative hypotheses (H0 versus Ha)
2. Deciding on a one-sided or two-sided test
3. Choosing a significance level α
4. Calculating t and its degrees of freedom
5. Finding the area under the curve with Table D
6. Stating the P-value and interpreting the result

The P-value is the probability, if H0 is true, of randomly drawing a sample like the one obtained or more extreme, in the direction of Ha. It is calculated as the corresponding area under the curve of the one-sample t statistic

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

using one tail or both tails depending on Ha.

[Figure: the P-value as a one-sided (one-tailed) area and as a two-sided (two-tailed) area under the t curve]

[6] Introduction to Inference: Tests of Significance; Use and Abuse of Tests

The significance level α and P-value

The significance level α is the largest P-value tolerated for rejecting a true null hypothesis (how much evidence against H0 we require). This value is decided arbitrarily before conducting the test.
• If the P-value is equal to or less than α (P ≤ α), then we reject H0.
• If the P-value is greater than α (P > α), then we fail to reject H0.

Example: Does the packaging machine need revision? Two-sided test. The P-value is 4.56%.
* If α had been set to 5%, then the P-value would be significant.
* If α had been set to 1%, then the P-value would not be significant.

Steps of Hypothesis Testing

Step 1. Develop the null and alternative hypotheses.
Step 2. Specify the level of significance α.
Step 3. Collect the sample data and compute the test statistic.

p-Value Approach
Step 4. Use the value of the test statistic to compute the p-value.
Step 5. Reject H0 if p-value ≤ α.

Critical Value Approach
Step 4. Use the level of significance to determine the critical value and the rejection rule.
Step 5. Use the value of the test statistic and the rejection rule to determine whether to reject H0.

Critical Value Approach to One-Tailed Hypothesis Testing

The test statistic z has a standard normal probability distribution. We can use the standard normal probability distribution table to find the z-value with an area of α in the lower (or upper) tail of the distribution. The value of the test statistic that establishes the boundary of the rejection region is called the critical value for the test. The rejection rule is:
• Lower tail: Reject H0 if $z < -z_{\alpha}$
• Upper tail: Reject H0 if $z > z_{\alpha}$

Lower-Tailed Test About a Population Mean: σ Known (Critical Value Approach)

[Figure: sampling distribution of $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, with "Reject H0" (α = .10) to the left of $-z_{\alpha} = -1.28$ and "Do Not Reject H0" to the right]

Upper-Tailed Test About a Population Mean: σ Known (Critical Value Approach)

[Figure: sampling distribution of $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, with "Do Not Reject H0" to the left and "Reject H0" (α = .05) to the right of $z_{\alpha} = 1.645$]

One-Tailed Tests About a Population Mean: σ Known

Example: The response times for a random sample of 40 medical emergencies were tabulated. The sample mean is 13.25 minutes. The population standard deviation is believed to be 3.2 minutes. The director of Health Service wants to perform a hypothesis test, with a .05 level of significance, to determine whether the service goal of 12 minutes or less is being achieved.

One-Tailed Tests About a Population Mean: σ Known (p-Value and Critical Value Approaches)

1. Develop the hypotheses. H0: µ ≤ 12; Ha: µ > 12
2. Specify the level of significance. α = .05
3. Compute the value of the test statistic.

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{13.25 - 12}{3.2/\sqrt{40}} = 2.47$$

One-Tailed Tests About a Population Mean: σ Known (p-Value Approach)

4. Compute the p-value. For z = 2.47, the cumulative probability is .9932, so p-value = 1 − .9932 = .0068.
5. Determine whether to reject H0. Because p-value = .0068 < α = .05, we reject H0. There is sufficient statistical evidence to infer that the Health Service is not meeting the response goal of 12 minutes.

...
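The response-time example can be reproduced with `statistics.NormalDist` from the Python standard library (3.8+), which supplies the standard normal CDF and inverse CDF in place of the printed table. This is a sketch, not the course's own method: `upper_tail_z_test` is a hypothetical helper, and because z is not rounded to 2.47 before the lookup, the computed p-value comes out at about .0067 rather than the table-based .0068.

```python
import math
from statistics import NormalDist


def upper_tail_z_test(xbar, mu0, sigma, n, alpha):
    """Upper-tailed z test of H0: mu <= mu0 against Ha: mu > mu0, sigma known."""
    std_normal = NormalDist()                   # standard normal: mean 0, sd 1
    z = (xbar - mu0) / (sigma / math.sqrt(n))   # test statistic
    p_value = 1 - std_normal.cdf(z)             # area to the right of z
    z_alpha = std_normal.inv_cdf(1 - alpha)     # critical value with area alpha above it
    return {
        "z": z,
        "p_value": p_value,
        "z_alpha": z_alpha,
        "reject_by_p": p_value <= alpha,        # p-value approach
        "reject_by_critical": z > z_alpha,      # critical value approach
    }


result = upper_tail_z_test(xbar=13.25, mu0=12, sigma=3.2, n=40, alpha=0.05)
print(f"z = {result['z']:.2f}, z_alpha = {result['z_alpha']:.3f}")  # z = 2.47, z_alpha = 1.645
print(f"p-value = {result['p_value']:.4f}")
print(result["reject_by_p"], result["reject_by_critical"])          # True True

# Lower-tail critical value for alpha = .10, as in the lower-tailed figure.
print(round(NormalDist().inv_cdf(0.10), 2))  # -1.28
```

Both approaches reach the same decision here, as they always do for the same α: p ≤ α exactly when z falls in the rejection region beyond $z_{\alpha}$.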

This note was uploaded on 04/15/2010 for the course ECON ECON 2500 taught by Professor Siminseruy during the Spring '10 term at York University.
