Inference for Distributions
Standard deviation s – standard error s/√n
For a sample of size n, the sample standard deviation s is:
$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
n − 1 is the “degrees of freedom.”
The value s/√n is called the standard error of the mean (SEM).
Scientists often present sample results as mean ± SEM.
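A minimal Python sketch of the two relations on this slide, assuming the data sit in a plain list: `statistics.stdev` uses the n − 1 divisor, so it matches s, and dividing by √n gives the SEM. Multiplying a reported SEM by √n recovers s. The function names `sem` and `sd_from_sem` and the sample values are hypothetical illustrations.

```python
import math
import statistics


def sem(data: list[float]) -> float:
    """Standard error of the mean: s / sqrt(n), where s uses the n - 1 divisor."""
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    return s / math.sqrt(len(data))


def sd_from_sem(sem_value: float, n: int) -> float:
    """Invert SEM = s / sqrt(n) to recover the sample standard deviation s."""
    return sem_value * math.sqrt(n)


# Hypothetical sample, only to show the calculation.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(f"mean ± SEM = {statistics.mean(sample):.2f} ± {sem(sample):.3f}")

# Blood-pressure example: SEM = 8.9 reported for n = 25 patients.
print(sd_from_sem(8.9, 25))  # 44.5
```

Reporting mean ± SEM makes a sample look less variable than mean ± s, which is why converting back with s = SEM·√n matters when judging the spread of the data.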
Simin Seury, Department of Economics, York University, Canada
Date: Mar 30 2010
A study examined the effect of a new medication on seated systolic blood pressure. The results, presented as mean ± SEM for 25 patients, are 113.5 ± 8.9.
What is the standard deviation s of the sample data? SEM = s/√n <=> s = SEM*√n
s = 8.9*√25 = 44.5
The t distributions
The t distribution is a family of similar probability distributions.
A specific t distribution depends on a parameter
known as the degrees of freedom.
Degrees of freedom refer to the number of
independent pieces of information that go into the
computation of s.
Suppose that an SRS of size n is drawn from an N(µ, σ) population.
When σ is known, the sampling distribution of x̄ is N(µ, σ/√n).
When σ is estimated from the sample standard deviation s, the sampling distribution of x̄ follows a t distribution t(µ, s/√n) with n − 1 degrees of freedom.
When σ is unknown
The sample standard deviation s provides an estimate of the population
standard deviation σ.
When the sample size is large,
the sample is likely to contain
elements representative of the
whole population. Then s is a
good estimate of σ. But when the sample size is
small, the sample contains only
a few individuals. Then s is a
mediocre estimate of σ.
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ is the one-sample t statistic.
When n is very large, s is a very good estimate of σ, and the corresponding t distributions are very close to the normal distribution.
The t distributions become wider for smaller sample sizes, reflecting the lack of precision in estimating σ from s.
[Figure: population distribution; large sample; small sample]
Standardizing the data before using Table D
As with the normal distribution, the first step is to standardize the data. Then we can use Table D to obtain the area under the curve.
[Figure: the t(µ, s/√n) distribution with df = n − 1 is standardized to t(0, 1) with df = n − 1 using $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$]
The one-sample t confidence interval
The level C confidence interval is an interval computed from the sample by a method that has probability C of containing the true population parameter.
We have a data set from a population with both µ and σ unknown. We use x̄ to estimate µ and s to estimate σ, using a t distribution (df = n − 1).
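To make the standardization and the "wider for small samples" remark concrete, here is a hedged Python sketch. It computes the one-sample t statistic and evaluates the t density with the standard closed-form formula (via `math.lgamma`), comparing it with the standard normal density from `statistics.NormalDist`. The helper names `one_sample_t` and `t_pdf`, the example numbers, and the evaluation point x = 3 are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist


def one_sample_t(xbar: float, mu: float, s: float, n: int) -> float:
    """One-sample t statistic: (x̄ − µ) / (s / √n)."""
    return (xbar - mu) / (s / math.sqrt(n))


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


# Standardizing: a sample mean 2 units above µ, with s = 4 and n = 16, gives t = 2.
print(one_sample_t(xbar=12.0, mu=10.0, s=4.0, n=16))

# Tail heights at x = 3: smaller df puts more probability far from the center.
z_density = NormalDist().pdf(3.0)
for df in (2, 8, 30):
    print(f"df = {df:2d}: t density {t_pdf(3.0, df):.5f}  vs  normal {z_density:.5f}")
```

For small df the t curve is lower at the center and higher in the tails than the normal curve; as df grows, the t density values approach the normal ones.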
Practical use of t: t*
C is the area between −t* and t*. We find t* in the line of Table D for df = n − 1 and confidence level C.
The margin of error m is:
$m = t^* \frac{s}{\sqrt{n}}$
and the confidence interval is x̄ ± m. Here, µ is the mean (center) of the sampling distribution, and the standard error of the mean s/√n is its standard deviation (width).
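The Table D lookup can be approximated in code. A minimal sketch, assuming only the Python standard library: the t CDF is obtained by integrating the density with Simpson's rule, t* is found by bisection so that the central area between −t* and t* equals C, and the margin of error follows m = t*·s/√n. The function names are hypothetical, and numerical integration here is a stand-in for a statistics library, not the method used to build Table D.

```python
import math


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for Student's t, via Simpson's rule on [0, |x|] and symmetry."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u: float) -> float:
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    half_area = total * h / 3  # area under the curve between 0 and |x|
    return 0.5 + half_area if x >= 0 else 0.5 - half_area


def t_star(confidence: float, df: int) -> float:
    """Critical value t* with central area `confidence` between -t* and t*."""
    target = 1 - (1 - confidence) / 2  # area to the left of t*
    lo, hi = 0.0, 100.0
    for _ in range(50):  # bisection works because t_cdf increases with x
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def margin_of_error(confidence: float, s: float, n: int) -> float:
    """m = t* · s / √n, with df = n − 1."""
    return t_star(confidence, n - 1) * s / math.sqrt(n)


print(round(t_star(0.95, 8), 3))     # compare with the Table D row df = 8, C = 95%
print(round(t_star(0.95, 1000), 3))  # large df: close to z* = 1.96
```

The first value should agree with the tabled t* = 2.306, and the second shows t* shrinking toward z* as df grows.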
You obtain s, the standard deviation of the sample, with your calculator.
[Figure: Table D; the central area C under the t curve lies between −t* and t*]
When σ is unknown, we use a t distribution with n − 1 degrees of freedom (df). Table D shows the z-values and t-values corresponding to common confidence levels.
Green Tea and Heart Attack
It is sometimes claimed that drinking green tea may protect against heart attacks. Green tea contains a certain component, Gx, which may act on blood cholesterol, likely helping to prevent heart attacks.
To see if drinking green tea increases the average blood level of Gx, a group of nine randomly selected healthy men were assigned to drink green tea daily, in the morning and in the afternoon, for two weeks. The levels of Gx in their blood were assessed before and after the study, and the percent change is presented below:
(…) 7.4 8.1 8.4
First: are the data approximately normal?
[Figure: histogram of the percentage change in blood levels of Gx (frequency vs. percent change) and Normal quantile plot]
There is a low value, but overall the data can be considered approximately normal.
What is the 95% confidence interval for the average percent change?
Sample average = 5.5; s = 2.517; df = n − 1 = 8
The sampling distribution is a t distribution with n − 1 degrees of freedom.
Table A vs. Table D
When σ is known, we use the normal distribution and the standardized z-value.
Table A gives the area to the LEFT of hundreds of z-values. (…) It should only be used for (…)
Table D gives the area to the RIGHT of a dozen t or z-values. (…) It can be used for t distributions of a given df and for the Normal distribution.
Table D also gives the middle area under a t or normal distribution between the negative and positive values of t or z.
For df = 8 and C = 95%, t* = 2.306.
The margin of error m is: m = t*s/√n = 2.306*2.517/√9 ≈ 1.93.
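The same arithmetic for the green tea data, as a short Python check. It uses only the summary statistics given above (n = 9, x̄ = 5.5, s = 2.517) and the Table D value t* = 2.306. For contrast it also computes the narrower margin that a normal z* ≈ 1.96 would give, which is what a σ-known, normal-based formula such as Excel's =CONFIDENCE assumes. Variable names are illustrative.

```python
import math
from statistics import NormalDist

n, xbar, s = 9, 5.5, 2.517   # green tea summary statistics (percent change in Gx)
t_star = 2.306               # Table D: df = n - 1 = 8, C = 95%

se = s / math.sqrt(n)        # standard error of the mean, s/√n
m = t_star * se              # margin of error
lower, upper = xbar - m, xbar + m
print(f"m = {m:.2f}; 95% CI = ({lower:.2f}, {upper:.2f})")

# Using z* instead of t* ignores the extra uncertainty from estimating σ by s.
z_star = NormalDist().inv_cdf(0.975)  # about 1.96
m_z = z_star * se
print(f"z-based margin {m_z:.2f} is smaller than the t-based margin {m:.2f}")
```

The interval 5.5 ± 1.93 rounds to 3.6% to 7.4%.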
With 95% confidence, the population average percent increase of Gx in blood levels of healthy men drinking two cups of green tea daily is between 3.6% and 7.4% (5.5 ± 1.93).
Important: The confidence interval shows how large the increase is, but not whether it has any impact on men’s health.
Excel
Menu: Tools/Data Analysis: select “Descriptive statistics”.
[Excel output for PercentChange: Mean 5.5; Confidence Level(95.0%) 1.934695, which is the margin of error m = t*s/√n]
!!! Warning: do not use the function =CONFIDENCE(alpha, stdev, size). This assumes a normal sampling distribution (stdev here refers to σ) and uses z* instead of t* !!!
The one-sample t-test
As in the previous chapter, a test of hypotheses requires a few steps:
1. Stating the null and alternative hypotheses (H0 versus Ha)
2. Deciding on a one-sided or two-sided test
3. Choosing a significance level α
4. Calculating t and its degrees of freedom
5. Finding the area under the curve with Table D
6. Stating the P-value and interpreting the result
The P-value is the probability, if H0 is true, of randomly drawing a sample like the one obtained or more extreme, in the direction of Ha. The P-value is calculated as the corresponding area under the curve, one-tailed or two-tailed depending on Ha.
Example: the calculated value of t is 2.7, with df = 9. For df = 9 we only look into the corresponding row of Table D and find the 2 closest t values:
2.398 < t = 2.7 < 2.821, so 0.02 > upper-tail p > 0.01.
For a one-sided Ha, this is the P-value (between 0.01 and 0.02); for a two-sided Ha, the P-value is doubled (between 0.02 and 0.04).
Introduction to Inference: Tests of Significance; Use and Abuse of Tests (Lecture Notes)
The significance level α and P-value
The significance level, α, is the largest P-value tolerated for rejecting a true null hypothesis (how much evidence against H0 we require). This value is decided arbitrarily before conducting the test.
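The Table D bracketing in the t = 2.7, df = 9 example can be cross-checked numerically and compared with α. A minimal Python sketch, assuming the standard library only: the t CDF is approximated by Simpson's rule (a numerical stand-in, not an exact table), the one-sided P-value for Ha: µ > µ0 is the upper-tail area, and the two-sided P-value doubles the tail beyond |t|. The function names and the two α values are illustrative.

```python
import math


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for Student's t, via Simpson's rule on [0, |x|] and symmetry."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u: float) -> float:
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # even number of steps for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    half_area = total * h / 3
    return 0.5 + half_area if x >= 0 else 0.5 - half_area


def t_test_p_value(t: float, df: int, alternative: str) -> float:
    """P-value of the one-sample t test; alternative is 'greater', 'less' or 'two-sided'."""
    if alternative == "greater":
        return 1 - t_cdf(t, df)             # area to the right of t
    if alternative == "less":
        return t_cdf(t, df)                 # area to the left of t
    if alternative == "two-sided":
        return 2 * (1 - t_cdf(abs(t), df))  # both tails beyond |t|
    raise ValueError(f"unknown alternative: {alternative!r}")


t, df = 2.7, 9
p_one = t_test_p_value(t, df, "greater")
p_two = t_test_p_value(t, df, "two-sided")
print(f"one-sided P = {p_one:.4f}, two-sided P = {p_two:.4f}")

for alpha in (0.05, 0.01):  # illustrative significance levels
    decision = "reject H0" if p_two <= alpha else "fail to reject H0"
    print(f"alpha = {alpha}: two-sided test -> {decision}")
```

The computed P-values fall inside the intervals read from Table D, and the decision flips between α = 0.05 and α = 0.01 for the two-sided test.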
If the P-value is equal to or less than α (P ≤ α), then we reject H0.
If the P-value is greater than α (P > α), then we fail to reject H0.
[Figure: P-value areas for a one-sided (one-tailed) and a two-sided (two-tailed) test based on $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$]
Does the packaging machine need revision?
Two-sided test. The P-value is 4.56%.
* If α had been set to 5%, then the result would be significant (4.56% ≤ 5%) and we would reject H0.
* If α had been set to 1%, then the result would not be significant (4.56% > 1%) and we would fail to reject H0.
Critical Value Approach to
One-Tailed Hypothesis Testing
The test statistic z has a standard normal probability distribution.
We can use the standard normal probability distribution table to find the z-value with an area of α in the lower (or upper) tail of the distribution.
The value of the test statistic that establishes the boundary of the rejection region is called the critical value for the test. The rejection rule is:
• Lower tail: Reject H0 if z < −zα
• Upper tail: Reject H0 if z > zα
Steps of Hypothesis Testing
Step 1. Develop the null and alternative hypotheses.
Step 2. Specify the level of significance α.
Step 3. Collect the sample data and compute the test statistic.
p-Value Approach
Step 4. Use the value of the test statistic to compute the p-value.
Step 5. Reject H0 if p-value ≤ α.
Critical Value Approach
Step 4. Use the level of significance to determine the critical value and the rejection rule.
Step 5. Use the value of the test statistic and the rejection rule to determine whether to reject H0.
Lower-Tailed Test About a Population Mean: Critical Value Approach
[Figure: sampling distribution of $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$; with α = .10, reject H0 if z < −zα = −1.28, otherwise do not reject H0]
Upper-Tailed Test About a Population Mean: Critical Value Approach
[Figure: sampling distribution of $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$; with α = .05, reject H0 if z > zα = 1.645, otherwise do not reject H0]
One-Tailed Tests About a Population Mean: σ Known
Example: The response times for a random sample of 40 medical emergencies were tabulated. The sample mean is 13.25 minutes. The population standard deviation is believed to be 3.2 minutes.
The director of Health Service wants to perform a hypothesis test, with a .05 level of significance, to determine whether the service goal of 12 minutes or less is being achieved.
One-Tailed Tests About a Population Mean:
p-Value and Critical Value Approaches
1. Develop the hypotheses.
H0: µ ≤ 12
Ha: µ > 12
2. Specify the level of significance. α = .05
3. Compute the value of the test statistic.
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{13.25 - 12}{3.2/\sqrt{40}} = 2.47$
One-Tailed Tests About a Population Mean: p-Value Approach
4. Compute the p-value.
For z = 2.47, cumulative probability = .9932.
p-value = 1 − .9932 = .0068
5. Determine whether to reject H0.
Because p-value = .0068 < α = .05, we reject H0.
There is sufficient statistical evidence to infer that the Health Service is not meeting the response goal of 12 minutes.
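Both approaches for the emergency response-time example fit in a few lines of Python with `statistics.NormalDist` (standard library, Python 3.8+). A minimal sketch: it computes z, the upper-tail p-value and the critical value zα, and applies the rule "reject H0 if p-value ≤ α". The helper names are hypothetical; the last lines replay the packaging-machine decisions (P = 4.56% at α = 5% and 1%).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_statistic(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """z = (x̄ − µ0) / (σ / √n), used when σ is known."""
    return (xbar - mu0) / (sigma / math.sqrt(n))


def decide(p_value: float, alpha: float) -> str:
    """p-value approach: reject H0 when p-value <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# Upper-tailed test H0: µ <= 12 vs Ha: µ > 12, alpha = .05
alpha = 0.05
z = z_statistic(xbar=13.25, mu0=12, sigma=3.2, n=40)
p_value = 1 - STD_NORMAL.cdf(z)          # area to the right of z
z_alpha = STD_NORMAL.inv_cdf(1 - alpha)  # critical value, about 1.645

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("p-value approach:", decide(p_value, alpha))
print("critical value approach:", "reject H0" if z > z_alpha else "fail to reject H0")

# Lower-tail critical value for alpha = .10 (the -1.28 in the lower-tailed figure)
print(round(STD_NORMAL.inv_cdf(0.10), 2))

# Packaging machine: two-sided P-value of 4.56%
for level in (0.05, 0.01):
    print(f"alpha = {level}:", decide(0.0456, level))
```

Because the code keeps z unrounded, its p-value comes out slightly smaller than the .0068 read from the table with z rounded to 2.47; both approaches still lead to rejecting H0.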
This note was uploaded on 04/15/2010 for the course ECON ECON 2500 taught by Professor Siminseruy during the Spring '10 term at York University.