[7] Inference for Distributions

Standard deviation s – standard error s/√n

For a sample of size n, the sample standard deviation s is:

\[ s = \sqrt{\frac{1}{n-1} \sum (x_i - \bar{x})^2} \]

n − 1 is the “degrees of freedom.”

The value s/√n is called the standard error of the mean (SEM).
Scientists often present sample results as mean ± SEM.
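As a minimal sketch of the two definitions above, the following Python computes s and the SEM for a small sample and converts a reported SEM back into s. The data values and the function names `sample_sd`, `sem` and `sd_from_sem` are illustrative assumptions, not part of the notes.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation s, dividing by n - 1 (the degrees of freedom)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def sem(values):
    """Standard error of the mean, s / sqrt(n)."""
    return sample_sd(values) / math.sqrt(len(values))


def sd_from_sem(sem_value, n):
    """Recover s from a reported SEM: s = SEM * sqrt(n)."""
    return sem_value * math.sqrt(n)


# Hypothetical sample, made up for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

s = sample_sd(data)
print(f"mean = {statistics.mean(data)}, s = {s:.4f}, SEM = {sem(data):.4f}")
```

The hand-written `sample_sd` should agree with the standard library's `statistics.stdev`, which also divides by n − 1.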
Lecture Notes
Simin Seury, Department of Economics, York University, Canada
Date: Mar 30 2010

A study examined the effect of a new medication on the seated systolic blood pressure. The results, presented as mean ± SEM for 25 patients, are 113.5 ± 8.9.
What is the standard deviation s of the sample data?
SEM = s/√n <=> s = SEM*√n
s = 8.9*√25 = 44.5

The t distributions

The t distribution is a family of similar probability distributions.
A specific t distribution depends on a parameter known as the degrees of freedom.
Degrees of freedom refer to the number of independent pieces of information that go into the computation of s, the sample standard deviation.

Suppose that an SRS of size n is drawn from an N(µ, σ) population.
When σ is known, the sampling distribution is N(µ, σ/√n).
When σ is estimated from the sample standard deviation s, the sampling distribution follows a t distribution t(µ, s/√n) with degrees of freedom n − 1.

\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]

is the one-sample t statistic.

When σ is unknown

The sample standard deviation s provides an estimate of the population standard deviation σ.

When the sample size is large, the sample is likely to contain elements representative of the whole population. Then s is a good estimate of σ.
But when the sample size is small, the sample contains only a few individuals. Then s is a mediocre estimate of σ.

When n is very large, s is a very good estimate of σ, and the corresponding t distributions are very close to the normal distribution.
The t distributions become wider for smaller sample sizes, reflecting the lack of precision in estimating σ from s.

[Figure: population distribution, with a large sample and a small sample]
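The claim that s is a less precise estimate of σ in small samples can be illustrated with a short simulation. This is only a sketch under assumptions: the N(0, 1) population, the sample sizes 5 and 100, the number of repetitions, the seed and the function name `spread_of_s` are all arbitrary choices made for illustration.

```python
import random
import statistics


def spread_of_s(n, repetitions=2000, mu=0.0, sigma=1.0, seed=1):
    """Draw many samples of size n from N(mu, sigma) and return the
    standard deviation of the resulting s values (how much s varies)."""
    rng = random.Random(seed)
    s_values = []
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        s_values.append(statistics.stdev(sample))
    return statistics.stdev(s_values)


small = spread_of_s(5)     # few individuals: s varies a lot around sigma
large = spread_of_s(100)   # many individuals: s stays close to sigma
print(f"spread of s for n = 5:   {small:.3f}")
print(f"spread of s for n = 100: {large:.3f}")
```

The larger spread for n = 5 is the imprecision that the wider t distributions account for.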
Standardizing the data before using Table D

As with the normal distribution, the first step is to standardize the data. Then we can use Table D to obtain the area under the curve.

\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]

[Figure: t(µ, s/√n) with df = n − 1 is standardized to t(0,1) with df = n − 1]

Here, µ is the mean (center) of the sampling distribution, and the standard error of the mean s/√n is its standard deviation (width).
You obtain s, the standard deviation of the sample, with your calculator.

The one-sample t-confidence interval

The level C confidence interval is an interval with probability C of containing the true population parameter.

We have a data set from a population with both µ and σ unknown. We use x̄ to estimate µ and s to estimate σ, using a t distribution (df n − 1).

Practical use of t: t*
C is the area between −t* and t*.
We find t* in the line of Table D for df = n − 1 and confidence level C.

The margin of error m is:

\[ m = t^* \frac{s}{\sqrt{n}} \]

[Figure: t curve with central area C between −t* and t*, and margin m on each side]

Green Tea and Heart Attack
It is sometimes claimed that drinking green tea may protect against heart attacks. Green tea contains a certain component, Gx, which may act on blood cholesterol, likely helping to prevent heart attacks.

To see if drinking green tea increases the average blood level of Gx, a group of nine randomly selected healthy men were assigned to drink green tea daily in the morning and in the afternoon for two weeks. The levels of Gx in their blood were assessed before and after the study, and the percent change is presented here:

0.7  3.5  4  4.9  5.5  7  7.4  8.1  8.4

Firstly: Are the data approximately normal?

[Figure: histogram of percent change (frequency by bin: 2.5, 5, 7.5, 9, More)]
[Figure: normal quantile plot of percentage change in levels of Gx in blood against normal quantiles]

There is a low value, but overall the data can be considered reasonably normal.

Table D

When σ is unknown, we use a t distribution with “n − 1” degrees of freedom (df).

\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]

When σ is known, we use the normal distribution and the standardized z-value.

Table D shows the z-values and t-values corresponding to landmark P-values/confidence levels.

Table A vs. Table D

Table A gives the area to the LEFT of hundreds of z-values.
(…) It should only be used for Normal distributions.
(…)

Table D gives the area to the RIGHT of a dozen t or z-values.
(…) It can be used for t distributions of a given df and for the Normal distribution.

Table D also gives the middle area under a t or normal distribution comprised between the negative and positive value of t or z.

What is the 95% confidence interval for the average percent change?

Sample average = 5.5; s = 2.517; df = n − 1 = 8

The sampling distribution is a t distribution with n − 1 degrees of freedom.
For df = 8 and C = 95%, t* = 2.306.
The margin of error m is: m = t*s/√n = 2.306*2.517/√9 ≈ 1.93.
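The arithmetic above can be checked with a few lines of Python. This sketch takes t* = 2.306 from Table D as given rather than computing it, and the variable names are illustrative assumptions.

```python
import math
import statistics

# Percent change in Gx for the nine men in the green tea study.
percent_change = [0.7, 3.5, 4, 4.9, 5.5, 7, 7.4, 8.1, 8.4]

n = len(percent_change)
mean = statistics.mean(percent_change)
s = statistics.stdev(percent_change)   # sample standard deviation (divides by n - 1)
standard_error = s / math.sqrt(n)      # s / sqrt(n)

T_STAR = 2.306                         # Table D value for df = 8, C = 95% (from the notes)
margin = T_STAR * standard_error       # m = t* s / sqrt(n)

lower, upper = mean - margin, mean + margin
print(f"mean = {mean:.1f}, s = {s:.3f}, m = {margin:.2f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

The interval is simply the sample mean minus and plus the margin of error m.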
With 95% confidence, the population average percent increase of Gx in blood levels of healthy men drinking two cups of green tea daily is between 3.6% and 7.4% (5.5 ± 1.93).

Important: The confidence interval shows how large the increase is, but not whether it can have an impact on men’s health.

Excel
Menu: Tools/DataAnalysis: select “Descriptive statistics”

PercentChange
Mean                      5.5
Standard Error            0.838981    (s/√n)
Median                    5.5
Mode                      #N/A
Standard Deviation        2.516943
Sample Variance           6.335
Kurtosis                  0.010884
Skewness                  −0.7054
Range                     7.7
Minimum                   0.7
Maximum                   8.4
Sum                       49.5
Count                     9
Confidence Level(95.0%)   1.934695    (m)

!!! Warning: do not use the function =CONFIDENCE(alpha, stdev, size)
This assumes a normal sampling distribution (stdev here refers to σ) and uses z* instead of t* !!!

The one-sample t-test

As in the previous chapter, a test of hypotheses requires a few steps:

1. Stating the null and alternative hypotheses (H0 versus Ha)
2. Deciding on a one-sided or two-sided test
3. Choosing a significance level α
4. Calculating t and its degrees of freedom
5. Finding the area under the curve with Table D
6. Stating the P-value and interpreting the result

The P-value is the probability, if H0 is true, of randomly drawing a sample like the one obtained or more extreme, in the direction of Ha.

The P-value is calculated as the corresponding area under the curve, one-tailed or two-tailed depending on Ha:

\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]

[Figure: P-value areas for a one-sided (one-tailed) and a two-sided (two-tailed) test]

Table D

For df = 9 we only look into the corresponding row.
The calculated value of t is 2.7. We find the 2 closest t values.

2.398 < t = 2.7 < 2.821
thus
0.02 > upper tail p > 0.01

For a one-sided Ha, this is the P-value (between 0.01 and 0.02); for a two-sided Ha, the P-value is doubled (between 0.02 and 0.04).

[6] Introduction to Inference

Tests of Significance
Use and Abuse of Tests

The significance level α and P-value

The significance level, α, is the largest P-value tolerated for rejecting a true null hypothesis (how much evidence against H0 we require). This value is decided arbitrarily before conducting the test.

If the P-value is equal to or less than α (P ≤ α), then we reject H0.
If the P-value is greater than α (P > α), then we fail to reject H0.

Does the packaging machine need revision?
Two-sided test. The P-value is 4.56%.
* If α had been set to 5%, then the P-value would be significant.
* If α had been set to 1%, then the P-value would not be significant.

Critical Value Approach to
One-Tailed Hypothesis Testing

The test statistic z has a standard normal probability distribution.
We can use the standard normal probability distribution table to find the z-value with an area of α in the lower (or upper) tail of the distribution.
The value of the test statistic that established the boundary of the rejection region is called the critical value for the test.

The rejection rule is:
• Lower tail: Reject H0 if z < −zα
• Upper tail: Reject H0 if z > zα

Steps of Hypothesis Testing

Step 1. Develop the null and alternative hypotheses.
Step 2. Specify the level of significance α.
Step 3. Collect the sample data and compute the test statistic.

p-Value Approach
Step 4. Use the value of the test statistic to compute the p-value.
Step 5. Reject H0 if p-value < α.

Critical Value Approach
Step 4. Use the level of significance to determine the critical value and the rejection rule.
Step 5. Use the value of the test statistic and the rejection rule to determine whether to reject H0.

Lower-Tailed Test About a Population Mean: σ Known
Critical Value Approach

[Figure: sampling distribution of z = (x̄ − µ0)/(σ/√n); reject H0 in the lower tail of area α = .10, to the left of −zα = −1.28; otherwise do not reject H0]

Upper-Tailed Test About a Population Mean: σ Known
Critical Value Approach

[Figure: sampling distribution of z = (x̄ − µ0)/(σ/√n); reject H0 in the upper tail of area α = .05, to the right of zα = 1.645; otherwise do not reject H0]

One-Tailed Tests About a Population Mean: σ Known
Example

The response times for a random sample of 40 medical emergencies were tabulated. The sample mean is 13.25 minutes. The population standard deviation is believed to be 3.2 minutes.

The director of Health Service wants to perform a hypothesis test, with a .05 level of significance, to determine whether the service goal of 12 minutes or less is being achieved.

One-Tailed Tests About a Population Mean:
σ Known
p-Value and Critical Value Approaches

1. Develop the hypotheses.
   H0: µ ≤ 12
   Ha: µ > 12
2. Specify the level of significance.
   α = .05
3. Compute the value of the test statistic.

\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{13.25 - 12}{3.2/\sqrt{40}} = 2.47 \]

One-Tailed Tests About a Population Mean: σ Known
p-Value Approach

4. Compute the p-value.
   For z = 2.47, cumulative probability = .9932.
   p-value = 1 − .9932 = .0068
5. Determine whether to reject H0.
   Because p-value = .0068 < α = .05, we reject H0.
   There is sufficient statistical evidence to infer that the Health Service is not meeting the response goal of 12 minutes.
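Both the p-value and the critical value approaches for this example can be reproduced with Python's standard library. This is a sketch rather than part of the notes: it uses `statistics.NormalDist` in place of the printed normal table, and the variable names are assumptions.

```python
import math
from statistics import NormalDist

# Emergency response example: H0: mu <= 12 versus Ha: mu > 12 (upper-tailed test).
x_bar = 13.25   # sample mean, minutes
mu_0 = 12.0     # hypothesized mean, minutes
sigma = 3.2     # population standard deviation, assumed known
n = 40          # sample size
alpha = 0.05    # level of significance

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Test statistic z = (x_bar - mu_0) / (sigma / sqrt(n)).
z = (x_bar - mu_0) / (sigma / math.sqrt(n))

# p-value approach: area in the upper tail beyond z.
p_value = 1 - standard_normal.cdf(z)

# Critical value approach: z_alpha leaves an area alpha in the upper tail.
z_alpha = standard_normal.inv_cdf(1 - alpha)

print(f"z = {z:.2f}, p-value = {p_value:.4f}, z_alpha = {z_alpha:.3f}")
print("reject H0 (p-value approach):", p_value < alpha)
print("reject H0 (critical value approach):", z > z_alpha)
```

Because z is not rounded to two decimals before the lookup, the computed p-value can differ from the table-based .0068 in the last digit; the two approaches still lead to the same decision.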
This note was uploaded on 04/15/2010 for the course ECON ECON 2500 taught by Professor Siminseruy during the Spring '10 term at York University.