Statistical Methods I (EXST 7005)

Recent literature has tended toward giving just the actual “P value”, and letting the reader decide if
it is “significant”. The P-value is just the area in the tail above the calculated Z value. For
example, with our Oak tree example, the calculated Z value was 2.236. This was larger
than our critical value of 1.96, so the “tail” would be smaller than 0.025.
So, how unusual is a value of 2.236? Actually, the probability of a randomly chosen value exceeding
this value is 0.0127 in one tail. For a two-tailed test we would express this probability as
2(0.0127) = 0.0254, since we would reject for either –2.236 or +2.236.

[Figure: normal curve with the area above the calculated value shaded, Z axis from –3 to 4]

The P-value is then: P = 0.0254. For most tests that we do, SAS will give this value.
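As a rough check on the tail areas quoted above, the following sketch computes the one-tailed and two-tailed P values for Z = 2.236. It assumes Python's standard-library `statistics.NormalDist` as a stand-in for the Z table; the variable names are illustrative and not part of the course notes.

```python
from statistics import NormalDist

z = 2.236                      # calculated Z value from the Oak tree example
std_normal = NormalDist()      # standard normal: mean 0, standard deviation 1

one_tail = 1 - std_normal.cdf(z)   # area in the tail above z
two_tail = 2 * one_tail            # we would reject for either -z or +z

print(f"one-tailed P = {one_tail:.4f}")
print(f"two-tailed P = {two_tail:.4f}")
```

Both values should agree, to table precision, with the 0.0127 and 0.0254 given in the text.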
If the P-value is smaller than the desired α, the calculated test statistic would be in the tail and
H0 would be rejected. If it is larger than the desired α, the test statistic would not be in the tail
and H0 would not be rejected. Most tests in SAS are two-tailed, though a few are one-tailed.

Another Example
The mean for high school seniors on a nationally standardized reading test is 170 points with a
variance of 400. The principal of a small rural high school hypothesizes that the 9 seniors in
his school will score better than the national average. Test his hypothesis (data given later).
I. H0: μ = μ0 or H0: μ − μ0 = 0
II. H1: μ > μ0 or H1: μ − μ0 > 0
III. Assume that the scores are (1) normally and (2) independently distributed with a (3) known
variance of σ² = 400 (i.e. the distribution is NID(170, 400)).
IV. Let the probability of Type I error equal 5%. (i.e. α = 0.05)
V. Find the critical limits given that we want a one tailed test against the upper tail with α = 0.05.
The Z value which will leave 5% in the upper
tail is 1.645.
VI. Gather new data to test the hypothesis. The test results for the 9 students were: 164, 175,
186, 173, 191, 187, 189, 176 and 179. The summary statistics for this group are Ȳ = 180
and S² = 79.25 (sum of squared deviations = 634). However, we know the true
national variance (σ² = 400) for the test and can use this in a Z test.
The condition of “known variance” is really important to using a Z test, and should be added
as a third assumption.
The test calculations are $Z = \dfrac{\bar{Y} - \mu_0}{\sqrt{\sigma^2 / n}} = \dfrac{180 - 170}{\sqrt{400 / 9}} = \dfrac{10}{6.667} = 1.5$.
VII. This value does not reach the critical value of 1.645, so we cannot conclude that these 9
seniors scored significantly higher than the national average. Apparently, it is not that
unusual, at the 5% level, for any subgroup of 9 individuals to score 10 points above the
national mean. However, the P value for the observed Z score is P = 0.0668, so it is not very

James P. Geaghan Copyright 2010
Are we convinced that these 9 students are not above average? We would have rejected H0 if
the P value had been as small as 0.05, but it reached only 0.0668. Close! The principal may well
claim that this was significant. As scientists we may decide it is just too close to call, and
“reserve judgment” pending more data.

Summary
Logic: We need a known probability distribution and we need to determine what is likely for our
known distribution under the null hypothesis.
Any conditions needed for this to work out are specified in the assumptions.
Both one and two-tailed alternative hypotheses are possible.

Review the 7 Steps of Hypothesis testing
I. Determine the H0
II. Determine the H1
III. Consider the assumptions
IV. Determine a value for α and obtain a critical region for a test statistic (e.g. Z), from your
knowledge of alpha (α) and the H1.
V. Obtain a sample of new data to test the hypothesis. Compute the appropriate statistic from
a sample (e.g. Ȳ) and calculate the value of the TEST STATISTIC (Z).
VI. Compare the calculated value of the test statistic to the CRITICAL VALUES. Make your
decision to either reject the H0 or to FAIL to reject the H0.
VII. Draw your conclusions from the test of hypothesis and interpret your results.

The 5 steps of Hypothesis Testing according to Freund & Wilson
1) Establish H0 , H1 and a value for α.
2) Determine the test statistic and a region for rejection
3) Draw a sample, calculate the test statistic
4) Compare the test statistic to the critical limits and make a decision to reject or fail to reject.
5) Interpret the results

Hypothesis testing Concepts
The logic of a test of hypothesis is based on the chosen probability of error, α (the significance
level). For the test statistic (Z), α determines the range of what would be expected due to chance
alone, assuming H0 is true.
Significance level notation, commonly used levels and terminology
“Statistically significant” α = 0.05
“Highly significant” α = 0.01
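The steps reviewed above can be traced in a short sketch using the reading-test example from earlier (9 seniors, μ0 = 170, known variance 400, one-tailed test at α = 0.05). This is only an illustration under those stated assumptions; it uses Python's standard-library `statistics.NormalDist` in place of the Z table, and the variable names are made up for the example.

```python
from math import sqrt
from statistics import NormalDist, mean

# Steps I-III: H0: mu = 170, H1: mu > 170, scores assumed NID with known variance
mu0 = 170
sigma2 = 400

# Step IV: choose alpha and find the upper-tail critical value of Z
alpha = 0.05
std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha)      # about 1.645

# Step V: sample data and the test statistic
scores = [164, 175, 186, 173, 191, 187, 189, 176, 179]
n = len(scores)
y_bar = mean(scores)
z = (y_bar - mu0) / sqrt(sigma2 / n)

# Step VI: compare with the critical value (the P value gives the same decision)
p_value = 1 - std_normal.cdf(z)
reject = z > z_crit

# Step VII: report the result
print(f"Z = {z:.3f}, critical Z = {z_crit:.3f}, P = {p_value:.4f}")
print("reject H0" if reject else "fail to reject H0")
```

Changing `alpha` to 0.01 shows how the critical value moves further into the tail for a “highly significant” test.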
Errors!
When we do a test of hypothesis, is it possible that we are wrong?
Yes, unfortunately, it is always possible that we are wrong. Furthermore, there are two types of
error that we could make!
Types of error

                           Data indicates:             Data indicates:
                           H0 is true                  H0 is false
True result: H0 is true    NO ERROR                    Type I Error:
                                                       Reject TRUE H0
True result: H0 is false   Type II Error:              NO ERROR
                           Fail to Reject FALSE H0

Type I Error: Type I error is the rejection of a true null hypothesis.
This type of error is also called α (alpha) error. This is the value that we choose as the
“level of significance”, so we actually set the probability of making this type of error.
The probability of a type I error = α
Type II Error: Type II error is the failure to reject a null hypothesis that is false. This type
of error is also called β (beta) error.
We do not set this value, but we call the probability of a type II error = β.
Furthermore, in practice we will never know this value. This is another reason we cannot
“accept” the null hypothesis, because it is possible that we are wrong and we cannot
state the probability of this type of error.
The good news: it is only possible to make one error at a time.
If you reject H0, then you may have made a type I error, but you cannot have made a type II
error.
If you fail to reject H0, then you may have made a type II error, but you cannot have made a
type I error.

The probability of Type II Error
This is a probability that we will not know. This probability is called β. However, we can do
several things to make the error smaller, so this will be our objective.
First, let's look at how these errors occur.
Examine the hypothesized distribution (below) that we believe to have a mean of 18.

[Figure: hypothesized distribution, Y axis from 10 to 28]

We are going to do a 2-tailed test with an α value of 0.05.
Our critical limits will be ±1.96. So we will reject any test statistic over 1.96 (or under –1.96).
But let’s suppose the null hypothesis is false!!! Let’s suppose that the alternate
hypothesis is true. Then the hypothesized distribution is not real; there is another “real”
distribution that we are sampling from. What might it look like?

Here is the real distribution. It actually has a mean of 22, but we don't know that. If we did,
we would not have hypothesized a mean of 18!

[Figure: hypothesized and real distributions with the critical value marked, Y axis from 10 to 28,
Z axis from –4 to 4]

So where on the real distribution is our critical limit? This is the key question.

Note that with the Z transformation each change of 1 unit of Z corresponds to a change of 2 on the
original Y scale. This means that on the original scale σ = 2.

Now we draw our sample from the real distribution. If our result says reject the H0, we make no
error. But if our result causes us to fail to reject, we err. And in this case it appears we have
pretty close to a 50-50 chance of going either way.

So we take our sample and do our test. Will we err?
Maybe we will, and maybe we won’t. Our sample could come from anywhere in this
“real” distribution. If our sample happens to fall in the lower red area (below about 22),
we would not reject H0, and we would err. But if our sample happens to fall in the upper
yellow area (above about 22), we will reject H0. In this case there is no error; we draw
the correct conclusion.

The Probability of Type II error or β error
For α = 0.05, our critical limits, in terms of Z, would be ±1.96.
The critical limits translate to values on the original scale of
Yi = μ + Ziσ = 18 ± 1.96(2) = 18 ± 3.92. The lower bound is 14.08 and the upper bound is
21.92. The lower bound is so far down on the real distribution that the probability of getting a
sample that falls there is near zero. The upper bound is the one that falls in the middle of the
real distribution.
In this fictitious case we know that the true mean is 22. Normally we wouldn't know the true
mean. Since we know the true mean in this case, we can calculate the probability of drawing a
sample above and below the critical limit (21.92 on the Y scale, –0.04 on the Z scale of the
real distribution). The probability of falling below this value, and of making a type II error, is
0.484, or about 48.4%. This is the probability we call beta (β).
The probability of falling above this value, and of NOT making a type II error, is 0.516, or 51.6%.
So in this case we can calculate β, the probability of a Type II error. In practice we cannot
usually know these probabilities because we never know the real value of the mean.
We define a new term, POWER: the probability of NOT making a type II error (1 – β).
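The calculation of β for this fictitious case (hypothesized mean 18, true mean 22, σ = 2 on the original scale, critical limits at ±1.96) can be sketched as follows. The sketch assumes Python's standard-library `statistics.NormalDist`; the variable names are illustrative. It counts the whole non-rejection region between the two critical limits, although the part of the real distribution below the lower limit is close to zero.

```python
from statistics import NormalDist

mu0 = 18          # hypothesized mean
true_mean = 22    # real mean (known only because the case is fictitious)
sigma = 2         # spread on the original Y scale
z_crit = 1.96     # two-tailed critical Z for alpha = 0.05

# Critical limits translated to the original scale
lower = mu0 - z_crit * sigma    # 14.08
upper = mu0 + z_crit * sigma    # 21.92

# Type II error: the sample from the REAL distribution falls between the limits
real = NormalDist(mu=true_mean, sigma=sigma)
beta = real.cdf(upper) - real.cdf(lower)
power = 1 - beta

print(f"critical limits: {lower:.2f} and {upper:.2f}")
print(f"beta = {beta:.3f}, power = {power:.3f}")
```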
This was 0.516 in our example.

Power and Type II Error
Since we don't actually know the value of the true mean (or we wouldn't be hypothesizing
something else), we cannot know in practice the type II error rate (β). However, it is affected
by a number of things, and we can know about these.

1) Power is affected by the distance between the hypothesized mean (μ0) and true mean (μ).

[Figure: The Power Curve, power (0 to 1) plotted against the difference between true and
hypothesized mean]

2) Power is affected by the value chosen for Type I error (α).

[Figure: two panels of the hypothesized and real distributions, Y axis from 10 to 28, Z axis from
–4 to 4]

3) Power is affected by the variability or spread of the distribution.

[Figure: two panels of the hypothesized and real distributions, Y axis from 10 to 28]

Influencing the power of a test of hypothesis
The capability of the test to reject H0 when it is false is called Power = 1 – β. Anything done to
enhance this value will improve your ability to test for differences among populations. Which
of the 3 factors influencing power can you control?
For testing means you may be able to control sample size (n). This reduces the variability and
so increases power.
You probably cannot influence the difference between μ and μ0.
You can choose any value of α. However, this cannot be too small or Type II error becomes
more likely. Too large and Type I error becomes likely.

Methods of increasing the power of a test
How would we use our knowledge of factors affecting power to increase the power of our tests?
Increase the significance level (e.g. from α = 0.01 to α = 0.05).
If H0 is true we would increase α, the probability of a Type I error.
If H0 is false then we decrease β, the probability of a Type II error, and by decreasing β,
we are increasing the POWER of the test.
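To illustrate the trade-off just described, the sketch below recomputes power for the earlier fictitious case (hypothesized mean 18, true mean 22, σ = 2) at α = 0.01 and α = 0.05. It is a minimal illustration using Python's standard-library `statistics.NormalDist`; the function name `power_two_tailed` is hypothetical and not from the course notes.

```python
from statistics import NormalDist

def power_two_tailed(alpha, mu0, true_mean, sigma):
    """Power of a two-tailed Z test when the real mean is true_mean."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower = mu0 - z_crit * sigma
    upper = mu0 + z_crit * sigma
    real = NormalDist(mu=true_mean, sigma=sigma)
    beta = real.cdf(upper) - real.cdf(lower)    # fail to reject a false H0
    return 1 - beta

power_01 = power_two_tailed(0.01, mu0=18, true_mean=22, sigma=2)
power_05 = power_two_tailed(0.05, mu0=18, true_mean=22, sigma=2)

print(f"alpha = 0.01: power = {power_01:.3f}")
print(f"alpha = 0.05: power = {power_05:.3f}")
```

Under these assumptions the larger α gives the larger power, at the cost of a larger Type I error rate whenever H0 is actually true.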
For a given α, the POWER can be increased by increasing n, so that $\sigma_{\bar{Y}} = \sigma / \sqrt{n}$
decreases, and the amount of overlap between the real and hypothesized distributions decreases.

For example, let’s suppose we are conducting a test of the hypothesis H0: μ = μ0 against
the alternative H1: μ ≠ μ0. We believe μ0 = 50 and we set α = 0.05. We also know
that σ² = 100 and that n = 25.
From this information we can calculate $\sigma_{\bar{Y}} = \sigma / \sqrt{n} = 10 / 5 = 2$. The critical region in
terms of Z is then P(|Z| ≥ Z0) = 0.05 and Z0 = 1.96, and the critical value on the
original Y variable scale is $Y_i = \mu + Z_i \sigma_{\bar{Y}} = 50 + 1.96(2) = 53.92$.
If the REAL population mean is 54, we need P(Ȳ ≥ 53.92). Given that the TRUE mean
is 54, we calculate the Z value as Z = (53.92 – 54)/2 = –0.08/2 = –0.04.
The probability of a TYPE II error (β) is the probability of drawing a sample that does not
fall above this value, and so not rejecting the false null hypothesis. The value is β =
P(Z ≤ –0.04) = 0.4840.
So for an experiment with n = 25, the power is 1 – β = 1 – 0.4840 = 0.516.
But suppose we had a larger sample, say n = 100. Now $\sigma_{\bar{Y}} = \sigma / \sqrt{n} = 10 / 10 = 1$. The critical
region stays at Z0 = 1.96, but on the original scale this is now $Y_i = \mu + Z_i \sigma_{\bar{Y}} = 50 + 1.96(1) = 51.96$.
For a true mean of 54 we now get Z = (51.96 – 54)/1 = –2.04/1 = –2.04.
The value of β is P(Z ≤ –2.04) = 0.0207, and the power for this test is 1 – β = 0.9793.
The bottom line,
With n = 25, the power is 0.5160.
With n = 100, the power is 0.9793.
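The two power values above can be reproduced with a short sketch. Following the worked example, it considers only the upper critical value (the lower rejection region contributes almost nothing when the true mean is 54). It assumes Python's standard-library `statistics.NormalDist`; the function name `power_upper_tail` and its defaults are illustrative choices taken from the example's numbers.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tail(n, mu0=50, true_mean=54, sigma=10, z_crit=1.96):
    """Power against true_mean, using the upper critical value only."""
    std_error = sigma / sqrt(n)                 # sigma of the sample mean
    critical = mu0 + z_crit * std_error         # critical value on the Y scale
    beta = NormalDist(mu=true_mean, sigma=std_error).cdf(critical)
    return 1 - beta

for n in (25, 100):
    print(f"n = {n:3d}: power = {power_upper_tail(n):.4f}")
```

Trying other values of `n` gives a feel for how quickly power rises with sample size for this particular difference between μ0 and the true mean.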
This is why statisticians recommend larger sample sizes so strongly. We may never really
know what power is, but we know how to increase it and reduce the probability of
TYPE II error.

Summary
Hypothesis testing is prone to two types of errors, one we control (α) and one we do not (β).
Type I error is the REJECTION of a true null hypothesis.
Type II error is the FAILURE TO REJECT a null hypothesis that is false.
The “Power” of a test is 1 – β
Not only do we not control TYPE II error, we probably do not even know its value. However, we
can hopefully reduce this error, and increase power, by
Controlling the distance between μ and μ0 (not really likely)
Selecting a value of α that is not too small (0.05 and 0.01 are the usual values)
Getting a larger sample size (n); this is the factor that is usually under the most control of the
investigator.

The t-test of hypotheses
The t distribution is used the same way as the Z distribution, except it is used where sigma (σ) is unknown
(or where Ȳ is used instead of μ to calculate deviations). The t distribution is a bell shaped
curve, like the Z distribution, but not the same. The Z distribution is normal because it has a
normal distribution in the numerator (Yi) and all other terms in the transformation are
constant. The t distribution has a normal distribution in the numerator but the sample variance
in the denominator is another statistic with a chi square distribution.
$t_i = \dfrac{Y_i - \bar{Y}}{S}$; the t distribution applied to individual observations
$t = \dfrac{\bar{Y} - \mu_0}{S_{\bar{Y}}} = \dfrac{\bar{Y} - \mu_0}{S / \sqrt{n}}$; the t distribution used for hypothesis testing
where;
S = the sample standard deviation (calculated using Ȳ instead of μ)
$S_{\bar{Y}}$ = the sample standard error

The variance of the t distribution is greater than that of the Z distribution (except where n → ∞),
since S² estimates σ², but is never as good (reliability is less).

Distribution      Variance
Z distribution    1
t distribution    ≥ 1

Characteristics of the t distribution
E(t) = 0, the expected value of the t distribution is zero.
It is symmetrically distributed about a mean of 0 with t values ranging between ±∞
(i.e. –∞ ≤ t ≤ +∞)
There is a different t distribution for each degree of freedom (df), since the distribution changes as
the degrees of freedom change.
It has a broader spread for smaller df, and narrows (approaching the Z distribution) as df increase.
As the df (γ, gamma) approaches infinity (∞), the t distribution converges to the Z distribution.
Z (no df associated); middle 95% is between ± 1.96
t with 1 df; middle 95% is between ± 12.706
t with 10 df; middle 95% is between ± 2.228
t with 30 df; middle 95% is between ± 2.042
t with ∞ df; middle 95% is between ± 1.96
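The broader spread of the t distribution can be seen in a small simulation. The sketch below draws many samples of n = 11 (10 df) from a normal population, computes t for each using the sample standard deviation S, and checks how often |t| exceeds 1.96 and 2.228. The seed, the number of repetitions and the variable names are arbitrary choices made for this illustration, and the results are simulation estimates, not table values.

```python
import random
from math import fsum, sqrt
from statistics import variance

random.seed(7005)            # arbitrary seed so the run is repeatable
n = 11                       # sample size, giving 10 degrees of freedom
mu, sigma = 0.0, 1.0         # the population sampled from (H0 is true)
reps = 20000

t_values = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    y_bar = fsum(sample) / n
    s = sqrt(fsum((y - y_bar) ** 2 for y in sample) / (n - 1))
    t_values.append((y_bar - mu) / (s / sqrt(n)))   # divide by a statistic, not sigma

t_variance = variance(t_values)
beyond_z = sum(abs(t) > 1.96 for t in t_values) / reps
beyond_t = sum(abs(t) > 2.228 for t in t_values) / reps

print(f"variance of simulated t values: {t_variance:.3f}")
print(f"fraction beyond ±1.96:  {beyond_z:.4f}")
print(f"fraction beyond ±2.228: {beyond_t:.4f}")
```

The simulated variance should come out above 1, noticeably more than 5% of the t values should fall beyond ±1.96, and roughly 5% should fall beyond ±2.228, in line with the 10 df entry in the list above.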
How does the test for the t distribution differ from the Z distribution?
• For the Z distribution, since Yi is normally distributed, subtracting a constant (μ0) and
dividing by a constant (σ) does not affect the distribution and Z is normal.
• For the t distribution we also have a normally distributed Yi and we subtract a constant
(μ0), but we divide by a statistic (S), not a constant (σ).
• This alters the distribution so that it is not quite a normal distribution. The extra
uncertainty causes the t distribution to be “broader” than the Z distribution.
• However, as sample size increases the value of S approaches σ and the t distribution
converges on the Z distribution.

Probability distribution tables in general
The tables we will use will ALL be giving the area in the tail (α). However, if you examine a
number of tables from other sources you will find that this is not always true. Even when it is
true, some tables will give the value of α as if it were in two tails, and some as if it were in
one.
For example, we want to conduct a two-tailed Z test at the α = 0.05 level. We happen to know
that Z = 1.96. If we look at this value in the Z tables we expect to see a value of 0.025, or
α/2. But many tables would show the probability for 1.96 as 0.975, and some as 0.05.
Why the difference? It just depends on how the tables are presented. Some of the alternatives
are shown below.
Some tables give the cumulative distribution starting at –infinity. You want to find the
probability corresponding to 1 – α/2. The value that leaves 0.025 in the upper tail
would be 0.975.
Some tables may start at zero (0.0) and give the cumulative area from this point for the
upper half of the distribution. This would be less common. The value that leaves 0.025
in the upper tail would be 0.475.
Among the tables like ours, that give the area in the tail, some are called two-tailed tables
and some are one-tailed tables.
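The table conventions described above all come from the same cumulative probability, so one can be converted to another. The sketch below shows the four values a table might list for Z = 1.96, assuming Python's standard-library `statistics.NormalDist` in place of a printed table; the variable names are illustrative.

```python
from statistics import NormalDist

z = 1.96
cdf = NormalDist().cdf(z)        # cumulative area from -infinity up to z

cumulative = cdf                 # tables that start at -infinity
from_zero = cdf - 0.5            # tables that start at zero
one_tail = 1 - cdf               # one-tailed tables (area above z)
two_tail = 2 * (1 - cdf)         # two-tailed tables (area beyond -z and +z)

print(f"cumulative from -infinity: {cumulative:.3f}")
print(f"area from zero to z:       {from_zero:.3f}")
print(f"one-tailed area:           {one_tail:.3f}")
print(f"two-tailed area:           {two_tail:.3f}")
```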
[Figure: one-tailed table, α = 0.025 in the upper tail and 1 – α = 0.975 below it, Z axis from –4 to 4]
[Figure: two-tailed table, α/2 in each tail and 1 – α between them, Z axis from –4 to 4]

Why the extra confusion at this point?
All our tables will give the area in the tail.
The Z tables we used gave the area in one tail. For a two-tailed test you needed to double the
probability.
For the F tables and Chi square tables covered later, this area will be a single tail as with the Z
tables. This is because these distributions are not symmetric.
Traditionally, many t-tables have given the area in TWO TAILS instead of in one tail.
Many textbooks have this type of table.
SAS will also usually give two-tailed values for t-tests.
This note was uploaded on 12/29/2011 for the course EXST 7005 taught by Professor Geaghan,j during the Fall '08 term at LSU.