Statistical Methods I (EXST 7005) Page 46

Tests of hypothesis
Hypothesis – a contention based on preliminary evidence of what appears to be fact (an educated
guess), which may or may not be true.
• Formulating a hypothesis is the second step in the scientific method.
• A statement of the hypothesis is the first step in experimentation.

Test of hypothesis – a comparison of the contention with a set of newly gathered data.

Hypothesis testing procedure – we will consider 7 steps
I. Set up a meaningful hypothesis such as “The population mean is equal to some value” (call it μ0):
H0: μ = μ0 or H0: μ − μ0 = 0
This is called the null hypothesis. It is a hypothesis of equality or of no difference (even if
you believe there is a difference). Note that hypotheses are always stated in terms of
the population parameters, not the sample statistics we actually measure, because we
are drawing inference about the population.
II. Set up an alternative hypothesis
Alternative hypotheses are denoted H1 or Ha. This hypothesis states what is correct if the
null hypothesis is not correct. This is usually the case of actual interest.
a) H1: μ ≠ μ0 or H1: μ − μ0 ≠ 0 (also called the non-directional alternative)
b) H1: μ < μ0 or H1: μ − μ0 < 0
c) H1: μ > μ0 or H1: μ − μ0 > 0
III. Consider the assumptions.
1) We will be using the Z distribution, so the distribution we are testing must be normal.
2) The observations should be independent. The best guarantee of independent
observations is random sampling.
3) Strictly speaking, the variance should be known in order to use the Z distribution;
however, it is often used for very large samples. Later we will discuss the
t-distribution, which is used when the variance is not known and must be estimated from
the sample.
There will be a few other assumptions for other test statistics. However, the tests of
hypothesis we will be using are also “robust”. Statistically speaking, robustness
indicates that the test performs quite well even if the assumptions are not perfectly
met.
IV. Select a probability of rejecting the null hypothesis (H0) when it is true. This is called the
alpha (α) value, and the value chosen is somewhat arbitrary. By convention the value
usually chosen is α = 0.05 or sometimes α = 0.01.
For α = 0.05, if H0 is true we will reject it 5% of the time, or in one of 20 samples.
For α = 0.01, if H0 is true we will reject it 1% of the time, or in one of 100 samples.
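The meaning of α can be illustrated with a small simulation; the population values below (μ0 = 50, σ = 10, n = 25) are hypothetical, chosen only for the sketch:

```python
# Hypothetical illustration of alpha: when H0 is true, a two-tailed Z test
# at alpha = 0.05 rejects in about 1 of 20 samples.
import math
import random

random.seed(42)                   # fixed seed for a reproducible run
mu0, sigma, n = 50.0, 10.0, 25    # assumed population and sample size
se = sigma / math.sqrt(n)         # standard error of the mean

trials, rejections = 20000, 0
for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]  # H0 is true here
    z = (sum(sample) / n - mu0) / se
    if abs(z) > 1.96:             # two-tailed critical value for alpha = 0.05
        rejections += 1

print(rejections / trials)        # close to 0.05
```

With the null hypothesis true by construction, the observed rejection rate settles near α, which is exactly the “one in 20 samples” statement above.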
James P. Geaghan, Copyright 2010

This value is sometimes called the significance level.
From this value, and the alternate hypothesis, we can determine the critical limits, those
values of the test statistic that would cause us to reject the null hypothesis.
Determine a critical region (what is too large or too small) by using the chosen
probability or significance level.
Critical region – the area in the distribution which would lead to rejection of the null
hypothesis (H0). When we reject, we know that it is possible that the null
hypothesis is true, but if it is, we would only reject α×100% of the time. So this
type of error can be controlled.
Region of “acceptance” – the area under the distribution which would lead to
“acceptance” of the null hypothesis (H0).

[Figure: standard normal curve showing the lower critical region (α/2), the region of "Acceptance" (1−α), and the upper critical region (α/2) on the Z axis from −∞ to +∞.]

Notice that I have placed the word “acceptance” in quotes. We cannot really state that we
“accept” the null hypothesis because it is also possible that we would be wrong in
doing so. Unfortunately, in practice the probability of this type of error is unknown
and therefore one cannot “accept” with a known probability of error (more later under
Type II error and Power).
V. Draw a sample from the population of interest (as defined by the investigator), and
a) Compute an estimate of the parameter in the hypothesis; in our example the hypothesis
was about μ, so the statistic will be Ȳ; recall E(Ȳ) = μ.
b) The value of Ȳ from the sample now becomes one of many possible observations from
the derived population of all sample means.
c) Recall that the derived population has
μȲ = μ
σ²Ȳ = σ²/n
σȲ = √(σ²/n) = σ/√n
d) Recall that the distribution of sample means (Ȳk) approaches a normal distribution as
the value of n increases (according to the Central Limit Theorem). This helps meet
our assumption of normality.
e) Recall the Z transformation
Zi = (Ȳi − μȲ) / σȲ
Our null hypothesis contends that the true value of μȲ is our hypothesized value, μ0,
so we will calculate a Z score using μ0. This will follow a Z distribution if the
null hypothesis is correct. If the null hypothesis is not correct we don’t care what
the distribution is, we just hope to reject the null hypothesis.
Zi = (Ȳi − μ0) / σȲ
As a result, where μ0 is the hypothesized value of the mean, if the null hypothesis is
true and μ = μ0 we would expect Z to be approximately zero (within reasonable
limits, defined later). On the other hand, if μ ≠ μ0 we would expect Z to be
different from zero by some amount. If Z is much greater than zero,
that suggests that μȲ is larger than μ0, while if Z is much less than zero, then μȲ
appears to be smaller than μ0.
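The standardization in step V(e) can be written as a one-line helper; the numbers in the example calls are hypothetical, chosen only to show the sign behavior described above:

```python
# Sketch of the Z transformation: Z = (Ybar - mu0) / (sigma / sqrt(n)).
import math

def z_score(ybar, mu0, sigma, n):
    """Standardize a sample mean against the hypothesized mean mu0."""
    return (ybar - mu0) / (sigma / math.sqrt(n))

# If H0 is true we expect Z near zero; large |Z| argues against H0.
print(z_score(52.0, 50.0, 10.0, 25))   # 1.0  (sample mean above mu0)
print(z_score(46.0, 50.0, 10.0, 25))   # -2.0 (sample mean below mu0)
```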
VI. Compare the test statistic from step V to the critical region determined in step IV.
VII. Draw conclusions and interpretations from the results of the test. The test statistic is not
an end in itself.

Logic behind the test
A key aspect of a test of hypothesis is that we must have a test statistic with a known
distribution.
We could sample from any one of numerous populations with many different distributions.
The characteristics of these distributions are unknown, but if we can transform the
sampled distribution to a known distribution, we can then make some probability
statements.
Beyond this, we simply want to determine what is likely under the null hypothesis. If
we hypothesize a mean of μ0 and take a sample whose mean is actually close to μ0,
then the null hypothesis is probably true. If, on the other hand, the calculated sample
mean is not close to μ0, and if the difference is big enough that it is not likely to have
occurred due to sampling variation, then the alternate hypothesis is the more likely
explanation.
Reasonable limits – recall that we needed to define these:
a set of limits of the critical region determined by the significance level (α) and by the
alternative hypothesis (e.g. was it two tailed, or one tailed, and if one tailed, to which
side). The value of α is what specifies what we feel would be unlikely under the null
hypothesis.

SUMMARY OF THE 7 STEPS OF HYPOTHESIS TESTING
I. Establish a null hypothesis, H0:
II. Determine an appropriate alternative hypothesis to the null, H1:
III. Consider the assumptions
IV. Determine a value for α and find the critical limits and a critical region for the chosen
test statistic.
V. Obtain a sample of new data to test the hypothesis, and compute the appropriate test statistic
from the sample.
VI. Using the critical region and the test statistic (e.g. Z), compare the values and make a
decision to reject the H0 or to fail to reject the H0.
VII. Draw your conclusions from the test of hypothesis.

Example of a Test of Hypothesis
Extensive measurements done in eastern Tennessee have shown that the average 20-year-old
White Oak produces an average of 12 Kg of acorns with a variance of 4 Kg2. Five White
Oaks in Georgia produced a mean of 14 Kg. Assuming that the variance is the same, test
the hypothesis that the production is the same in Tennessee and Georgia.
1) H0: μ = μ0 or H0: μ − μ0 = 0 (where μ0 = 12 Kg, the known value for Tennessee).
2) H1: μ ≠ μ0 or H1: μ − μ0 ≠ 0. We might be tempted to test the hypothesis
H1: μ > μ0, since the Georgia oaks had a mean of 14 Kg. However, remember that
this is supposed to be a new data set to test the hypothesis and we would not have
known this in advance.
3) Assume the sample of Oaks is random (independent) and normally distributed. We also
have a known variance from Tennessee of 4 Kg2.
4) Determine a value of α and obtain the critical limits for a critical region for the test
statistic using our knowledge of H1 and α.
We will somewhat arbitrarily choose a
value of α = 0.05. This is a commonly
used and accepted value.
The H1 indicates that we are doing a two-tailed
test. To keep α at 0.05, place
half the value of α in each tail (0.0250
per tail). This corresponds to critical Z
values of ±1.96.

[Figure: standard normal curve with the region of "Acceptance" (1−α) between −1.96 and +1.96 and the lower and upper critical regions (α/2 each) beyond them. The Critical Region: red areas in the tails are areas of rejection.]
5) Obtain a new data set to test the hypothesis, and compute the appropriate test statistic
from the sample (Ȳ for testing differences in the means).
The results for our sample were Ȳ = 14 and n = 5.
6) Calculate
Z = (Ȳ − μ0) / σȲ = (14 − 12) / √(4/5) = 2 / (2/√5) = √5 ≈ 2.236.
This value of the test statistic is greater than the limit for the upper critical region
(±1.96), so it falls in the region of rejection. This would be interpreted to indicate
that it is unlikely that a value this large would arise by random chance alone if the
null hypothesis were true.
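The arithmetic in steps 4 to 6 can be reproduced directly from the values given in the text (μ0 = 12, σ² = 4, Ȳ = 14, n = 5); a sketch using the standard library's `statistics.NormalDist` for the critical value:

```python
# Two-tailed Z test for the oak example: H0: mu = 12 vs H1: mu != 12.
import math
from statistics import NormalDist

mu0, var, ybar, n = 12.0, 4.0, 14.0, 5
se = math.sqrt(var / n)                     # sigma / sqrt(n) = 2 / sqrt(5)
z = (ybar - mu0) / se                       # = sqrt(5), about 2.236

alpha = 0.05
crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 1.96

print(round(z, 3))                          # 2.236
print(abs(z) > crit)                        # True -> reject H0
```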
7) Conclusion: Reject the null hypothesis and conclude that the two areas differ in terms of
mean acorn production.
We can also go one step further. Since the production levels are different we can also
conclude that production is greater in Georgia since it had a greater value for the
mean production.

One-tailed tests
Suppose our problem had been a little different, and that we had believed from the beginning
that Georgia had a higher rate of production. Something we believed BEFORE we started
the study (a priori). We might then want to test for only this alternative, i.e.
H1: μ > μ0 or H1: μ − μ0 > 0, where μ0 = 12 Kg (the Tennessee value).
Now, the test is altered because we will have a different critical value. We still want an α =
0.05, but we would put all 5% chance of error
in the upper tail!
Note that this makes it “easier” to show
significance, because we only need to meet the
1.645 criterion instead of the 1.96 criterion.
However, it implies that we have some additional
knowledge and have no interest in the lower
tail. What if the calculated value was well into
the lower tail? Presumably this would be a
spurious occurrence and not of interest, because
we “know” it can't happen.

[Figure: standard normal curve with the region of "Acceptance" (1−α) below +1.645 and the upper critical region (α) above it.]

In fact, if our critical value was 1.645 in the upper tail, and we found the Georgia value to be
less than Tennessee, no additional calculations would be needed because the calculated Z
value would be negative and could not be in the upper tail. In other words, if our
hypothesized value (μ0) is greater than our observed value (Ȳ), then the calculation
Z = (Ȳi − μ0) / σȲ
would be negative and could not be in the upper tail that was hypothesized.

Additional notes and terminology on hypothesis testing
Recall that a key aspect of a test of hypothesis is that we must have a test statistic with a known
distribution. For our present discussion we are using the Z distribution.
Given H0: μ = μ0 and H1: μ ≠ μ0 and α = 0.05
1) If H0 is true then (1–α)100% of the samples will yield a Z test statistic that will fall in the
region of “acceptance”. That is, for α = 0.05, then (1–α)100% = 95%. This is sometimes
referred to as the confidence level.
2) For a two-tailed test, half of the samples that fall in a critical region will have a Z test
statistic score in the upper critical region [(α/2)100% of all samples], and half will have
a score in the lower critical region [(α/2)100%]. For α = 0.05, (α/2)100% = 2.5%.
3) Since H1: μ ≠ μ0 (implying we do not know “a priori” if the hypothesized value might be
too large or too small), the probability statement then becomes:
P(|Z| ≥ Z0) = α = 0.05 (the absolute value sign indicates Z may be positive or negative)
2P(Z ≥ Z0) = α = 0.05
P(Z ≥ Z0) = α/2 = 0.025
so Z0 = 1.96 from the Z tables
4) If the calculated Z test statistic is between –1.96 and +1.96, we cannot reject the null
hypothesis (H0: μ = μ0). This means that the observed statistic is consistent with the
hypothesized value, BUT we can never actually PROVE that H0 is true. It is relatively easy
to prove that things are different, but almost impossible to prove that two things are
the same.
So we resort to jargon; we say that ...
• there is no “statistically significant difference”
• there is no “significant difference”
• that “the data is consistent with the null hypothesis”
• that we “fail to reject the null hypothesis”.
These statements are better (more correct) than stating that we ACCEPT the null hypothesis
or that the null hypothesis is TRUE.
5) For a two-tailed test, if the calculated |Z| is greater than, or equal to, the critical value of the
test statistic (e.g. Z = 1.96), then reject the H0, and conclude that the alternative hypothesis is
correct. For one-tailed tests at α = 0.05, if Z > 1.645 reject in favor of the hypothesis H1: μ > μ0, and if
Z < –1.645 conclude that H1: μ < μ0.
6) The size of the critical region is determined by α, the level of significance.
Note that when we reject the null hypothesis, there is a chance that we are in error, but that
we know the probability of making that error. It is α. This is because we can set the
level of fallibility in our conclusions for this type of error.
7) When we have a one-tailed alternative, say H1: μ > μ0, versus the null hypothesis H0: μ = μ0,
what happens to the cases that may be much less than the hypothesized value? Since we
have a one-tailed test we must know that such cases are impossible, or are simply not of
interest no matter how small they may be. In this case some investigators prefer notation
where the other extreme is included in the null hypothesis.
H0: μ ≤ μ0 versus H1: μ > μ0
H0: μ ≥ μ0 versus H1: μ < μ0
This is acceptable, but the statistical development of a test of hypothesis actually considers
only the equality in the null hypothesis and doesn’t really consider these cases.

Final notes on the one and two tailed alternatives
1) The two-tailed test is called the “non-directional alternative”.
H0: μ = μ0 or H0: μ − μ0 = 0
H1: μ ≠ μ0 or H1: μ − μ0 ≠ 0
This means that we will accept either
H1: μ < μ0 or H1: μ > μ0 as fulfillment of the alternate hypothesis. Since either case is to
be accepted we state our probability with an absolute value, P(|Z| ≥ Z0) = α = 0.05, and for
a 5% chance of error, we divide the 5% into equal parts (usually) and put half in each tail.

2) The one-tailed test is called the “directional alternative”.
H0: μ = μ0 or H0: μ − μ0 = 0 and
H1: μ < μ0 or H1: μ − μ0 < 0 or
H1: μ > μ0 or H1: μ − μ0 > 0
This indicates that we will accept only one
of the two options, H1: μ < μ0 or
H1: μ > μ0, as fulfillment of the alternate hypothesis. Since only one case is to be accepted,
we state our probability as either P(Z ≥ Z0) = α = 0.05 or P(Z ≤ Z0) = α = 0.05, and for a
5% chance of error, we put all 5% into the tail of interest.
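The two tail-placement rules above can be checked with the inverse normal CDF in Python's standard library (α = 0.05, as in the text):

```python
# Critical Z values implied by alpha for the non-directional (two-tailed)
# and directional (one-tailed) alternatives.
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()                       # mean 0, sd 1

two_tailed = std_normal.inv_cdf(1 - alpha / 2)  # alpha split between two tails
one_tailed = std_normal.inv_cdf(1 - alpha)      # all of alpha in one tail

print(round(two_tailed, 3))   # 1.96
print(round(one_tailed, 3))   # 1.645
```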
Why α = 0.05, and not 0.04 or 0.09?
No particular reason. The value is not special, but has become something of a convention
or traditional standard. This value represents a one chance in 20 of error. It is
generally accepted as a reasonable chance of error, and is usually acceptable to
referees, editors and journals. However, if you want to use another value, and have
some good reason for doing so, this should be possible.
The value of 0.05 has traditionally been termed the level at which we have “statistically
significant” results. A value of 0.01 is then considered a “highly significant” result.

P values in tests of hypothesis
Probability values, or P values, like those we have discussed previously, just represent some area
under a curve. However, in the context of hypothesis testing they indicate the area under
the curve that represents a value equal to or larger than some observed value of a test statistic.
Recent literature has tended toward giving just the actual “P value”, and letting the reader decide if
it is “significant”. The P-value is just the area in the tail beyond the calculated Z value. For
example, with our Oak tree example, the calculated Z value was 2.236. This was larger
than our critical value of 1.96, so the “tail” would be smaller than 0.025.
So, how unusual is a value of 2.236?
Actually, the probability of a randomly
chosen value exceeding this value is
0.0127 in one tail. For a two-tailed test
we would express this probability as
2(0.0127) = 0.0254 since we would
reject for either –2.236 or +2.236.

[Figure: standard normal curve with the area above the value 2.236 shaded in the upper tail.]

The P-value is then: P = 0.0254. For most tests that we do, SAS will give this value.
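The tail areas quoted for the oak example can be computed rather than read from a table; a sketch using `statistics.NormalDist` (any small difference from the text's 0.0254 is just table rounding):

```python
# P-value for the oak example: area beyond |Z| = 2.236 in both tails.
from statistics import NormalDist

z_calc = 2.236
p_one_tail = 1 - NormalDist().cdf(z_calc)   # about 0.0127 (upper tail)
p_two_tail = 2 * p_one_tail                 # about 0.0254 (both tails)

alpha = 0.05
print(p_two_tail < alpha)                   # True -> reject H0
```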
If the P-value is smaller than the desired α, the calculated test statistic falls in the tail and the
null hypothesis would be rejected. If it is larger than the desired α, the test statistic does not fall
in the tail and the null hypothesis would not be rejected. Most tests in SAS are two-tailed,
though a few are one-tailed.

Another Example
The mean for high school seniors on a nationally standardized reading test is 170 points with a
variance of 400. The principal of a small rural high school hypothesizes that the 9 seniors in
his school will score better than the national average. Test his hypothesis (data given later).
I. H0: μ = μ0 or H0: μ − μ0 = 0
II. H1: μ > μ0 or H1: μ − μ0 > 0
III. Assume that the scores are (1) normally and (2) independently distributed with a (3) known
variance of σ² = 400 (i.e. the distribution is NID(170, 400)).
IV. Let the probability of Type I error equal 5%. (i.e. α = 0.05)
V. Find the critical limits given that we want a one tailed test against the upper tail with α = 0.05.
The Z value which will leave 5% in the upper
tail is 1.645.
VI. Gather new data to test the hypothesis. The
test results for the 9 students were: 164, 175,
186, 173, 191, 187, 189, 176 and 179. The
summary statistics for this group are Ȳ = 180
and S² = 634. However, we know the true
national variance (σ² = 400) for the test and can use this in a Z test.
The condition of “known variance” is really important to using a Z test, and should be added
as a third assumption.
The test calculations are
Z = (Ȳ − μ0) / √(σ²/n) = (180 − 170) / √(400/9) = 10 / (20/3) = 1.5
VII. This value does not reach the critical value of 1.645, so we cannot conclude that these 9
seniors scored significantly higher than the national average. Apparently, it is not that
unusual, at the 5% level, for a subgroup of 9 individuals to score 10 points above the
national average.
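The reading-test example can be reproduced end to end from the scores listed in step VI; a sketch using only the Python standard library:

```python
# One-tailed Z test: H0: mu = 170 vs H1: mu > 170, known variance 400, n = 9.
import math
from statistics import NormalDist

scores = [164, 175, 186, 173, 191, 187, 189, 176, 179]
mu0, var = 170.0, 400.0
n = len(scores)
ybar = sum(scores) / n                 # = 180

z = (ybar - mu0) / math.sqrt(var / n)  # 10 / (20/3) = 1.5
crit = NormalDist().inv_cdf(0.95)      # about 1.645: all of alpha in the upper tail

print(round(z, 3))                     # 1.5
print(z > crit)                        # False -> fail to reject H0
```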