Statistical Methods I (EXST 7005) Page 62

Probability distribution tables in general
The tables we will use will ALL be giving the area in the tail (α). However, if you examine a
number of tables from other sources you will find that this is not always true. Even when it is
true, some tables will give the value of α as if it were in two tails, and some as if it were in
one tail.
For example, suppose we want to conduct a two-tailed Z test at the α = 0.05 level. We happen to know
that the critical value is Z = 1.96. If we look up this value in our Z tables we expect to see a value of 0.025, or
α/2. But many tables would show the probability for 1.96 as 0.975, and some as 0.05.
Why the difference? It just depends on how the tables are presented. Some of the alternatives
are shown below.
Some tables give the cumulative distribution starting at –∞. In that case you want to find the
probability corresponding to 1 – α/2. The value that leaves 0.025 in the upper tail
would be 0.975.
Some tables start at zero (0.0) and give the cumulative area from this point for the
upper half of the distribution. This is less common. The value that leaves 0.025
in the upper tail would be 0.475.
Among the tables like ours, that give the area in the tail, some are called two tailed tables
and some are one tailed tables.
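
The conventions described above can be compared side by side for Z = 1.96. The following is a minimal sketch in Python, not part of the course materials; it assumes only the standard library's `statistics.NormalDist`, and the variable names are illustrative.

```python
from statistics import NormalDist

z = 1.96
std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Four ways a table might report the "probability for 1.96".
cumulative_from_neg_inf = std_normal.cdf(z)            # area from -infinity up to z
cumulative_from_zero = cumulative_from_neg_inf - 0.5   # area from 0 up to z
one_tail = 1.0 - cumulative_from_neg_inf               # area above z
two_tails = 2.0 * one_tail                             # area above z plus area below -z

for label, value in [
    ("cumulative from -infinity", cumulative_from_neg_inf),
    ("cumulative from zero", cumulative_from_zero),
    ("area in one tail", one_tail),
    ("area in two tails", two_tails),
]:
    print(f"{label:<28}{value:.3f}")
```

Each printed value corresponds to one of the table layouts listed above, so the same Z value can legitimately appear next to four different probabilities.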
[Figure: One tailed table. Table value 0.025; α = 0.025 in the upper tail, 1 – α = 0.975 below it.]
[Figure: Two tailed table. Table value 0.050; α/2 in each tail, 1 – α between them.]

Why the extra confusion at this point?
All our tables will give the area in the tail.
The Z tables we used gave the area in one tail. For a two-tailed test you needed to double the
probability.
For the F tables and Chi square tables covered later, this area will be a single tail as with the Z
tables. This is because these distributions are not symmetric.
Traditionally, many t-tables have given the area in TWO TAILS instead of in one tail.
Many textbooks have this type of table.
SAS will also usually give two-tailed values for t-tests.
James P. Geaghan Copyright 2010
Our tables will have both two-tailed probabilities (top row) and one-tailed probabilities
(bottom row), so you may use either.
The same patterns are true for many of the computer programs that you may use to get
probabilities. For example, in EXCEL:
If you enter NORMSDIST(1.96) it returns 0.975, the cumulative area from –∞.
If you enter NORMSINV(0.025) it returns –1.96. The argument is a cumulative area from –∞, so 0.025 is the area in one tail and –1.96 is the lower critical value for a two-tailed test at α = 0.05.
If you enter TINV(0.05,9999) it returns 1.96, so its probability argument is two-tailed.
The TDIST(1.96,9999,1) function allows you to specify 1 or 2 tails in the function call.

The t tables
My t-tables are created in EXCEL, but patterned after Steel & Torrie, 1980, pg. 577.
The degrees of freedom, “d.f.” or γ, are given on the left side of the table.
The probability of randomly selecting a larger value of t is given at the top (and bottom) of the
page.
P(t ≥ t0) is given at the bottom; this is a one-tailed probability.
P(|t| ≥ t0) is given at the top; this is a two-tailed probability (note the absolute value signs).
Each row represents a different t distribution (with different d.f.).
The Z table had many probabilities, corresponding to Z values of 0.00, 0.01, 0.02, 0.03, etc.
About 400 probabilities occurred in the tables we used. They all fit on one page because
the whole Z table was a single distribution. The t table has many different distributions so
less information is given about each distribution.
If we are going to give many different t-distributions on one page, we lose something. We
will only give a few selected probabilities, the ones we are most likely to use,
e.g., 0.10, 0.05, 0.025, 0.01, 0.005.
Only the POSITIVE side of the table is given, but as with the Z distribution, the t distribution is
symmetric, so the lower half of the table can be determined by using the upper half.

Our t-tables
Partial t-table – 1 or 2 tails?

df     0.100    0.050    0.025    0.010    0.005
 1     3.078    6.314    12.71    31.82    63.656
 2     1.886    2.920    4.303    6.965    9.925
 3     1.638    2.353    3.182    4.541    5.841
 4     1.533    2.132    2.776    3.747    4.604
 5     1.476    2.015    2.571    3.365    4.032
 6     1.440    1.943    2.447    3.143    3.707
 7     1.415    1.895    2.365    2.998    3.499
 8     1.397    1.860    2.306    2.896    3.355
 9     1.383    1.833    2.262    2.821    3.250
10     1.372    1.812    2.228    2.764    3.169
 ∞     1.282    1.645    1.960    2.326    2.576

Note the selected d.f. on the left side.
The table stabilizes fairly quickly. Many tables don't go over about d.f. = 30. The Z tables give
a good approximation for larger d.f.
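
One way to see how quickly the table stabilizes is to compare the 0.025 column of the partial table with its d.f. = ∞ entry, 1.960. The short Python sketch below is illustrative only; the dictionary simply re-types that column, and the names are hypothetical.

```python
# 0.025 column of the partial t-table, d.f. = 1 to 10 (values copied from the table).
t_025 = {
    1: 12.71, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
}
Z_025 = 1.960  # the d.f. = infinity row of the same column

# How far each critical value sits above the limiting value.
excess_by_df = {df: round(t0 - Z_025, 3) for df, t0 in t_025.items()}

for df, excess in excess_by_df.items():
    print(f"d.f. = {df:>2}   t0 = {t_025[df]:>6.3f}   excess over 1.960 = {excess:.3f}")
```

The excess shrinks steadily as d.f. increases, which is why many tables stop listing every d.f. beyond about 30.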
Our tables will give d.f. as follows down the leftmost column of the table:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
28, 29, 30, 32, 34, 36, 38, 40, 45, 50, 75, 100, ∞
Selected probabilities
In the topmost row of the table selected probabilities will be given as α for a TWO
TAILED TEST.
In the bottommost row of the table selected probabilities will be given as α for a ONE
TAILED TEST.
Probabilities in our tables are:

Top row (two-tailed):      0.50   0.40   0.30   0.20   0.10   0.050   0.02   0.010   0.002   0.0010
Bottom row (one-tailed):   0.25   0.20   0.15   0.10   0.05   0.025   0.01   0.005   0.001   0.0005

HELPFUL HINT: Don't try to memorize “two tail top, one tail bottom”. Just recall the
characteristics of the distribution: when df = ∞, t = 1.96 leaves 5% in the two tails combined
and 2.5% in one tail. So take any t-table and look to see what probability corresponds to
df = ∞ and t = 1.96. If the value is 0.025, it is the area in one tail of the distribution, and if it
is 0.050 it is a two-tailed table. If the area is 0.975 it is cumulative from –∞, etc.
This trick of recalling 1.96 also works for Z tables. The tables we use give the area in the tail
of the distribution, so Z = 1.96 corresponds to a probability of 0.025. Some Z tables give the
cumulative area under the curve starting at –∞; the probability at Z = 1.96 would be 0.975.
Other Z tables give the cumulative area starting at 0; the probability at Z = 1.96 would be
0.475.

Working with our t-tables
Example 1. Let d.f. = γ = 10
H0: μ = μ0 versus H1: μ ≠ μ0 and α = 0.05
P(|t| ≥ t0) = 0.05; 2P(t ≥ t0) = 0.05;
P(t ≥ t0) = 0.025
(probabilities at the top of the table)
t0 = 2.228
Example 2. Let d.f. = γ = 10
H0: μ = μ0 versus H1: μ > μ0 and α = 0.05
P(t ≥ t0) = 0.05
(probabilities at the bottom of the table)
t0 = 1.812

Look up the following values.
Find the t value for H1: μ ≠ μ0, α = 0.050, d.f. = ∞     Answer: 1.960
Find the t value for H1: μ > μ0, α = 0.025, d.f. = ∞     Answer: 1.960
Find the t value for H1: μ ≠ μ0, α = 0.010, d.f. = 12    Answer: 3.055
Find the t value for H1: μ > μ0, α = 0.025, d.f. = 22    Answer: 2.074
Find the t value for H1: μ ≠ μ0, α = 0.200, d.f. = 35    Answer: 1.306
Find the t value for H1: μ ≠ μ0, α = 0.002, d.f. = 5     Answer: 5.894
Find the t value for H1: μ < μ0, α = 0.100, d.f. = 8     Answer: –1.397
Find the t value for H1: μ < μ0, α = 0.010, d.f. = 75    Answer: –2.377
Find the P value for t = –1.740, H1: μ < μ0, d.f. = 17   Answer: 0.050
Find the P value for t = 4.587, H1: μ ≠ μ0, d.f. = 10    Answer: 0.001

t-test of Hypothesis
We want to determine if a new drug has an effect on the blood pressure of rhesus monkeys, comparing pressure before and
after treatment. We are looking for a net change in pressure, either up or down (two-tailed
test).

Example 1 of the t-test
We obtain a random sample of 10 individuals. Note: n = 10, but d.f. = γ = 9
1) H0: μ = μ0
2) H1: μ ≠ μ0
3) Assume: Independence (randomly selected sample) and that the CHANGE in blood
pressure is normally distributed.
4) We set α = 0.01, but split between two tails (to meet the alternate hypothesis).
P(|t| ≥ t0) = 0.01; 2P(t ≥ t0) = 0.01; P(t ≥ t0) = 0.005 in each tail
The critical value of t is determined as follows.
Given that it is a two-tailed test with 9 d.f. (n = 10, but d.f. = γ = 9) and α = 0.01,
the critical limit from the t-table is t0 = 3.250.
5) Obtain values from the sample of 10 individuals (n = 10). The values for change in blood
pressure were; 0, 4, –3, 2, 0, 1, –4, 5, –1, 4
\[ \sum_{i=1}^{n} Y_i = 0 + 4 - 3 + 2 + 0 + 1 - 4 + 5 - 1 + 4 = 8 \]

\[ \sum_{i=1}^{n} Y_i^2 = 0 + 16 + 9 + 4 + 0 + 1 + 16 + 25 + 1 + 16 = 88 \]

\[ \bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n} = \frac{8}{10} = 0.8 \]

\[ S_Y^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1} = \frac{\sum_{i=1}^{n} Y_i^2 - \left(\sum_{i=1}^{n} Y_i\right)^2 / n}{n - 1} = \frac{88 - 64/10}{9} = \frac{88 - 6.4}{9} = 9.067 \]

\[ S_Y = \sqrt{9.067} = 3.011 \]

\[ S_{\bar{Y}} = \frac{S_Y}{\sqrt{n}} = \frac{3.011}{\sqrt{10}} = 0.952 \]

Finally, the value of the test statistic, a t value in this case, is

\[ t = \frac{\bar{Y} - \mu_0}{S_{\bar{Y}}} = \frac{0.8 - 0}{0.952} = 0.840 \quad \text{with 9 d.f.} \]

6) Compare the critical limit to the test statistic and decide to reject or fail to reject.
The critical limit from the t-table is t0 = 3.250.
The test statistic calculated from the sample was 0.840 (9 d.f.).
The area leaving 0.005 in each tail is almost
too small to show on our usual graphs.
The test statistic is clearly in the region of
“acceptance”, so we fail to reject H0.
7) Conclude that there is no evidence that the new drug affects the
blood pressure of rhesus monkeys. Is there
an error? Maybe a Type II error, but not a Type I error, since we did not reject the null
hypothesis.

Example 2 of the t-test
A company manufacturing environmental monitoring equipment claims that their thermograph
(a machine that records temperature) requires (on the average) no more than 0.8 amps to
operate under normal conditions. We wish to test this claim before buying their
equipment. We want to reject the equipment if the electricity demand exceeds 0.8 amps.
1) H0: μ = μ0 , where μ0 = 0.8
2) H1: μ > μ0
3) Assume (1) independence and (2) a normal distribution of amp values, or at least of the
mean that we will test. We do not assume a known variance with the t-test; we use a
variance calculated from the sample.
4) We set α = 0.05. The critical value of t reflects that
we are doing a one-tailed test (see H1) with 15 d.f. (n = 16, but d.f. = γ = 15) and α = 0.05.
P(t ≥ t0) = 0.05; from the table, t0 = 1.753.
5) Draw a sample. We have 16 machines for testing. The individual values for amp
readings were not recorded. Summary statistics are given below:

\[ \bar{Y} = 0.96, \qquad S_Y = 0.32 \]

\[ S_{\bar{Y}} = \frac{S_Y}{\sqrt{n}} = \frac{0.32}{\sqrt{16}} = 0.08 \]

\[ t = \frac{\bar{Y} - \mu_0}{S_{\bar{Y}}} = \frac{0.96 - 0.8}{0.08} = 2.00 \quad \text{with 15 d.f.} \]

6) Compare the critical limit to the test statistic.
The critical limit from the table is t0 = 1.753
and the calculated test statistic was t = 2.00
(with 15 d.f.).
[Figure: t distribution with the one-tailed critical limit t0 = 1.753 marked in the upper tail.]
Clearly, the test statistic exceeds the one-tailed critical limit and falls in the upper tail of
the distribution, in the area of rejection.
7) Conclusion: We would conclude that the machines require more electricity than the
claimed 0.8 amperes. Of course, there is a possibility of a Type I error.

t test with SAS
SAS example (#2a)
Recall our test of blood pressure change of rhesus monkeys. We can take the values of blood
pressure change and enter them in SAS PROC UNIVARIATE.
Values: 0, 4, –3, 2, 0, 1, –4, 5, –1, 4

SAS PROGRAM DATA step
OPTIONS NOCENTER NODATE NONUMBER LS=78 PS=61;
TITLE1 't-tests with SAS PROC UNIVARIATE';
* Read one blood pressure change per data line;
DATA monkeys; INFILE CARDS MISSOVER;
   TITLE2 'Analysis of Blood Pressure change in Rhesus Monkeys';
   INPUT BPChange;
* The data follow the CARDS statement and end with a semicolon;
CARDS;
0
4
-3
2
0
1
-4
5
-1
4
;
RUN;
* List the data, then request summary statistics and tests for the mean;
PROC PRINT DATA=monkeys; RUN;
PROC UNIVARIATE DATA=monkeys PLOT; VAR BPChange;
   TITLE2 'PROC Univariate on Blood Pressure Change'; RUN;

The PROC UNIVARIATE from SAS® will perform a two-tailed t-test of the mean against zero.
See SAS PROGRAM output.

Notes on SAS PROC Univariate
Note that all values we calculated match the values given by SAS.
Note that the standard error is called the “Std Error Mean”. This is unusual; it is called the “Std
Error” in most other SAS procedures.
The test statistic value matches our calculated value (0.840).
SAS also provides a “Pr > |t| 0.4226”.
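
The figures quoted from the SAS output can be cross-checked without SAS. The sketch below is in Python rather than SAS and is not part of the original program. It recomputes the test statistic from the ten values and approximates the two-tailed P value by integrating the t density numerically with Simpson's rule. The helper names `t_pdf` and `area_zero_to` are hypothetical, and the integration is only an approximation to what SAS computes.

```python
import math
from statistics import mean, stdev

bp_change = [0, 4, -3, 2, 0, 1, -4, 5, -1, 4]
n = len(bp_change)
df = n - 1

std_error = stdev(bp_change) / math.sqrt(n)   # standard error of the mean
t_stat = (mean(bp_change) - 0) / std_error    # test against a hypothesized mean of 0


def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    coef = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coef * (1 + x * x / nu) ** (-(nu + 1) / 2)


def area_zero_to(x, nu, steps=1000):
    """Simpson's-rule area under the t density between 0 and x (steps must be even)."""
    h = x / steps
    total = t_pdf(0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return total * h / 3


# By symmetry, the two tails hold everything outside (-|t|, +|t|).
p_two_tailed = 1 - 2 * area_zero_to(abs(t_stat), df)
print(f"t = {t_stat:.3f} with {df} d.f., Pr > |t| = {p_two_tailed:.4f}")
```

If the sketch is right, the printed values should agree with the hand calculation (t = 0.840) and with the P value reported by SAS to within rounding.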
The value provided by SAS is a P value (Pr > |t| = 0.4226), meaning that the
calculated value of t = 0.840 would leave 0.4226 (or 42.26 percent) of the
distribution in the two tails (half in each tail). The two-tailed split is indicated by
the absolute value signs around t, so the proportion in each tail is 0.2113 (or 21.13%).
[Figure: t distribution showing the calculated value lying between the lower and upper critical regions.]
The P value indicates our calculated value would leave 21.13% in each tail; our critical region
has only 0.5% in each tail. Clearly we are in the region of “acceptance”.

Example 2b with SAS
Testing the thermographs using SAS PROC UNIVARIATE. We didn't have data, so we cannot
test with SAS.
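
The test statistic itself depends only on the summary statistics, so the hand calculation can still be checked by machine. A minimal Python sketch (not SAS, and not part of the original notes), using the values reported for the thermograph example; the variable names are illustrative:

```python
import math

# Summary statistics reported for the 16 thermographs.
n = 16
sample_mean = 0.96
sample_sd = 0.32
mu0 = 0.8           # hypothesized mean (amps)
t_critical = 1.753  # one-tailed, alpha = 0.05, 15 d.f., read from the t-table

std_error = sample_sd / math.sqrt(n)       # standard error of the mean
t_stat = (sample_mean - mu0) / std_error   # one-sample t statistic
reject_h0 = t_stat > t_critical            # upper-tail test, H1: mu > mu0

print(f"standard error = {std_error:.2f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```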
A NOTE. SAS automatically tests the mean of the values in PROC UNIVARIATE against 0.
In the thermograph example our hypothesized value was 0.8, not 0.0.
But from what we know of transformations, we can subtract 0.8 from each value without
changing the characteristics of the distribution.

SAS Example 2c – Freund & Wilson (1993) Example 4.2
We receive a shipment of apples that are supposed to be “premium apples”, with a diameter of at
least 2.5 inches. We will take a sample of 12 apples and test the hypothesis that the mean
size equals 2.5 inches, so that the apples qualify as premium. If the mean is LESS THAN 2.5 inches, we
reject.
1) H0: μ = μ0, where μ0 = 2.5
2) H1: μ < μ0
3) Assume: Independence (randomly selected sample) and that
apple size is normally distributed.
4) α = 0.05. We have a one-tailed test (H1: μ < μ0), and we chose α = 0.05. The critical limit
would be a t value with 11 d.f. This value is –1.796.
5) Draw a sample. We will take 12 apples, and let SAS do the calculations.
The sample values for the 12 apples are;
2.9, 2.1, 2.4, 2.8, 3.1, 2.8, 2.7, 3.0, 2.4, 3.2, 2.3, 3.4
As mentioned, SAS automatically tests against zero, and we want to test against 2.5. So,
we subtract 2.5 from each value and test against zero. The test should give the same
results.
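
The claim that subtracting 2.5 leaves the test unchanged can be checked numerically. The following Python sketch is an illustration only, not the SAS program for this example; the helper `t_statistic` is hypothetical. It computes the statistic once against 2.5 directly and once against 0 after shifting the data.

```python
import math
from statistics import mean, stdev

diameters = [2.9, 2.1, 2.4, 2.8, 3.1, 2.8, 2.7, 3.0, 2.4, 3.2, 2.3, 3.4]
mu0 = 2.5  # hypothesized mean diameter (inches)


def t_statistic(values, hypothesized_mean):
    """One-sample t statistic: (sample mean - hypothesized mean) / standard error."""
    std_error = stdev(values) / math.sqrt(len(values))
    return (mean(values) - hypothesized_mean) / std_error


# Test the raw diameters against 2.5 ...
t_direct = t_statistic(diameters, mu0)

# ... and the shifted diameters against 0, as PROC UNIVARIATE would.
shifted = [y - mu0 for y in diameters]
t_shifted = t_statistic(shifted, 0.0)

print(f"t (raw data vs 2.5)     = {t_direct:.3f}")
print(f"t (shifted data vs 0.0) = {t_shifted:.3f}")
```

Subtracting a constant moves the mean by that constant but leaves the standard deviation alone, so the two statistics should agree up to floating-point rounding. Either one can then be compared with the critical limit of –1.796.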
This note was uploaded on 12/29/2011 for the course EXST 7005 taught by Professor Geaghan,j during the Fall '08 term at LSU.