Statistical Methods I (EXST 7005)
Two-sample t-tests
Recall the derived population of sample means.
Imagine you had two of these derived populations, as we had for variances when we developed the
F-test.
Equality of Population Means

            Population 1    Population 2
Mean        $\mu_1$         $\mu_2$
Variance    $\sigma_1^2$    $\sigma_2^2$

Given two populations, test for equality. Actually, we test whether the difference between the two means
is equal to some value, usually zero.

$$H_0: \mu_1 - \mu_2 = \delta, \quad \text{where often } \delta = 0$$

Derived populations
From population 1 draw all possible samples of size $n_1$ to get the derived population of sample means $\bar{Y}_1$.
• Mean: $\mu_1$ yields $\mu_{\bar{Y}_1} = \mu_1$
• Variance: $\sigma_1^2$ yields $\sigma_{\bar{Y}_1}^2 = \sigma_1^2 / n_1$
• Derived population size is $N_1^{n_1}$
Likewise population 2 gives $\bar{Y}_2$.
• Mean: $\mu_2$ yields $\mu_{\bar{Y}_2} = \mu_2$
• Variance: $\sigma_2^2$ yields $\sigma_{\bar{Y}_2}^2 = \sigma_2^2 / n_2$
• Derived population size is $N_2^{n_2}$

Draw a sample from each population
                   Population 1    Population 2
Sample size        $n_1$           $n_2$
d.f.               $\gamma_1$      $\gamma_2$
Sample mean        $\bar{Y}_1$     $\bar{Y}_2$
Sample variance    $S_1^2$         $S_2^2$

James P. Geaghan Copyright 2010

Take all possible differences $\bar{Y}_1 - \bar{Y}_2 = d$. That is, take all possible means from the first population
($N_1^{n_1}$ means) and all possible means from the second population ($N_2^{n_2}$ means), and get the difference
between each pair. There are a total of $(N_1^{n_1})(N_2^{n_2})$ values.

$$(\bar{Y}_{1i} - \bar{Y}_{2j}) = d_k \quad \text{for } i = 1, \ldots, N_1^{n_1}, \; j = 1, \ldots, N_2^{n_2} \text{ and } k = 1, \ldots, (N_1^{n_1})(N_2^{n_2})$$

The population of differences $(\bar{Y}_{1i} - \bar{Y}_{2j}) = d_k$ has a mean equal to $\mu_d$ or $\mu_{\bar{Y}_1 - \bar{Y}_2}$, a variance
equal to $\sigma_d^2$ or $\sigma_{\bar{Y}_1 - \bar{Y}_2}^2$, and a standard error equal to $\sigma_d$ or $\sigma_{\bar{Y}_1 - \bar{Y}_2}$ (the standard deviation of the mean difference).

Characteristics of the derived population
1) As n1 and n2 increase, the distribution of d approaches a normal distribution. If the two
original populations are normal, the distribution of d is normal regardless of n1 and n2.
2) The mean of the differences ($d_k$) is equal to the difference between the means of the two parent
populations.
$$\mu_d = \mu_{\bar{Y}_1} - \mu_{\bar{Y}_2} = \mu_{\bar{Y}_1 - \bar{Y}_2} = \delta$$
3) The variance of the differences ($d_k$) is equal to the sum of the variances of the two derived
populations of sample means.
$$\sigma_d^2 = \sigma_{\bar{Y}_1 - \bar{Y}_2}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
$$\sigma_d = \sigma_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
This is a variance for a linear combination. The two samples are assumed independent, so
covariance considerations are not needed.
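
The characteristics above can be checked by brute force on a toy example. The sketch below is illustrative only: the two tiny populations and the sample sizes are made-up values, and it assumes samples are drawn with replacement with order counted, which is what the count of $N^n$ possible samples implies.

```python
from itertools import product
from statistics import mean, pvariance

# Two tiny hypothetical populations (values chosen only for illustration).
pop1 = [2, 4, 6]   # N1 = 3
pop2 = [1, 3]      # N2 = 2
n1, n2 = 2, 3      # sample sizes

# Derived populations: the mean of every possible sample of size n.
means1 = [sum(s) / n1 for s in product(pop1, repeat=n1)]  # N1**n1 = 9 means
means2 = [sum(s) / n2 for s in product(pop2, repeat=n2)]  # N2**n2 = 8 means

# Every possible difference d_k between a mean from 1 and a mean from 2.
diffs = [m1 - m2 for m1, m2 in product(means1, means2)]   # 9 * 8 = 72 values

mu_d = mean(diffs)        # mean of the derived population of differences
var_d = pvariance(diffs)  # its (population) variance

# What characteristics 2) and 3) say these should equal.
expected_mu = mean(pop1) - mean(pop2)
expected_var = pvariance(pop1) / n1 + pvariance(pop2) / n2

print(len(diffs), mu_d, expected_mu)
print(var_d, expected_var)
```

The two printed pairs should agree up to floating-point rounding. Enumeration is only feasible for very small populations; it is used here to make the derived population concrete, not as a practical method.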
The variance comes from linear combinations. The variance of a sum is the sum of the
variances (no covariance if independent).
Linear combination: $\bar{Y}_1 - \bar{Y}_2$
The coefficients are 1, –1 (i.e. $1\bar{Y}_1 + (-1)\bar{Y}_2$)
The variance for this linear combination is
$$1^2 \hat{\sigma}_{\bar{Y}_1}^2 + (-1)^2 \hat{\sigma}_{\bar{Y}_2}^2 = \frac{\hat{\sigma}_1^2}{n_1} + \frac{\hat{\sigma}_2^2}{n_2}$$
if the two variables are independent. Since they are each separately sampled at random this is an easy assumption.

The two-sample t-test
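$$H_0: \mu_1 - \mu_2 = \delta$$
$$H_1: \mu_1 - \mu_2 \neq \delta$$
This is a nondirectional alternative (two-tailed test). A directional alternative (one-tailed test) would
specify the direction of the difference, either $> \delta$ or $< \delta$.
Commonly, $\delta$ is 0 (zero).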
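If $H_0$ is true, then $E(d) = \mu_1 - \mu_2$ and
$$\sigma_d^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
If values of $\sigma_1^2$ and $\sigma_2^2$ were KNOWN, we could use a Z-test,
$$Z = \frac{d - \delta}{\sigma_d} = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
If values of $\sigma_1^2$ and $\sigma_2^2$ were NOT KNOWN, and had to be estimated from the samples, we would use a t-test,
$$t = \frac{d - \delta}{S_d} = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$
Since the hypothesized difference is usually 0 (zero), the term $(\mu_1 - \mu_2)$ is usually zero, and the
equation is often simplified to
$$t = \frac{d}{S_d}$$
Et voilà, a two-sample t-test!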
This is a very common test, and it is the basis for many calculations used in regression and
analysis of variance (ANOVA). It will crop up repeatedly as we progress in the course. It is
very important!
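
As a small worked illustration of the t statistic, the sketch below computes $t = (d - \delta)/S_d$ with $S_d = \sqrt{S_1^2/n_1 + S_2^2/n_2}$. The function name `t_statistic` and the data are hypothetical, made up for this example. It computes only the statistic and says nothing about the degrees of freedom.

```python
import math
from statistics import mean, variance

def t_statistic(y1, y2, delta=0.0):
    """Return t = (d - delta) / S_d using the unpooled standard error."""
    d = mean(y1) - mean(y2)  # observed difference between sample means
    # variance() is the sample variance (divisor n - 1), i.e. S^2.
    s_d = math.sqrt(variance(y1) / len(y1) + variance(y2) / len(y2))
    return (d - delta) / s_d

# Hypothetical data, made up for illustration only.
sample1 = [12.0, 15.0, 11.0, 14.0, 13.0]
sample2 = [10.0, 9.0, 12.0, 11.0]

t = t_statistic(sample1, sample2)             # tests H0: mu1 - mu2 = 0
t_shifted = t_statistic(sample1, sample2, delta=2.5)
print(t, t_shifted)
```

Here the observed difference is 2.5, so testing against $\delta = 2.5$ gives a statistic of zero, while testing against $\delta = 0$ gives a value a little above 2.6.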
$$t = \frac{d - \delta}{S_d} = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$
often written just $t = d / S_d$ when $\delta$ or $\mu_1 - \mu_2$ is equal to zero.

The two-sample t-test
Unfortunately, this is not the end of the story. It turns out that there is some ambiguity about the
degrees of freedom for the error variance, $S_1^2/n_1 + S_2^2/n_2$. Is it $n_1 - 1$, or $n_2 - 1$, or somewhere in
between, or maybe the sum?

Power considerations
POWER! We want the greatest possible power. It turns out that we get the greatest power
(and our problems with degrees of freedom go away) if we can combine the two variance
estimates into one, single, new and improved estimate! But we can only do this if the two
variances are not too different.
To determine if they are sufficiently similar we use an F test. Therefore, two-sample t-tests START WITH AN F TEST!

Pooling variances
If the two estimates of variance are sufficiently similar, as judged by the F test of variances (e.g.
$H_0: \sigma_1^2 = \sigma_2^2$), then they can be combined. This is called “pooling variances”, and is done as a
weighted mean (or weighted average) of the variances. The weights are the degrees of freedom.

Weighted means or averages
The usual mean is calculated as $\bar{Y} = \sum_{i=1}^{n} Y_i / n$. The weighted mean is $\bar{Y} = \sum_{i=1}^{n} w_i Y_i / \sum_{i=1}^{n} w_i$, or
the sum of the variable multiplied by the weights divided by the sum of the weights.
Pooled variances are calculated as
$$\text{Pooled } S^2 = S_p^2 = \frac{\sum_{j=1}^{k} \gamma_j S_j^2}{\sum_{j=1}^{k} \gamma_j}$$
where j will be j = 1 and 2 for groups 1 and 2. There could be more than 2 variances averaged in other situations.
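
The weighted-mean form of the pooled variance can be sketched in a few lines. The function name `pooled_variance` and the data are hypothetical, chosen only to illustrate the formula. The function accepts any number of samples, reflecting the remark that more than two variances can be averaged.

```python
from statistics import variance

def pooled_variance(*samples):
    """Weighted mean of the sample variances, weighted by d.f. (n_j - 1)."""
    dfs = [len(s) - 1 for s in samples]          # gamma_j
    variances = [variance(s) for s in samples]   # S_j^2 (divisor n_j - 1)
    weighted_sum = sum(g * v for g, v in zip(dfs, variances))
    return weighted_sum / sum(dfs)

# Hypothetical data, made up for illustration only.
sample1 = [12.0, 15.0, 11.0, 14.0, 13.0]   # S1^2 = 2.5,  gamma1 = 4
sample2 = [10.0, 9.0, 12.0, 11.0]          # S2^2 = 5/3,  gamma2 = 3

sp2 = pooled_variance(sample1, sample2)
print(sp2)
```

Because it is a weighted mean, the pooled value always falls between the smallest and largest of the individual variances, and it is pulled toward the variance with more degrees of freedom.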
Recall that $\gamma_j S_j^2 = SS_j$, so we can also calculate the pooled variance as the sum of the corrected SS for
each variable divided by the sum of the d.f. for each variable:
$$S_p^2 = \frac{\sum \gamma_j S_j^2}{\sum \gamma_j} = \frac{\sum SS_j}{\sum \gamma_j}$$
Pooled variance calculation:
$$S_p^2 = \frac{\gamma_1 S_1^2 + \gamma_2 S_2^2}{\gamma_1 + \gamma_2} = \frac{SS_1 + SS_2}{\gamma_1 + \gamma_2} = \frac{SS_1 + SS_2}{(n_1 - 1) + (n_2 - 1)}$$

Two-sample t-test variance estimates
From linear combinations we know that the variance of the sum is the sum of the variances.
This is the GENERAL CASE: $S_1^2/n_1 + S_2^2/n_2$. But, if we test $H_0: \sigma_1^2 = \sigma_2^2$ and fail to reject, we
can pool the variances. The error variance is then $S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)$. One additional minor
simplification is possible. If $n_1 = n_2 = n$, then we can place the pooled variance over a
single n, $2S_p^2 / n$.
So we now have a single, more powerful, pooled variance! What are its degrees of freedom?
The first variance had d.f. = $n_1 - 1 = \gamma_1$
The second variance had d.f. = $n_2 - 1 = \gamma_2$
The pooled variance has d.f. equal to the sum of the d.f. for the variances that were
pooled, so the degrees of freedom for $S_p^2$ is $(n_1 - 1) + (n_2 - 1) = \gamma_1 + \gamma_2$.

Summary: case where $\sigma_1^2 = \sigma_2^2$
Test the variances to determine if they are significantly different. This is an F test of
$H_0: \sigma_1^2 = \sigma_2^2$.
If they are not different, then pool the variances into a single estimate, $S_p^2$.
The t-test is then done using this variance to estimate the standard error of the difference.
The d.f. are $(n_1 - 1) + (n_2 - 1)$.
The t-test equation is then
$$t = \frac{d - \delta}{S_d} = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$$
One other detail: we are conducting the test with the condition that $\sigma_1^2 = \sigma_2^2$. This will be a
new assumption for this test, equal variances.

Assumptions: NID r.v. ($\mu$, $\sigma^2$)
N for Normality; the differences are normally distributed
I for Independence; the observations and samples are independent
Since the variance is specified to be a single variance equal to $\sigma^2$, the variances are
equal, or the variance is said to be homogeneous. The complement to homogeneous
variance is heterogeneous variance.
Equal variance is also called homoscedasticity and the alternative is referred to as
heteroscedasticity. Samples can correspondingly be described as homoscedastic (equal variance) or
heteroscedastic (unequal variance).

Case where $\sigma_1^2 \neq \sigma_2^2$
How do we conduct the test if the variances are not equal? On the one hand, this is not a
problem. The linear combination we used to get the variance does not require
homogeneous variance, so we know the variance is $S_1^2/n_1 + S_2^2/n_2$.
But what are the degrees of freedom?
It turns out the d.f. are somewhere between the smaller of n1–1 and n2–1 and the d.f. for the
pooled variance estimate [(n1–1) + (n2–1)].
It would be conservative to just use the smaller of n1–1 and n2–1. This works and is
reasonable and it is done. However, power is lost with this solution. (This solution is
suggested by your textbook).
The alternative is to estimate the d.f. using an approximation developed by Satterthwaite.
This solution is used by SAS in the procedure PROC TTEST.
Satterthwaite’s approximation:
$$\text{d.f.} = \gamma \approx \frac{\left( \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} \right)^2}{\frac{\left( S_1^2 / n_1 \right)^2}{n_1 - 1} + \frac{\left( S_2^2 / n_2 \right)^2}{n_2 - 1}}$$
This calculation is, of course, an approximation as the name suggests. Note that it does
not usually give nice integer degrees of freedom; expect some decimal places. This
is not an issue for computer programs that can get P-values for any d.f. It does
complicate using our tables a little.
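
Satterthwaite's formula is easy to mistype by hand, so a direct transcription may help. The function name `satterthwaite_df` and the summary statistics are hypothetical; this is a sketch of the formula as written above, not a reproduction of any particular program's output.

```python
def satterthwaite_df(s1_sq, n1, s2_sq, n2):
    """Approximate d.f. for the unpooled variance S1^2/n1 + S2^2/n2."""
    a = s1_sq / n1   # S1^2 / n1
    b = s2_sq / n2   # S2^2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Hypothetical summary statistics with clearly unequal variances.
s1_sq, n1 = 4.0, 10
s2_sq, n2 = 36.0, 6

df = satterthwaite_df(s1_sq, n1, s2_sq, n2)
lower = min(n1 - 1, n2 - 1)      # conservative choice: 5
upper = (n1 - 1) + (n2 - 1)      # pooled d.f.: 14
print(lower, df, upper)
```

With these made-up numbers the result is a non-integer a little under 5.7, which sits just above the conservative value of 5 and well below the pooled value of 14. The sample with the much larger variance dominates the error variance, so the d.f. stay close to that sample's own d.f.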
There is one additional “simplification”. We know that the d.f. are at least the smaller of n1–1
and n2–1. But what if n1 = n2 = n? In this case the d.f. will be at least n–1. However,
Satterthwaite's approximation will still, usually, yield a larger d.f.

Summary
There are two cases in two-sample t-tests: the case where $\sigma_1^2 = \sigma_2^2$ and the case where $\sigma_1^2 \neq \sigma_2^2$.
There are also some considerations for the cases where $n_1 = n_2$ and where $n_1 \neq n_2$.
Each of these cases alters the calculation of the standard error of the difference being tested and
the degrees of freedom.
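Variance
                   $\sigma_1^2 = \sigma_2^2$                               $\sigma_1^2 \neq \sigma_2^2$
$n_1 \neq n_2$     $S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)$    $\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}$
$n_1 = n_2 = n$    $\frac{2 S_p^2}{n}$                                     $\frac{S_1^2 + S_2^2}{n}$

d.f.
                   $\sigma_1^2 = \sigma_2^2$      $\sigma_1^2 \neq \sigma_2^2$
$n_1 \neq n_2$     $(n_1 - 1) + (n_2 - 1)$        $\geq \min[(n_1 - 1), (n_2 - 1)]$
$n_1 = n_2 = n$    $2n - 2$                       $\geq n - 1$

For our purposes, we will generally use SAS to conduct two-sample t-tests, and will let SAS
determine Satterthwaite's approximation when the variances are not equal.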
How does SAS know if the variances are equal? How does it know what value of α you want to
use? Good questions. Actually, SAS does not know or assume anything. We'll find out what
it does later.
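
The two summary tables can be collected into one small function that returns the error variance and its d.f. for either case. This is a sketch under stated assumptions: the name `error_variance_and_df` and the summary statistics are hypothetical, the unequal-variance branch uses Satterthwaite's approximation rather than the conservative minimum, and nothing here is meant to describe what SAS does internally.

```python
def error_variance_and_df(s1_sq, n1, s2_sq, n2, equal_var):
    """Return (variance of the difference, d.f.) for the chosen case."""
    if equal_var:
        # Pooled case: weight each variance by its d.f.
        df = (n1 - 1) + (n2 - 1)
        sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
        return sp_sq * (1 / n1 + 1 / n2), df
    # Unpooled case: general linear-combination variance,
    # with d.f. from Satterthwaite's approximation.
    a, b = s1_sq / n1, s2_sq / n2
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return a + b, df

# Hypothetical summary statistics with equal sample sizes, n1 = n2 = 8.
pooled_var, pooled_df = error_variance_and_df(4.0, 8, 9.0, 8, equal_var=True)
unpooled_var, unpooled_df = error_variance_and_df(4.0, 8, 9.0, 8, equal_var=False)

print(pooled_var, pooled_df)
print(unpooled_var, unpooled_df)
```

With equal sample sizes the two error variances coincide, since $2S_p^2/n = (S_1^2 + S_2^2)/n$, so the t statistic is the same in both cases. Only the degrees of freedom differ: $2n - 2$ when pooled, and a value of at least $n - 1$ when not.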
One last thought on testing for differences between two populations. The test we have been
primarily discussing is the t test, a test of equality of means. However, if we find in the
process of checking variance that the variances differ, then there are already some differences
between the two populations that may be of interest.