Statistical Methods I (EXST 7005), Fall 2010, Lecture 15a: Two-sample t-tests

Two-sample t-tests

Recall the derived population of sample means. Imagine you had two of these derived populations, as we had for variances when we developed the F-test.

Equality of population means

                    Population 1    Population 2
  Mean              μ1              μ2
  Variance          σ1²             σ2²

Given two populations, test for equality. Actually, we test whether the difference between the two means is equal to some value, usually zero.

  H0: μ1 − μ2 = δ, where often δ = 0

Derived populations

From population 1, draw all possible samples of size n1 to get Ȳ1.
• Mean: μ1 yields $\mu_{\bar{Y}_1} = \mu_1$
• Variance: σ1² yields $\sigma_{\bar{Y}_1}^2 = \sigma_1^2 / n_1$
• Derived population size is $N_1^{n_1}$

Likewise, population 2 gives Ȳ2.
• Mean: μ2 yields $\mu_{\bar{Y}_2} = \mu_2$
• Variance: σ2² yields $\sigma_{\bar{Y}_2}^2 = \sigma_2^2 / n_2$
• Derived population size is $N_2^{n_2}$

Draw a sample from each population

                    Population 1    Population 2
  Sample size       n1              n2
  d.f.              γ1              γ2
  Sample mean       Ȳ1              Ȳ2
  Sample variance   S1²             S2²

Take all possible differences $\bar{Y}_1 - \bar{Y}_2 = d$. That is, take all possible means from the first population ($N_1^{n_1}$ means) and all possible means from the second population ($N_2^{n_2}$ means), and get the difference between each pair. There are a total of $N_1^{n_1} N_2^{n_2}$ values:

  $(\bar{Y}_{1i} - \bar{Y}_{2j}) = d_k$  for  $i = 1, \ldots, N_1^{n_1}$, $j = 1, \ldots, N_2^{n_2}$, and $k = 1, \ldots, N_1^{n_1} N_2^{n_2}$

The population of differences $(\bar{Y}_{1i} - \bar{Y}_{2j}) = d_k$ has a mean equal to $\mu_d$ or $\mu_{\bar{Y}_1 - \bar{Y}_2}$, a variance equal to $\sigma_d^2$ or $\sigma_{\bar{Y}_1 - \bar{Y}_2}^2$, and a standard error equal to $\sigma_d$ or $\sigma_{\bar{Y}_1 - \bar{Y}_2}$ (the standard deviation of the mean difference).

Characteristics of the derived population

1) As n1 and n2 increase, the distribution of d approaches a normal distribution. If the two original populations are normal, the distribution of d is normal regardless of n1 and n2.

2) The mean of the differences (dk) is equal to the difference between the means of the two parent populations:

  $\mu_d = \mu_{\bar{Y}_1} - \mu_{\bar{Y}_2} = \mu_{\bar{Y}_1 - \bar{Y}_2} = \delta$

3) The variance of the differences (dk) is equal to the variance of the difference between the means of the two parent populations:

  $\sigma_d^2 = \sigma_{\bar{Y}_1 - \bar{Y}_2}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$,  so  $\sigma_d = \sigma_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

This is the variance of a linear combination. The two samples are assumed independent, so covariance considerations are not needed: the variance of a sum is the sum of the variances (no covariance terms when the variables are independent).

Linear combination: $\bar{Y}_1 - \bar{Y}_2$. The coefficients are 1 and −1 (i.e. $1\bar{Y}_1 + (-1)\bar{Y}_2$). The variance of this linear combination is

  $1^2 \hat{\sigma}_{\bar{Y}_1}^2 + (-1)^2 \hat{\sigma}_{\bar{Y}_2}^2 = \frac{\hat{\sigma}_1^2}{n_1} + \frac{\hat{\sigma}_2^2}{n_2}$

if the two variables are independent. Since they are each separately sampled at random, this is an easy assumption.

The two-sample t-test

  H0: μ1 − μ2 = δ
  H1: μ1 − μ2 ≠ δ

This is a non-directional alternative; a directional (one-tailed) alternative would specify the direction of the difference, either > δ or < δ. Commonly, δ is 0 (zero).

If H0 is true, then E(d) = μ1 − μ2 = δ and $\sigma_d^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.

If the values of σ1² and σ2² were KNOWN, we could use a Z-test:

  $Z = \frac{d - \delta}{\sigma_d} = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

If the values of σ1² and σ2² are NOT KNOWN and must be estimated from the samples, we use a t-test:

  $t = \frac{d - \delta}{S_d} = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$

Since the hypothesized difference is usually 0 (zero), the term (μ1 − μ2) is usually zero, and the equation is often simplified to $t = \frac{d}{S_d}$.

Et voilà, a two-sample t-test!
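To make the formula concrete, here is a minimal Python sketch (my illustration, not part of the original notes) that computes d, its estimated standard error, and the t statistic from summary statistics. The sample sizes, means, and variances are invented for the example.

```python
from math import sqrt

# Invented summary statistics for two independent samples (hypothetical values)
n1, ybar1, s2_1 = 12, 25.4, 9.8    # sample size, sample mean, sample variance for sample 1
n2, ybar2, s2_2 = 15, 22.1, 11.3   # sample size, sample mean, sample variance for sample 2
delta = 0.0                        # hypothesized difference, H0: mu1 - mu2 = delta

d = ybar1 - ybar2                  # observed difference of the sample means
s_d = sqrt(s2_1 / n1 + s2_2 / n2)  # estimated standard error of the difference, S_d
t = (d - delta) / s_d              # two-sample t statistic

print(f"d = {d:.3f}, S_d = {s_d:.3f}, t = {t:.3f}")
```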
This is a very common test, and it is the basis for many calculations used in regression and analysis of variance (ANOVA). It will crop up repeatedly as we progress through the course. It is very important!

  $t = \frac{d - \delta}{S_d} = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$,  often written just $t = \frac{d}{S_d}$ when δ (or μ1 − μ2) is equal to zero.

The two-sample t-test

Unfortunately, this is not the end of the story. It turns out that there is some ambiguity about the degrees of freedom for the error variance, $\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}$. Is it n1 − 1, or n2 − 1, or somewhere in between, or maybe the sum?

Power considerations

POWER! We want the greatest possible power. It turns out that we get the greatest power (and our problems with the degrees of freedom go away) if we can combine the two variance estimates into one single, new and improved estimate! But we can only do this if the two variances are not different.

We can combine the two variance estimates into a single estimate if they are not too different. To determine whether they are sufficiently similar, we use an F test. Therefore, two-sample t-tests START WITH AN F TEST!

Pooling variances

If the two estimates of variance are sufficiently similar, as judged by the F test of variances (e.g. H0: σ1² = σ2²), then they can be combined. This is called "pooling variances", and it is done as a weighted mean (or weighted average) of the variances. The weights are the degrees of freedom.

Weighted means or averages

The usual mean is calculated as $\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$. The weighted mean is $\bar{Y} = \frac{\sum_{i=1}^{n} w_i Y_i}{\sum_{i=1}^{n} w_i}$, or the sum of the variable multiplied by the weights divided by the sum of the weights.

Pooled variances are calculated as

  Pooled $S^2 = S_p^2 = \frac{\sum_{j=1}^{k} \gamma_j S_j^2}{\sum_{j=1}^{k} \gamma_j}$

where j = 1 and 2 for groups 1 and 2. There could be more than 2 variances averaged in other situations.

Recall that $\gamma_j S_j^2 = SS_j$, so we can also calculate the pooled variance as the sum of the corrected SS for each variable divided by the sum of the d.f. for each variable:

  $S_p^2 = \frac{\sum_j \gamma_j S_j^2}{\sum_j \gamma_j} = \frac{\sum_j SS_j}{\sum_j \gamma_j}$

Pooled variance calculation:

  $S_p^2 = \frac{\gamma_1 S_1^2 + \gamma_2 S_2^2}{\gamma_1 + \gamma_2} = \frac{SS_1 + SS_2}{\gamma_1 + \gamma_2} = \frac{SS_1 + SS_2}{(n_1 - 1) + (n_2 - 1)}$

Two-sample t-test variance estimates

From linear combinations we know that the variance of the sum is the sum of the variances. This is the GENERAL CASE: $\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}$. But if we test H0: σ1² = σ2² and fail to reject, we can pool the variances. The error variance is then $S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$. One additional minor simplification is possible: if n1 = n2 = n, then we can place the pooled variance over a single n, giving $\frac{2 S_p^2}{n}$.

So we now have a single, more powerful, pooled variance! What are its degrees of freedom?

The first variance had d.f. = n1 − 1 = γ1, and the second variance had d.f. = n2 − 1 = γ2. The pooled variance has d.f. equal to the sum of the d.f. for the variances that were pooled, so the degrees of freedom for $S_p^2$ are (n1 − 1) + (n2 − 1) = γ1 + γ2.

Summary: case where σ1² = σ2²

Test the variances to determine whether they are significantly different. This is an F test of H0: σ1² = σ2². If they are not different, then pool the variances into a single estimate $S_p^2$. The t-test is then done using this pooled variance to estimate the standard error of the difference. The d.f. are (n1 − 1) + (n2 − 1).
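The pooling recipe translates directly into code. The sketch below is mine, not part of the original notes; the summary statistics are invented and SciPy is assumed to be installed (used only for the p-value). It pools the two sample variances as a d.f.-weighted average and forms the t statistic with (n1 − 1) + (n2 − 1) degrees of freedom.

```python
from math import sqrt
from scipy import stats  # assumed to be installed; used only for the p-value

# Invented summary statistics (same hypothetical samples as in the previous sketch)
n1, ybar1, s2_1 = 12, 25.4, 9.8
n2, ybar2, s2_2 = 15, 22.1, 11.3
delta = 0.0

gamma1, gamma2 = n1 - 1, n2 - 1                              # degrees of freedom of each variance
s2_p = (gamma1 * s2_1 + gamma2 * s2_2) / (gamma1 + gamma2)   # pooled variance: d.f.-weighted average

s_d = sqrt(s2_p * (1 / n1 + 1 / n2))   # standard error of the difference using the pooled variance
t = (ybar1 - ybar2 - delta) / s_d      # pooled-variance two-sample t statistic
df = gamma1 + gamma2                   # (n1 - 1) + (n2 - 1)

p_value = 2 * stats.t.sf(abs(t), df)   # two-tailed p-value from the t distribution
print(f"pooled S^2 = {s2_p:.3f}, t = {t:.3f}, d.f. = {df}, p = {p_value:.4f}")
```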
The t-test equation is then

  $t = \frac{d - \delta}{S_d} = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

One other detail: we are conducting the test with the condition that σ1² = σ2². This is a new assumption for this test, equal variances.

Assumptions: NID r.v. (μ, σ²)

  N for Normality; the differences are normally distributed.
  I for Independence; the observations and samples are independent.

Since the variance is specified to be a single variance equal to σ², the variances are equal, or the variance is said to be homogeneous. The complement of homogeneous variance is heterogeneous variance. Equal variance is also called homoscedasticity and the alternative is referred to as heteroscedasticity; samples with equal variances are described as homoscedastic, and samples with unequal variances as heteroscedastic.

Case where σ1² ≠ σ2²

How do we conduct the test if the variances are not equal? On the one hand, this is not a problem. The linear combination we used to get the variance does not require homogeneous variance, so we know the variance is $\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}$.

But what are the degrees of freedom? It turns out the d.f. are somewhere between the smaller of n1 − 1 and n2 − 1 and the d.f. for the pooled variance estimate, (n1 − 1) + (n2 − 1).

It would be conservative to just use the smaller of n1 − 1 and n2 − 1. This works, it is reasonable, and it is done. However, power is lost with this solution. (This solution is suggested by your textbook.)

The alternative is to estimate the d.f. using an approximation developed by Satterthwaite. This solution is used by SAS in the procedure PROC TTEST.

Satterthwaite's approximation:

  $\text{d.f.} = \gamma \approx \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{\left(S_1^2 / n_1\right)^2}{n_1 - 1} + \frac{\left(S_2^2 / n_2\right)^2}{n_2 - 1}}$

This calculation is, of course, an approximation, as the name suggests. Note that it does not usually give nice integer degrees of freedom; expect some decimal places. This is not an issue for computer programs that can get P-values for any d.f., but it does complicate using our tables a little.

There is one additional "simplification". We know that the d.f. are at least the smaller of n1 − 1 and n2 − 1. But what if n1 = n2 = n? In this case the d.f. will be at least n − 1. However, Satterthwaite's approximation will still, usually, yield a larger d.f.

Summary

There are two cases in two-sample t-tests: the case where σ1² = σ2² and the case where σ1² ≠ σ2². There are also some considerations for the cases where n1 = n2 and where n1 ≠ n2. Each of these cases alters the calculation of the standard error of the difference being tested and the degrees of freedom.

Variance of the difference
                    σ1² = σ2²                 σ1² ≠ σ2²
  n1 ≠ n2           S_p²(1/n1 + 1/n2)         S1²/n1 + S2²/n2
  n1 = n2 = n       2S_p²/n                   (S1² + S2²)/n

Degrees of freedom
                    σ1² = σ2²                 σ1² ≠ σ2²
  n1 ≠ n2           (n1 − 1) + (n2 − 1)       ≥ min[(n1 − 1), (n2 − 1)]
  n1 = n2 = n       2n − 2                    ≥ n − 1

For our purposes, we will generally use SAS to conduct two-sample t-tests, and we will let SAS determine Satterthwaite's approximation when the variances are not equal. How does SAS know whether the variances are equal? How does it know what value of α you want to use? Good questions. Actually, SAS does not know or assume anything. We'll find out what it does later.
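For the unequal-variance case, Satterthwaite's approximation can be computed directly from the summary statistics. The sketch below is mine, not part of the original notes (SAS PROC TTEST performs the equivalent calculation internally); the numbers are invented and SciPy is assumed to be available for the p-value.

```python
from math import sqrt
from scipy import stats  # assumed to be installed; used only for the p-value

# Invented summary statistics with clearly unequal variances
n1, ybar1, s2_1 = 10, 25.4, 4.2
n2, ybar2, s2_2 = 16, 22.1, 19.7

v1, v2 = s2_1 / n1, s2_2 / n2   # the two components of the variance of the difference
s_d = sqrt(v1 + v2)             # standard error of the difference (no pooling)
t = (ybar1 - ybar2) / s_d       # t statistic for H0: mu1 - mu2 = 0

# Satterthwaite's approximate degrees of freedom (usually not an integer)
df_satterthwaite = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

p_value = 2 * stats.t.sf(abs(t), df_satterthwaite)   # the t distribution accepts fractional d.f.
print(f"t = {t:.3f}, Satterthwaite d.f. = {df_satterthwaite:.2f}, p = {p_value:.4f}")
print(f"d.f. bounds: at least {min(n1 - 1, n2 - 1)}, at most {(n1 - 1) + (n2 - 1)}")
```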
One last thought on testing for differences between two populations. The test we have been primarily discussing is the t-test, a test of equality of means. However, if in the process of checking the variances we find that they differ, then there are already some differences between the two populations that may be of interest.
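Since the whole procedure starts with that check of the variances, here is a minimal sketch of the preliminary F test (my illustration with invented numbers, not part of the original notes; SciPy is assumed for the p-value). It places the larger sample variance in the numerator and doubles the upper-tail probability for a two-tailed test of H0: σ1² = σ2².

```python
from scipy import stats  # assumed to be installed; provides the F distribution

# Invented sample variances and sizes
n1, s2_1 = 12, 9.8
n2, s2_2 = 15, 11.3

# Place the larger sample variance in the numerator so that F >= 1
if s2_1 >= s2_2:
    F, df_num, df_den = s2_1 / s2_2, n1 - 1, n2 - 1
else:
    F, df_num, df_den = s2_2 / s2_1, n2 - 1, n1 - 1

# Two-tailed test of H0: sigma1^2 = sigma2^2 (double the upper-tail probability, capped at 1)
p_value = min(1.0, 2 * stats.f.sf(F, df_num, df_den))
print(f"F = {F:.3f} with ({df_num}, {df_den}) d.f., p = {p_value:.4f}")
# If H0 is not rejected, pool the variances; otherwise use Satterthwaite's approximation.
```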