Statistical Methods I (EXST 7005)

Summary
1) $\chi^2/\gamma$ with $\gamma$ d.f. = F with $\gamma$, $\infty$ d.f.
2) $F = \dfrac{\chi_1^2/\gamma_1}{\chi_2^2/\gamma_2}$ with $\gamma_1$, $\gamma_2$ d.f. if $H_0$ is true
3) t with $\gamma = \infty$ follows a Z distribution
4) $Z^2$ = F with 1, $\infty$ d.f.
5) $t^2$ with $\gamma$ d.f. = F with 1, $\gamma$ d.f.
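The identities in this summary can be checked numerically. The following is a minimal sketch that uses only the α = 0.05 tabular values quoted in these notes (t = 2.228, Z = 1.96, χ² = 18.3 and the matching F values); the helper name `check_identity` and the 0.01 tolerance are assumptions made for illustration.

```python
import math

# Tabular critical values at alpha = 0.05, as quoted in these notes
# (two-tailed for t and Z, upper tail for chi-square and F).
T_10DF = 2.228      # t with 10 d.f.
Z_VALUE = 1.96      # Z (t with infinite d.f.)
CHI2_10DF = 18.3    # chi-square with 10 d.f.
F_1_10 = 4.96       # F with 1, 10 d.f.
F_1_INF = 3.84      # F with 1, infinity d.f.
F_10_INF = 1.83     # F with 10, infinity d.f.


def check_identity(computed, tabled, tol=0.01):
    """Return True when a computed value matches the table to within tol."""
    return math.isclose(computed, tabled, abs_tol=tol)


identities = {
    "t^2 (10 d.f.) = F(1, 10)": (T_10DF ** 2, F_1_10),
    "Z^2 = F(1, inf)": (Z_VALUE ** 2, F_1_INF),
    "chi^2 / 10 = F(10, inf)": (CHI2_10DF / 10, F_10_INF),
}

for label, (computed, tabled) in identities.items():
    ok = check_identity(computed, tabled)
    print(f"{label}: {computed:.3f} vs {tabled} -> {ok}")
```

The tolerance only absorbs the rounding in the printed tables; it is not a statistical criterion.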
Some examples in the F tables (all α = 0.05, two tails for Z and t values since the sign is lost in squaring)
F with 1, 10 d.f. = 4.96 = $t^2$ with 10 d.f. = $(2.228)^2$ = 4.96
F with 1, ∞ d.f. = 3.84 = $Z^2$ = $(1.96)^2$ = 3.84
F with 10, ∞ d.f. = 1.83 = $\chi^2/\gamma$ = 18.3 / 10 = 1.83 with 10 d.f.

Confidence intervals and margin of error
A confidence interval is a range of values that we believe is likely to contain the true value of some parameter. The width of this interval above and below the parameter estimate is called the margin of error.
We can calculate confidence intervals for means (μ) and variances (σ²).

Confidence intervals for t and Z distributions
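t and Z distribution confidence intervals start with a t or Z probability statement.
$$P(-t_{\alpha/2} \le t \le t_{\alpha/2}) = 1 - \alpha$$
can also be written
$$P\left(-t_{\alpha/2} \le \frac{\bar{Y} - \mu}{S_{\bar{Y}}} \le t_{\alpha/2}\right) = 1 - \alpha$$
which is modified to express an interval about μ instead of t (or Z).
$$P(-t_{\alpha/2} S_{\bar{Y}} \le \bar{Y} - \mu \le t_{\alpha/2} S_{\bar{Y}}) = 1 - \alpha$$
$$P(-\bar{Y} + t_{\alpha/2} S_{\bar{Y}} \ge -\mu \ge -\bar{Y} - t_{\alpha/2} S_{\bar{Y}}) = 1 - \alpha$$
The final form is given below.
$$P(\bar{Y} - t_{\alpha/2} S_{\bar{Y}} \le \mu \le \bar{Y} + t_{\alpha/2} S_{\bar{Y}}) = 1 - \alpha$$
The expression for Z has an identical derivation.
$$P(\bar{Y} - Z_{\alpha/2} \sigma_{\bar{Y}} \le \mu \le \bar{Y} + Z_{\alpha/2} \sigma_{\bar{Y}}) = 1 - \alpha$$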
James P. Geaghan Copyright 2010

A common short notation for the interval in the probability statement is $\bar{Y} \pm t_{\alpha/2} S_{\bar{Y}}$, but the probability statement is preferable as a final result. The value $t_{\alpha/2} S_{\bar{Y}}$ for intervals on means, and $t_{\alpha/2} S$ for intervals on individual observations, is half of the interval width from the lower limit to the upper limit and is called the margin of error.

Confidence intervals for variance
The scaled sample variance, $SS/\sigma^2$, follows a Chi square distribution, so the confidence interval for variance is based on the Chi square distribution.
$$P(\chi^2_{lower} \le \chi^2 \le \chi^2_{upper}) = 1 - \alpha$$
or
$$P\left(\chi^2_{lower} \le \frac{SS}{\sigma^2} \le \chi^2_{upper}\right) = 1 - \alpha$$
which is solved to isolate σ².
$$P\left(\frac{1}{\chi^2_{lower}} \ge \frac{\sigma^2}{SS} \ge \frac{1}{\chi^2_{upper}}\right) = 1 - \alpha$$
$$P\left(\frac{1}{\chi^2_{upper}} \le \frac{\sigma^2}{SS} \le \frac{1}{\chi^2_{lower}}\right) = 1 - \alpha$$
giving the expression
$$P\left(\frac{SS}{\chi^2_{upper}} \le \sigma^2 \le \frac{SS}{\chi^2_{lower}}\right) = 1 - \alpha$$
Notice that the upper tabular Chi square value comes out in the lower bound and the lower Chi square in the upper bound.

Notes on confidence intervals
One sided intervals are possible, but uncommon.
Confidence intervals are one of the most common expressions in statistics, frequently occurring in publications.
Margins of error and confidence intervals are not always calculated by statistical software programs, but they can easily be done by hand.

From the previous SAS Example 2c
We receive a shipment of apples that are supposed to be “premium apples”, with a diameter of at least 2.5 inches. We will take a sample of 12 apples, and place a confidence interval on the mean. The sample values for the 12 apples are:
2.9, 2.1, 2.4, 2.8, 3.1, 2.8, 2.7, 3.0, 2.4, 3.2, 2.3, 3.4

Do we want the Std dev or Std error?
SAS PROC UNIVARIATE Output
The UNIVARIATE Procedure
Variable: diam (Diameter of the apple)

Moments
N                 12            Sum Weights        12
Mean              2.75833333    Sum Observations   33.1
Std Deviation     0.39418116    Variance           0.15537879
Skewness          -0.1184219    Kurtosis           -0.8352969
Uncorrected SS    93.01         Corrected SS       1.70916667
Coeff Variation   14.2905557    Std Error Mean     0.1137903

The standard deviation measures the variation among individual apples. If we wanted the interval that contained 95% of the apples, we would use the standard deviation. However, we have estimated a mean and we want to place a confidence interval that expresses our knowledge of this estimate. Is our estimate of the mean good or poor? Is the confidence interval narrow or wide?
Note the confidence interval about the mean, and about individual observations.
For means: $P(\bar{Y} - t_{\alpha/2} S_{\bar{Y}} \le \mu \le \bar{Y} + t_{\alpha/2} S_{\bar{Y}}) = 1 - \alpha$
The margin of error for the mean is $t_{\alpha/2} S_{\bar{Y}}$.
For individual observations: $P(\bar{Y} - t_{\alpha/2} S \le \mu \le \bar{Y} + t_{\alpha/2} S) = 1 - \alpha$
So we need the mean of the apples and the standard error.
Mean = 2.758333
Std Error Mean = 0.11379 (no adjustment needed)
We also need a t-value. With 12 apples and 11 d.f., our two tailed t-value is 2.201. So $\bar{Y} \pm t_{\alpha/2} S_{\bar{Y}}$, or 2.758 ± (2.201)(0.1138) = 2.758 ± 0.250, gives the interval. The margin of error is 0.250 and the best expression is as a confidence interval probability statement.
P(2.758 − 0.250 ≤ μ ≤ 2.758 + 0.250) = 1 − α
P(2.508 ≤ μ ≤ 3.008) = 0.95
The real value of μ may or may not be in this interval, but it is our best evaluation of where the true value of μ will be.
For individual observations the calculation uses the standard deviation instead of the standard error, $P(\bar{Y} - t_{\alpha/2} S \le \mu \le \bar{Y} + t_{\alpha/2} S) = 1 - \alpha$. For the apples the standard deviation was 0.3942 and all other values remain the same. $\bar{Y} \pm t_{\alpha/2} S$, or 2.758 ± (2.201)(0.3942) = 2.758 ± 0.8676, gives the interval. The best expression is as a confidence interval probability statement.
P(2.758 − 0.8676 ≤ μ ≤ 2.758 + 0.8676) = 1 − α
P(1.8904 ≤ μ ≤ 3.6256) = 0.95
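The two apple intervals above can be reproduced directly from the 12 sample values. This is a minimal sketch using only Python's standard library; the t-value 2.201 is the tabular value quoted in the notes (it is not computed here), and the helper name `t_interval` is an assumption made for illustration.

```python
import math
import statistics

# Diameters of the 12 apples from the example.
apples = [2.9, 2.1, 2.4, 2.8, 3.1, 2.8, 2.7, 3.0, 2.4, 3.2, 2.3, 3.4]

T_VALUE = 2.201  # two-tailed t, alpha = 0.05, 11 d.f. (tabular value from the notes)


def t_interval(center, spread, t_value):
    """Return (margin of error, lower limit, upper limit) for center +/- t * spread."""
    margin = t_value * spread
    return margin, center - margin, center + margin


n = len(apples)
mean = statistics.mean(apples)
std_dev = statistics.stdev(apples)    # S: variation among individual apples
std_err = std_dev / math.sqrt(n)      # S_Ybar: variation of the sample mean

# The interval on the mean uses the standard error.
me_mean, lo_mean, hi_mean = t_interval(mean, std_err, T_VALUE)
# The interval for individual observations uses the standard deviation.
me_obs, lo_obs, hi_obs = t_interval(mean, std_dev, T_VALUE)

print(f"mean = {mean:.4f}, S = {std_dev:.4f}, S_Ybar = {std_err:.4f}")
print(f"mean:        +/- {me_mean:.3f} -> ({lo_mean:.3f}, {hi_mean:.3f})")
print(f"individuals: +/- {me_obs:.4f} -> ({lo_obs:.4f}, {hi_obs:.4f})")
```

Because the hand calculation rounds the mean to 2.758 before adding and subtracting the margin, the limits printed by the code can differ from the hand-worked ones in the last digit.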
Example 2: Variance CI
Place a confidence interval on the variance estimate for the apple example. The variance estimate from the SAS output is $S^2$ = 0.155379 and the corrected sum of squares is 1.709. The Chi square values for 11 d.f. are 3.816 (lower) and 21.92 (upper).
Recall,
$$P\left(\frac{SS}{\chi^2_{upper}} \le \sigma^2 \le \frac{SS}{\chi^2_{lower}}\right) = 1 - \alpha$$
Then
$$P\left(\frac{1.709}{21.92} \le \sigma^2 \le \frac{1.709}{3.816}\right) = 0.95$$
and
$$P(0.078 \le \sigma^2 \le 0.448) = 0.95$$
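The variance interval for the apples can be sketched in a few lines. The corrected SS and the two Chi square values are the ones quoted in the notes; the function name `variance_interval` is an assumption for illustration.

```python
# Quantities from the apple example in the notes.
CORRECTED_SS = 1.709   # corrected sum of squares, (n - 1) * S^2
CHI2_LOWER = 3.816     # tabular chi-square, 11 d.f., lower value
CHI2_UPPER = 21.92     # tabular chi-square, 11 d.f., upper value


def variance_interval(ss, chi2_lower, chi2_upper):
    """Return (lower, upper) confidence limits for sigma^2.

    The upper chi-square value produces the lower limit and vice versa.
    """
    return ss / chi2_upper, ss / chi2_lower


lower, upper = variance_interval(CORRECTED_SS, CHI2_LOWER, CHI2_UPPER)
print(f"P({lower:.3f} <= sigma^2 <= {upper:.3f}) = 0.95")
```

Note that the interval is not symmetric about $S^2$ = 0.155379, which is why no single margin of error is quoted for a variance.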
A note on hypothesis testing
Hypothesis tests can be done by calculating a confidence interval for the appropriate value of α and checking to see if the hypothesized value is contained in the interval. This approach is used in some SAS program output such as Analysis of Variance.

Summary
Confidence intervals for μ (t or Z distribution) and σ² (Chi square).
For μ (using either the t or Z distribution):
$$P(\bar{Y} - t_{\alpha/2} S_{\bar{Y}} \le \mu \le \bar{Y} + t_{\alpha/2} S_{\bar{Y}}) = 1 - \alpha$$
where the margin of error for the mean is $t_{\alpha/2} S_{\bar{Y}}$, or $Z_{\alpha/2} \sigma_{\bar{Y}}$ for Z distribution applications.
For σ² (Chi square distribution):
$$P\left(\frac{SS}{\chi^2_{upper}} \le \sigma^2 \le \frac{SS}{\chi^2_{lower}}\right) = 1 - \alpha$$
These are common and IMPORTANT calculations.
The margin of error is the amount added to and subtracted from the mean to get the confidence interval. It is equal to $t_{\alpha/2} S_{\bar{Y}}$.
Confidence intervals and margins of error are not always calculated by statistical software.
Checking to see if a value falls in the interval is equivalent to performing a statistical test of hypothesis against that value.

Linear combinations
Generic Example: We want to create a score we can use to evaluate students applying to LSU as freshmen.
$$Score_i = a(VerbalSAT_i) + b(MathSAT_i) + c(GPA_i)$$
where a, b and c are constants and VerbalSAT, MathSAT and GPA are variables that vary among students.
We need to choose values of a, b and c in order to calculate the score. The average student has values of VerbalSAT=500, MathSAT=500 and GPA=2.5.
If we choose a=1/3, b=1/3 and c=1/3 then we have an average of the 3 variables, (V+M+G)/3. The score for the “average” student would be 334.17. If we choose a=1, b=1 and c=1 we have a simple sum of the variables, (V+M+G). For the average student this would be 1002.5. Both of these choices produce a score that could be used to compare among students. However, neither of these two choices produces a score that resembles the more familiar scores for standardized tests or grade point averages.
VerbalSAT and MathSAT are values in the hundreds (range: 200 to 800) and the GPA is single digits (range: 0 to 4). If we wanted our score to more closely resemble one of these familiar scales, we could modify the coefficients for the mean (a=1/3, b=1/3 and c=1/3) in either of two ways. We could divide the standardized tests by an additional 200 points so their maximum would be 4 (a=1/600, b=1/600 and c=1/3), producing a score of 2.5 for the average student. Or we could scale the GPA by multiplying by 200 to produce a maximum of 800 (a=1/3, b=1/3 and c=200/3), yielding a score of 500 for the average student.
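The four coefficient choices discussed above can be compared with a short sketch. It assumes an average GPA of 2.5, which is the value that reproduces the quoted scores of 1002.5, 2.5 and 500; the function name `score` and the labels are illustrative.

```python
def score(verbal, math_sat, gpa, a, b, c):
    """Linear combination a*VerbalSAT + b*MathSAT + c*GPA."""
    return a * verbal + b * math_sat + c * gpa


# "Average" student. GPA = 2.5 is an assumption: it is the value implied
# by the scores quoted in the notes.
AVERAGE = {"verbal": 500, "math_sat": 500, "gpa": 2.5}

# Coefficient choices (a, b, c) discussed in the text.
choices = {
    "mean of the three variables": (1 / 3, 1 / 3, 1 / 3),
    "simple sum": (1, 1, 1),
    "rescaled to the GPA range": (1 / 600, 1 / 600, 1 / 3),
    "rescaled to the SAT range": (1 / 3, 1 / 3, 200 / 3),
}

results = {
    label: score(**AVERAGE, a=a, b=b, c=c)
    for label, (a, b, c) in choices.items()
}
for label, value in results.items():
    print(f"{label}: {value:.4f}")
```

All four are linear combinations of the same three variables; only the constants change, so the ranking of students can differ between choices while the form of the calculation does not.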
Any of these choices produces a reasonable and acceptable linear combination. Which one is used depends on the objectives and preferences of the user. We will look at several specific applications.

Mean and Variance of a linear combination
So what is the mean value of our linear combination, and can we put a variance on it (to get a confidence interval)?
Linear combination: $Score_i = a_1(VerbalSAT_i) + a_2(MathSAT_i) + a_3(GPA_i) = \sum_{i=1}^{3} a_i Y_i$
Expected value: $\sum a_i \mu_{Y_i}$
Estimate of mean: $\sum a_i \bar{Y}_i$

Variance of the linear combination.
The variance of a linear combination is the sum of the individual variances (with squared coefficients) plus twice the covariance of each pair of variables (with both coefficients).
Estimate of the variance:
$$\sum_i a_i^2 S_i^2 + \sum_i \sum_j 2 a_i a_j S_{ij} \quad \text{for } i < j$$
For example, with the Score we calculated previously, the variance might be
$$VAR(Score) = a^2 VAR(Verbal) + b^2 VAR(Math) + c^2 VAR(GPA) + 2ab\,Cov(Verbal, Math) + 2ac\,Cov(Verbal, GPA) + 2bc\,Cov(Math, GPA)$$
HOWEVER, if the variables are independent the covariance can be assumed to be zero. The variance of the linear combination reduces to the sum of the variances of the individual variables (with squared coefficients).
$$VAR(Score) = a^2 VAR(Verbal) + b^2 VAR(Math) + c^2 VAR(GPA)$$

Utility of linear combinations
As the course progresses we will see applications of linear combinations to almost everything.
Two sample t-test: $H_0: \mu_1 - \mu_2 = \delta$, with δ estimated by $\bar{Y}_1 - \bar{Y}_2$. This is a linear combination! The most common case is a test of the linear combination $1\bar{Y}_1 + (-1)\bar{Y}_2 = \delta$, but any other combination is possible (e.g. $0.8\bar{Y}_1 - 1\bar{Y}_2 = 0$) and we only need calculate a variance and standard error for the given situation: $Var(0.8\bar{Y}_1 - 1\bar{Y}_2) = (0.8)^2 S^2_{\bar{Y}_1} + (-1)^2 S^2_{\bar{Y}_2} = 0.64 S^2_{\bar{Y}_1} + S^2_{\bar{Y}_2}$.
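The variance calculation for a combination such as $0.8\bar{Y}_1 - 1\bar{Y}_2$ can be sketched as a small function. The function name `linear_combination_variance`, its argument layout, and the numeric variances and covariance below are hypothetical, chosen only to show the arithmetic; they do not come from the notes.

```python
def linear_combination_variance(coefs, variances, covariances=None):
    """Variance of sum(a_i * Y_i).

    coefs       -- coefficients a_i
    variances   -- variances S_i^2 of each Y_i
    covariances -- optional dict {(i, j): S_ij} with i < j; missing pairs
                   (or None) are treated as zero, i.e. independent variables
    """
    total = sum(a * a * v for a, v in zip(coefs, variances))
    for (i, j), cov in (covariances or {}).items():
        total += 2 * coefs[i] * coefs[j] * cov
    return total


# Hypothetical variances of two sample means (illustration only).
var_mean_1 = 4.0
var_mean_2 = 9.0

# Independent means: 0.64 * S^2_Ybar1 + 1 * S^2_Ybar2
independent = linear_combination_variance([0.8, -1.0], [var_mean_1, var_mean_2])

# Same combination with a hypothetical covariance of 1.5 between the means.
correlated = linear_combination_variance(
    [0.8, -1.0], [var_mean_1, var_mean_2], {(0, 1): 1.5}
)

print(f"independent: {independent:.2f}")
print(f"with covariance: {correlated:.2f}")
```

Because the two coefficients have opposite signs, a positive covariance reduces the variance of the difference; with independent samples the covariance term simply drops out.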
Regression: The model $Y_i = b_0 + b_1 X_i + e_i$ is a linear combination.
Analysis of variance: We will look at contrasts to test for differences between means similar to a two sample t-test (but usually with more than two means). For example, testing the hypothesis that some mean is equal to the average of two other means would be done as
$$H_0: \mu_1 - \left(\frac{\mu_2 + \mu_3}{2}\right) = 1\mu_1 + \left(\frac{-1}{2}\right)\mu_2 + \left(\frac{-1}{2}\right)\mu_3 = 0$$

An application: Stratified Random Sampling
Suppose we want to estimate the number of ducks in an area on the Louisiana coast. The area of interest is 300 acres. We fly 9 transects, counting ducks for 1/10 mile on either side of the plane, and from each transect we estimate the number of ducks per acre. We can then calculate an estimate and a confidence interval for that estimate.

Raw data
Sample Number   Ducks counted
1               8
2               19
3               30
4               23
5               56
6               89
7               2024
8               1732
9               1122

Summary statistics
Statistic    Value
n            9
sum          5103
mean         567
var          684349.25
std dev      827.25
std error    275.75
t value      2.306
acres        300

Estimate and confidence interval for the area of interest (300 acres)
Given this value as the estimate of the true mean number of ducks (μ), we can state our results as a probability statement. The usual calculation is
“Some parameter estimate ± $t_{\alpha/2}$ * standard error”
The confidence interval for ducks per acre is $\bar{Y} \pm t_{\alpha/2} S_{\bar{Y}}$, or 567 ± 2.306(275.75), which is 567 ± 635.88. However, we want ducks per 300 acres. Recall from our discussion of transformations that when the mean is multiplied by 300 to get total ducks on the 300 acres, the variance would be multiplied by 300 squared and the standard deviation multiplied by 300.
So the calculation is (300)567 ± 2.306(300)(275.75). The margin of error is 2.306(300)(275.75) = 190765 and the interval is 170100 ± 190765.
The probability statement is $P(\bar{Y} - t_{\alpha/2} S_{\bar{Y}} \le \mu_{ducks} \le \bar{Y} + t_{\alpha/2} S_{\bar{Y}}) = 1 - \alpha$
P(−20665 ≤ μ ≤ 360865) = 0.95
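The unstratified estimate can be reproduced from the nine transect counts. This is a minimal sketch with the standard library only; the t-value 2.306 is the tabular value from the notes, and the variable names are illustrative.

```python
import math
import statistics

# Duck counts from the 9 transects in the raw data table.
ducks = [8, 19, 30, 23, 56, 89, 2024, 1732, 1122]

T_VALUE = 2.306   # two-tailed t, alpha = 0.05, 8 d.f. (tabular value from the notes)
ACRES = 300       # area of interest

mean = statistics.mean(ducks)
std_err = statistics.stdev(ducks) / math.sqrt(len(ducks))

# Multiplying the mean by a constant multiplies its standard error
# by the same constant (and its variance by the constant squared).
total = ACRES * mean
margin = T_VALUE * ACRES * std_err

print(f"mean = {mean}, std error = {std_err:.2f}")
print(f"total = {total} +/- {margin:.0f}")
print(f"P({total - margin:.0f} <= total <= {total + margin:.0f}) = 0.95")
```

The negative lower limit is a symptom of how wide the interval is: three very large counts and six small ones produce a variance that swamps the estimate.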
This calculation applies when the number of ducks is the value of interest and the 300 acres is the area of interest.

Stratification
Now let’s suppose we noticed that the duck species we were studying and counting was primarily a fresh water species, and occurred only infrequently in the brackish and saline zones of the 300 acres of interest. We could modify the study and examine the numbers in the 3 zones separately. The advantage is that perhaps we can get 3 separate estimates of homogeneous habitat with a smaller variance than one estimate of the heterogeneous whole 300 acres. To simplify our analysis, we will take each zone to encompass 100 acres.
The 3 habitat types would be called “strata”, and we could estimate the number for each stratum separately and, we assume, independently, so we need not consider covariance.
The samples were taken such that there are 3 samples in each stratum.
Obs          Saline (CH2O)   Brackish (BH2O)   Fresh (FH2O)
1            8               23                2024
2            19              56                1732
3            30              89                1122
n =          3               3                 3
sum =        57              168               4878
mean =       19              56                1626
var =        121             1089              211828
std dev =    11              33                460.248
std error =  6.35            19.05             265.72

$$\bar{Y}_{Total\,Ducks} = 100\bar{Y}_{FH2O} + 100\bar{Y}_{BH2O} + 100\bar{Y}_{CH2O}$$
Total Ducks = 100(1626) + 100(56) + 100(19) = 162600 + 5600 + 1900 = 170100
The estimated total is the same as before because the area of each zone is the same.
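The per-stratum summary statistics and the estimated total can be reproduced from the transect counts. This sketch covers only the summary table and the point estimate; the dictionary layout and names are assumptions made for illustration.

```python
import math
import statistics

# Transect counts grouped by stratum, from the table in the notes.
strata = {
    "Saline (CH2O)": [8, 19, 30],
    "Brackish (BH2O)": [23, 56, 89],
    "Fresh (FH2O)": [2024, 1732, 1122],
}
ACRES_PER_STRATUM = 100  # each zone is taken to cover 100 acres

summary = {}
for name, counts in strata.items():
    var = statistics.variance(counts)
    summary[name] = {
        "n": len(counts),
        "mean": statistics.mean(counts),
        "var": var,
        "std_dev": math.sqrt(var),
        "std_err": math.sqrt(var / len(counts)),
    }
    print(name, summary[name])

# Linear combination of the stratum means with coefficients a = b = c = 100.
total_ducks = sum(ACRES_PER_STRATUM * s["mean"] for s in summary.values())
print("Estimated total ducks:", total_ducks)
```

With equal areas and equal sample sizes in every stratum, the weighted sum of stratum means is just 300 times the overall mean, which is why the point estimate does not change.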
Now we need a variance to calculate a confidence interval.
$$S^2_{TotalDucks} = a^2 S^2_{FH2O} + b^2 S^2_{BH2O} + c^2 S^2_{CH2O}$$
$$S^2_{TotalDucks} = (100)^2\,211828 + (100)^2\,1089 + (100)^2\,121 = 2130380000$$
$$S_{\bar{Y}_{TotalDucks}} = \sqrt{2130380000 / 9} = 15385.347$$
where 9 is the total number of samples.

Linear combination of independent means.
t value            2.306
acres              300
estimated total    170100
variance           2,130,380,000
std error          15385.347
margin of error    35478.70
CL lower           134621.30
CL upper           205578.70

Old and new estimates.
P(−20665 ≤ Total Ducks ≤ 360865) = 0.95
P(134621 ≤ Total Ducks ≤ 205579) = 0.95
Why does stratification give a smaller interval width? Because we replaced one sample with a very large variance (684349) with 3 samples, each with a smaller variance (121, 1089 and 211828). This reduced the margin of error from 190,765 to 35,479.
The heterogeneous variances should not have been pooled in the first place.

You will recall that we mentioned that one way of increasing power is to reduce the variance. We did not dwell on this previously because the only mechanism I could suggest at the time for reducing variance was “improving measurement error”. Now we have another method of reducing variance, stratification. This involves sampling smaller homogeneous units instead of one large heterogeneous unit.
Is the fact that the 3 variances were not similar a problem? No, nowhere in working with linear combinations did we state that the variances had to be similar. Later we will find that similar variances can be advantageous, but they are not necessary for this type of analysis. They are advantageous in analyses where we may want to pool variances to a single, improved estimate.

Summary
The use of “Linear Combinations” is a rather generic technique with many applications throughout statistics. We will see them again in the two sample t-test, regression, and ANOVA. Sampling is another example of the application.
It is important to determine if the variables in the linear combination are independent or not. If they are, the covariance values can be considered to be zero.
The linear combination and its variance are calculated as
$$\text{Linear combination} = \sum_{i=1}^{k} a_i Y_i$$
$$\text{Variance} = \sum_i a_i^2 S_i^2 + 2 \sum_i \sum_j a_i a_j S_{ij} \quad \text{for } i < j$$
where, if we can assume the variables are independent, the covariances can be assumed to be zero.
This will be very important later. We will assume independence in the t-test and ANOVA, and parts of regression, but not all of regression!

A final note on linear combinations
We have been assuming “independence” for a while. We sample at random to obtain independence, and to get a good representative sample. But do “linear combinations” have anything to do with the simpler calculations we talked about earlier when we assumed independence, say a test of the mean against a hypothesized value?
I'm glad you asked. As a matter of fact the mean is calculated as $\bar{Y} = \dfrac{Y_1 + Y_2 + Y_3 + \dots + Y_n}{n}$ and that is a linear combination. Fortunately for us we do not have to consider the covariances of individual observations because they are sampled independently!

Two-sample t-tests
Recall the derived population of sample means.
Imagine you had two of these derived populations, as we had for variances when we developed the F-test.

Equality of Population Means
            Population 1    Population 2
Mean        $\mu_1$         $\mu_2$
Variance    $\sigma_1^2$    $\sigma_2^2$

Given two populations, test for equality. Actually test to see if the difference between two means is equal to some value, usually zero. $H_0: \mu_1 - \mu_2 = \delta$ where often δ = 0

Derived populations
From population 1 draw all possible samples of size $n_1$ to get $\bar{Y}_1$.
• Mean $\mu_1$ yields $\mu_{\bar{Y}_1} = \mu_1$
• Variance $\sigma_1^2$ yields $\sigma^2_{\bar{Y}_1} = \sigma_1^2 / n_1$
• Derived population size is $N_1^{n_1}$
Likewise population 2 gives $\bar{Y}_2$.
• Mean $\mu_2$ yields $\mu_{\bar{Y}_2} = \mu_2$
• Variance $\sigma_2^2$ yields $\sigma^2_{\bar{Y}_2} = \sigma_2^2 / n_2$
• Derived population size is $N_2^{n_2}$

Draw a sample from each population
                   Population 1    Population 2
Sample size        $n_1$           $n_2$
d.f.               $\gamma_1$      $\gamma_2$
Sample mean        $\bar{Y}_1$     $\bar{Y}_2$
Sample variance    $S_1^2$         $S_2^2$

Take all possible differences $(\bar{Y}_1 - \bar{Y}_2) = d$. That is, take all possible means from the first population ($N_1^{n_1}$ means) and all possible means from the second population ($N_2^{n_2}$ means), and get the difference between each pair. There are a total of $(N_1^{n_1})(N_2^{n_2})$ values.
$(\bar{Y}_{1i} - \bar{Y}_{2j}) = d_k$ for $i = 1, \dots, N_1^{n_1}$, $j = 1, \dots, N_2^{n_2}$ and $k = 1, \dots, (N_1^{n_1})(N_2^{n_2})$
The population of differences $(\bar{Y}_{1i} - \bar{Y}_{2j}) = d_k$ has a mean equal to $\mu_d$ or $\mu_{\bar{Y}_1 - \bar{Y}_2}$, a variance equal to $\sigma^2_d$ or $\sigma^2_{\bar{Y}_1 - \bar{Y}_2}$, and a standard error equal to $\sigma_d$ or $\sigma_{\bar{Y}_1 - \bar{Y}_2}$ (the standard deviation of the mean difference).

Characteristics of the derived population
1) As $n_1$ and $n_2$ increase, the distribution of d approaches a normal distribution. If the two original populations are normal, the distribution of d is normal regardless of $n_1$ and $n_2$.
2) The mean of the differences ($d_k$) is equal to the difference between the means of the two parent populations. $\mu_d = \mu_{\bar{Y}_1} - \mu_{\bar{Y}_2} = \mu_{\bar{Y}_1 - \bar{Y}_2} = \delta$
3) The variance of the differences ($d_k$) is equal to the variance for the difference between the means of the two parent populations.
$$\sigma^2_d = \sigma^2_{\bar{Y}_1 - \bar{Y}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
$$\sigma_d = \sigma_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
This is a variance for a linear combination. The two samples are assumed independent, so covariance considerations are not needed.
The variance comes from linear combinations. The variance of the sum is the sum of the variances (no covariance if independent).
Linear combination: $\bar{Y}_1 - \bar{Y}_2$
The coefficients are 1, −1 (i.e. $1\bar{Y}_1 + (-1)\bar{Y}_2$)
The variance for this linear combination is $1^2 \hat{\sigma}^2_{\bar{Y}_1} + (-1)^2 \hat{\sigma}^2_{\bar{Y}_2} = \dfrac{\hat{\sigma}_1^2}{n_1} + \dfrac{\hat{\sigma}_2^2}{n_2}$ if the two variables are independent. Since they are each separately sampled at random this is an easy assumption.

The two-sample t-test
$H_0: \mu_1 - \mu_2 = \delta$
$H_1: \mu_1 - \mu_2 \ne \delta$, a nondirectional alternative. A directional alternative (one tailed test) would specify a difference, either > δ or < δ.
Commonly, δ is 0 (zero).
If $H_0$ is true, then $E(d) = \mu_1 - \mu_2$ and
$$\sigma^2_d = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
If values of $\sigma_1^2$ and $\sigma_2^2$ were KNOWN, we could use a Z-test,
$$Z = \frac{d - \delta}{\sigma_d} = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
If values of $\sigma_1^2$ and $\sigma_2^2$ were NOT KNOWN, and had to be estimated from the samples, we would use a t-test,
$$t = \frac{d - \delta}{S_d} = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
Since the hypothesized difference is usually 0 (zero), the term $(\mu_1 - \mu_2)$ is usually zero, and the equation is often simplified to $t = d / S_d$.
Et voila, a two sample t-test!
This is a very common test, and it is the basis for many calculations used in regression and analysis of variance (ANOVA). It will crop up repeatedly as we progress in the course. It is very important!
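The t statistic with separate (unpooled) variances can be sketched as follows. The function name `two_sample_t` and both samples are hypothetical, made up purely to exercise the formula; the sketch deliberately stops short of choosing degrees of freedom or a critical value.

```python
import math
import statistics


def two_sample_t(sample_1, sample_2, delta=0.0):
    """t = ((Ybar1 - Ybar2) - delta) / sqrt(S1^2/n1 + S2^2/n2).

    Returns the t statistic and the standard error of the difference, S_d.
    Degrees of freedom are deliberately not computed here.
    """
    d = statistics.mean(sample_1) - statistics.mean(sample_2)
    s_d = math.sqrt(
        statistics.variance(sample_1) / len(sample_1)
        + statistics.variance(sample_2) / len(sample_2)
    )
    return (d - delta) / s_d, s_d


# Hypothetical samples (not from the notes).
group_1 = [12.0, 15.0, 11.0, 14.0, 13.0]
group_2 = [10.0, 9.0, 12.0, 11.0]

t_stat, s_d = two_sample_t(group_1, group_2)
print(f"d = {statistics.mean(group_1) - statistics.mean(group_2):.2f}")
print(f"S_d = {s_d:.4f}, t = {t_stat:.4f}")
```

Passing a nonzero `delta` tests a hypothesized difference other than zero; the standard error $S_d$ is unaffected by that choice.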
$$t = \frac{d - \delta}{S_d} = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
often written just $t = d / S_d$ when δ or $\mu_1 - \mu_2$ is equal to zero.

The two-sample t-test
Unfortunately, this is not the end of the story. It turns out that there is some ambiguity about the degrees of freedom for the error variance, $\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}$. Is it $n_1 - 1$, or $n_2 - 1$, or somewhere in between, or maybe the sum?

Power considerations
POWER! We want the greatest possible power. It turns out that we get the greatest power (and our problems with degrees of freedom go away) if we can combine the two variance ...
This note was uploaded on 12/29/2011 for the course EXST 7005 taught by Professor Geaghan,j during the Fall '08 term at LSU.