EXST7005 Fall2010 14a Confidence and Linear Combos




Statistical Methods I (EXST 7005) Page 90

Summary

1) $\chi^2/\gamma$ with $\gamma$ d.f. = F with $\gamma, \infty$ d.f.
2) $F = \dfrac{\chi_1^2/\gamma_1}{\chi_2^2/\gamma_2}$ with $\gamma_1, \gamma_2$ d.f. if $H_0$ is true
3) t with $\gamma = \infty$ follows a Z distribution
4) $Z^2$ = F with $1, \infty$ d.f.
5) $t^2$ with $\gamma$ d.f. = F with $1, \gamma$ d.f.

Some examples from the F tables (all $\alpha = 0.05$, two tails for Z and t values since the sign is lost in squaring):

F with 1, 10 d.f. = 4.96 = $t^2$ with 10 d.f. = $(2.228)^2$ = 4.96
F with 1, $\infty$ d.f. = 3.84 = $Z^2$ = $(1.96)^2$ = 3.84
F with 10, $\infty$ d.f. = 1.83 = $\chi^2/\gamma$ = 18.3 / 10 = 1.83 with 10 d.f.

Confidence intervals and margin of error

A confidence interval is a range of values that we believe is likely to contain the true value of some parameter. The half-width of this interval, the distance above and below the parameter estimate, is called the margin of error. We can calculate confidence intervals for means ($\mu$) and variances ($\sigma^2$).

Confidence intervals for t and Z distributions

t and Z distribution confidence intervals start with a t or Z probability statement.

$$P(-t_{\alpha/2} \le t \le t_{\alpha/2}) = 1 - \alpha$$

can also be written

$$P\left(-t_{\alpha/2} \le \frac{\bar{Y} - \mu}{S_{\bar{Y}}} \le t_{\alpha/2}\right) = 1 - \alpha$$

which is modified to express an interval about $\mu$ instead of t (or Z).

$$P(-t_{\alpha/2} S_{\bar{Y}} \le \bar{Y} - \mu \le t_{\alpha/2} S_{\bar{Y}}) = 1 - \alpha$$

$$P(-\bar{Y} + t_{\alpha/2} S_{\bar{Y}} \ge -\mu \ge -\bar{Y} - t_{\alpha/2} S_{\bar{Y}}) = 1 - \alpha$$

The final form is given below.

$$P(\bar{Y} - t_{\alpha/2} S_{\bar{Y}} \le \mu \le \bar{Y} + t_{\alpha/2} S_{\bar{Y}}) = 1 - \alpha$$

The expression for Z has an identical derivation.

$$P(\bar{Y} - Z_{\alpha/2} \sigma_{\bar{Y}} \le \mu \le \bar{Y} + Z_{\alpha/2} \sigma_{\bar{Y}}) = 1 - \alpha$$

James P. Geaghan Copyright 2010

A common short notation for the interval in the probability statement is $\bar{Y} \pm t_{\alpha/2} S_{\bar{Y}}$, but the probability statement is preferable as a final result. The value $t_{\alpha/2} S_{\bar{Y}}$ for intervals on means, and $t_{\alpha/2} S$ for intervals on individual observations, is half of the interval width from the lower limit to the upper limit and is called the margin of error.

Confidence intervals for variance

Variances (scaled as $SS/\sigma^2$) follow a Chi square distribution.
The confidence interval for variance is based on the Chi square distribution.

$$P(\chi^2_{lower} \le \chi^2 \le \chi^2_{upper}) = 1 - \alpha$$

or

$$P\left(\chi^2_{lower} \le \frac{SS}{\sigma^2} \le \chi^2_{upper}\right) = 1 - \alpha$$

which is solved to isolate $\sigma^2$.

$$P\left(\frac{1}{\chi^2_{lower}} \ge \frac{\sigma^2}{SS} \ge \frac{1}{\chi^2_{upper}}\right) = 1 - \alpha$$

$$P\left(\frac{1}{\chi^2_{upper}} \le \frac{\sigma^2}{SS} \le \frac{1}{\chi^2_{lower}}\right) = 1 - \alpha$$

giving the expression

$$P\left(\frac{SS}{\chi^2_{upper}} \le \sigma^2 \le \frac{SS}{\chi^2_{lower}}\right) = 1 - \alpha$$

Notice that the upper tabular Chi square value comes out in the lower bound and the lower Chi square value in the upper bound.

Notes on confidence intervals

One-sided intervals are possible, but uncommon.
Confidence intervals are one of the most common expressions in statistics, frequently occurring in publications.
Margins of error and confidence intervals are not always calculated in statistical software programs, but they can easily be done by hand.

From the previous SAS Example 2c

We receive a shipment of apples that are supposed to be "premium apples", with a diameter of at least 2.5 inches. We will take a sample of 12 apples and place a confidence interval on the mean. The sample values for the 12 apples are:

2.9, 2.1, 2.4, 2.8, 3.1, 2.8, 2.7, 3.0, 2.4, 3.2, 2.3, 3.4

Do we want the Std dev or Std error?

SAS PROC UNIVARIATE Output
The UNIVARIATE Procedure
Variable: diam (Diameter of the apple)

Moments
N                 12            Sum Weights        12
Mean              2.75833333    Sum Observations   33.1
Std Deviation     0.39418116    Variance           0.15537879
Skewness          -0.1184219    Kurtosis           -0.8352969
Uncorrected SS    93.01         Corrected SS       1.70916667
Coeff Variation   14.2905557    Std Error Mean     0.1137903

The standard deviation is the variation in individual apples. If we wanted the interval that contained 95% of the apples, we would use the standard deviation. However, we have estimated a mean, and we want a confidence interval that expresses our knowledge of this estimate. Is our estimate of the mean good or poor? Is the confidence interval narrow or wide?
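The moments in the PROC UNIVARIATE output can be checked by hand. The following is a minimal sketch in Python (not part of the original notes, which use SAS); the variable names are illustrative, and only the standard library is assumed.

```python
import math
import statistics

# Apple diameters (inches) from the example above.
diam = [2.9, 2.1, 2.4, 2.8, 3.1, 2.8, 2.7, 3.0, 2.4, 3.2, 2.3, 3.4]

n = len(diam)
total = sum(diam)                                   # "Sum Observations"
mean = statistics.mean(diam)                        # "Mean"
variance = statistics.variance(diam)                # sample variance, divisor n - 1
std_dev = statistics.stdev(diam)                    # variation among individual apples
uncorrected_ss = sum(y * y for y in diam)           # sum of squared observations
corrected_ss = sum((y - mean) ** 2 for y in diam)   # sum of squared deviations
std_error = std_dev / math.sqrt(n)                  # variation of the sample mean

print(f"n={n} sum={total:.1f} mean={mean:.6f}")
print(f"variance={variance:.6f} std_dev={std_dev:.6f} std_error={std_error:.6f}")
print(f"uncorrected_ss={uncorrected_ss:.2f} corrected_ss={corrected_ss:.6f}")
```

The standard error is the standard deviation divided by the square root of n, which is why it is the smaller of the two and the right choice for an interval on the mean.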
Note the confidence interval about the mean, and about individual observations.

For means:

$$P(\bar{Y} - t_{\alpha/2} S_{\bar{Y}} \le \mu \le \bar{Y} + t_{\alpha/2} S_{\bar{Y}}) = 1 - \alpha$$

The margin of error for the mean is $t_{\alpha/2} S_{\bar{Y}}$.

For individual observations:

$$P(\bar{Y} - t_{\alpha/2} S \le Y_i \le \bar{Y} + t_{\alpha/2} S) = 1 - \alpha$$

So we need the mean of the apples and the standard error.

Mean = 2.758333
Std Error Mean = 0.11379 (no adjustment needed)

We also need a t-value. With 12 apples and 11 d.f., our two-tailed t-value ($\alpha = 0.05$) is 2.201.

So $\bar{Y} \pm t_{\alpha/2} S_{\bar{Y}}$, or 2.758 ± (2.201)(0.1138) = 2.758 ± 0.250, gives the interval. The margin of error is 0.250, and the best expression is as a confidence interval probability statement.

P(2.758 - 0.250 ≤ μ ≤ 2.758 + 0.250) = 1 - α
P(2.508 ≤ μ ≤ 3.008) = 0.95

The real value of μ may or may not be in this interval, but it is our best evaluation of where the true value of μ will be.

For individual observations the calculation uses the standard deviation instead of the standard error, $P(\bar{Y} - t_{\alpha/2} S \le Y_i \le \bar{Y} + t_{\alpha/2} S) = 1 - \alpha$. For the apples the standard deviation was 0.3942 and all other values remain the same.

$\bar{Y} \pm t_{\alpha/2} S$, or 2.758 ± (2.201)(0.3942) = 2.758 ± 0.8676, gives the interval. The best expression is as a probability statement.

P(2.758 - 0.8676 ≤ $Y_i$ ≤ 2.758 + 0.8676) = 1 - α
P(1.8904 ≤ $Y_i$ ≤ 3.6256) = 0.95

Example 2 - Variance CI

Place a confidence interval on the variance estimate for the apple example. The variance estimate from the SAS output is $S^2$ = 0.155379 and the corrected sum of squares is 1.709. The Chi square values for 11 d.f. are 3.816 (lower) and 21.92 (upper).

Recall,

$$P\left(\frac{SS}{\chi^2_{upper}} \le \sigma^2 \le \frac{SS}{\chi^2_{lower}}\right) = 1 - \alpha$$

Then

$$P\left(\frac{1.709}{21.92} \le \sigma^2 \le \frac{1.709}{3.816}\right) = 0.95 \quad\text{and}\quad P(0.078 \le \sigma^2 \le 0.448) = 0.95,$$

for a variance of $S^2$ = 0.155379 and a corrected SS of 1.709.
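The three apple intervals can be reproduced from the summary values alone. The sketch below is illustrative Python rather than anything from the original notes; the critical values are the tabled ones quoted in the text (2.201 for t, 3.816 and 21.92 for Chi square), typed in as constants instead of being computed, and the helper name `interval` is hypothetical.

```python
# Summary values copied from the PROC UNIVARIATE output.
mean = 2.75833333
std_dev = 0.39418116
std_error = 0.1137903
corrected_ss = 1.70916667

# Tabled critical values quoted in the notes (alpha = 0.05, 11 d.f.).
t_crit = 2.201
chi2_lower, chi2_upper = 3.816, 21.92


def interval(center, half_width):
    """Return (lower, upper) for center +/- half_width."""
    return center - half_width, center + half_width


# Interval on the mean: margin of error uses the standard error.
moe_mean = t_crit * std_error
ci_mean = interval(mean, moe_mean)

# Interval on individual observations: uses the standard deviation.
moe_obs = t_crit * std_dev
ci_obs = interval(mean, moe_obs)

# Interval on the variance: the upper Chi square value gives the lower bound.
ci_var = (corrected_ss / chi2_upper, corrected_ss / chi2_lower)

print(f"mean:        +/- {moe_mean:.3f} -> {ci_mean[0]:.3f} to {ci_mean[1]:.3f}")
print(f"observation: +/- {moe_obs:.4f} -> {ci_obs[0]:.4f} to {ci_obs[1]:.4f}")
print(f"variance:    {ci_var[0]:.3f} to {ci_var[1]:.3f}")
```

Small differences in the last digit relative to the hand calculation come from rounding the mean and margin of error before adding them.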
A note on hypothesis testing

Hypothesis tests can be done by calculating a confidence interval for the appropriate value of α and checking to see if the hypothesized value is contained in the interval. This approach is used in some SAS program output, such as Analysis of Variance.

Summary

Confidence intervals for μ (t or Z distribution) and $\sigma^2$ (Chi square).

For μ (using either the t or Z distribution):

$$P(\bar{Y} - t_{\alpha/2} S_{\bar{Y}} \le \mu \le \bar{Y} + t_{\alpha/2} S_{\bar{Y}}) = 1 - \alpha$$

where the margin of error for the mean is $t_{\alpha/2} S_{\bar{Y}}$, or $Z_{\alpha/2} S_{\bar{Y}}$ for Z distribution applications.

For $\sigma^2$ (Chi square distribution):

$$P\left(\frac{SS}{\chi^2_{upper}} \le \sigma^2 \le \frac{SS}{\chi^2_{lower}}\right) = 1 - \alpha$$

These are common and IMPORTANT calculations.
The margin of error is the amount added to and subtracted from the mean to get the confidence interval. It is equal to $t_{\alpha/2} S_{\bar{Y}}$.
They are not always calculated by statistical software.
Checking to see if a value falls in the interval is equivalent to performing a statistical test of hypothesis against that value.

Linear combinations

Generic Example: We want to create a score we can use to evaluate students applying to LSU as freshmen.

$$Score_i = a(VerbalSAT_i) + b(MathSAT_i) + c(GPA_i)$$

where a, b and c are constants and VerbalSAT, MathSAT and GPA are variables that vary among students.

We need to choose values of a, b and c in order to calculate the score. The average student has values of VerbalSAT = 500, MathSAT = 500 and GPA = 2.5.

If we choose a = 1/3, b = 1/3 and c = 1/3, then we have an average of the 3 variables, (VerbalSAT + MathSAT + GPA)/3. The score for the "average" student would be 1002.5/3 = 334.1667.

If we choose a = 1, b = 1 and c = 1, we have a simple sum of the variables, VerbalSAT + MathSAT + GPA. For the average student this would be 1002.5.

Both of these choices produce a score that could be used to compare among students. However, neither of these two choices produces a score that resembles the more familiar scores for standardized tests or grade point averages.
But VerbalSAT and MathSAT are values in the hundreds (range: 200 to 800) and the GPA is single digits (range: 0 to 4). If we wanted to scale our scores to more closely resemble the mean of these scores, we could modify the coefficients for means (a = 1/3, b = 1/3 and c = 1/3) in one of two ways. We could divide the standardized tests by an additional 200 points so their maximum would be 4 (a = 1/600, b = 1/600 and c = 1/3), producing a mean for the average student of 2.5. Or we could scale the GPA by multiplying by 200 to produce a maximum of 800 (a = 1/3, b = 1/3 and c = 200/3), yielding a score of 500 for the average student.

Any of these choices produces a reasonable and acceptable linear combination. Which one is used depends on the objectives and preferences of the user. We will look at several specific applications.

Mean and Variance of a linear combination

So what is the mean value of our linear combination, and can we put a variance on it (to get a confidence interval)?

Linear combination: $Score_i = a_1 VerbalSAT_i + a_2 MathSAT_i + a_3 GPA_i = \sum_{i=1}^{3} a_i Y_i$

Expected value: $\sum a_i \mu_{Y_i}$

Estimate of mean: $\sum a_i \bar{Y}_i$

Variance of the linear combination.

The variance of a linear combination is the sum of the individual variances (with squared coefficients) plus twice the covariance of each pair of variables (with both coefficients).

Estimate of the variance:

$$\sum_i a_i^2 S_i^2 + \sum_i \sum_{j > i} 2 a_i a_j S_{ij}$$

For example, with the Score we calculated previously, the variance might be

VAR(Score) = a² VAR(Verbal) + b² VAR(Math) + c² VAR(GPA) + 2ab Cov(Verbal, Math) + 2ac Cov(Verbal, GPA) + 2bc Cov(Math, GPA)

HOWEVER, if the variables are independent the covariances can be assumed to be zero. The variance of the linear combination then reduces to the sum of the variances of the individual variables (with squared coefficients).
VAR(Score) = a² VAR(Verbal) + b² VAR(Math) + c² VAR(GPA)

Utility of linear combinations

As the course progresses we will see applications of linear combinations to almost everything.

Two sample t-test: $H_0: \mu_1 - \mu_2 = \delta$, with $\mu_1 - \mu_2$ estimated by $\bar{Y}_1 - \bar{Y}_2$. This is a linear combination! The most common case is a test of the linear combination $1\bar{Y}_1 + (-1)\bar{Y}_2 = \delta$, but any other combination is possible (e.g. $0.8\bar{Y}_1 - 1\bar{Y}_2 = 0$), and we only need to calculate a variance and standard error for the given situation.

$$Var(0.8\bar{Y}_1 - 1\bar{Y}_2) = (0.8)^2 S^2_{\bar{Y}_1} + (-1)^2 S^2_{\bar{Y}_2} = 0.64 S^2_{\bar{Y}_1} + S^2_{\bar{Y}_2}$$

Regression: The model, $Y_i = b_0 + b_1 X_i + e_i$, is a linear combination.

Analysis of variance: We will look at contrasts to test for differences between means, similar to a two sample t-test (but usually with more than two means). For example, testing the hypothesis that some mean is equal to the average of two other means would be done as

$$H_0: \mu_1 - \left(\frac{\mu_2 + \mu_3}{2}\right) = 1\mu_1 + \left(\frac{-1}{2}\right)\mu_2 + \left(\frac{-1}{2}\right)\mu_3 = 0$$

An application: Stratified Random Sampling

Suppose we want to estimate the number of ducks in an area on the Louisiana coast. The area of interest is 300 acres. We fly 9 transects, counting ducks for 1/10 mile on either side of the plane, and from each transect we estimate the number of ducks per acre. We can then calculate an estimate and a confidence interval for that estimate.

Raw data

Sample Number    1   2    3    4    5    6    7      8      9
Ducks counted    8   19   30   23   56   89   2024   1732   1122

Summary statistics

Statistic    Value
n            9
sum          5103
mean         567
var          684349.25
std dev      827.25
std error    275.75
t-value      2.306
acres        300

Estimate and confidence interval for the 300 acres

Given this value as the estimate of the true mean number of ducks (μ), we can state our results as a probability statement.
The usual calculation is "some parameter estimate ± $t_{\alpha/2}$ × standard error".

The confidence interval for ducks per acre is $\bar{Y} \pm t_{\alpha/2} S_{\bar{Y}}$, or 567 ± 2.306(275.75), which is 567 ± 635.88.

However, we want ducks per 300 acres. Recall from our discussion of transformations that when the mean is multiplied by 300 to get total ducks on the 300 acres, the variance is multiplied by 300 squared and the standard deviation is multiplied by 300.

So the calculation is (300)567 ± 2.306(300)(275.75). The margin of error is 2.306(300)(275.75) = 190765 and the interval is 170100 ± 190765.

The probability statement is

$$P(\bar{Y} - t_{\alpha/2} S_{\bar{Y}} \le \mu_{ducks} \le \bar{Y} + t_{\alpha/2} S_{\bar{Y}}) = 1 - \alpha$$

P(-20665 ≤ μ ≤ 360865) = 0.95

This calculation applies when the number of ducks is the value of interest and the 300 acres is the area of interest.

Stratification

Now let's suppose we noticed that the duck species we were studying and counting was primarily a fresh water species, and occurred only infrequently in the brackish and saline zones of the 300 acres of interest. We could modify the study and examine the numbers in the 3 zones separately. The advantage is that perhaps we can get 3 separate estimates of homogeneous habitat with a smaller variance than one estimate of the heterogeneous whole 300 acres.

We will allow that each zone has been determined to encompass 100 acres to simplify our analysis. The 3 habitat types would be called "strata", and we could estimate the number for each stratum separately and, we assume, independently, so we need not consider covariance.

The samples were taken such that there are 3 samples in each stratum.
Obs           Saline (CH2O)   Brackish (BH2O)   Fresh (FH2O)
1             8               23                2024
2             19              56                1732
3             30              89                1122
n =           3               3                 3
sum =         57              168               4878
mean =        19              56                1626
var =         121             1089              211828
std dev =     11              33                460.248
std error =   6.35            19.05             265.72

$$\bar{Y}_{TotalDucks} = 100\bar{Y}_{FH2O} + 100\bar{Y}_{BH2O} + 100\bar{Y}_{CH2O}$$

Total Ducks = 100(1626) + 100(56) + 100(19) = 162600 + 5600 + 1900 = 170100

The estimated total is the same as before because the area of each zone is the same.

Now we need a variance to calculate a confidence interval.

$$S^2_{TotalDucks} = a^2 S^2_{FH2O} + b^2 S^2_{BH2O} + c^2 S^2_{CH2O}$$

$$S^2_{TotalDucks} = (100)^2 211828 + (100)^2 1089 + (100)^2 121 = 2130380000$$

$$S_{\bar{Y}_{TotalDucks}} = \sqrt{2130380000 / 9} = 15385.347$$

(The reported standard error corresponds to dividing the variance by the total sample size, n = 9.)

Linear combination of independent means.

t-value            2.306
acres              300
estimated total    170100
variance           2,130,380,000
std error          15385.347
margin of error    35478.70
CL-lower           134621.30
CL-upper           205578.70

Old and new estimates.

P(-20665 ≤ Total Ducks ≤ 360865) = 0.95 (unstratified)
P(134621 ≤ Total Ducks ≤ 205579) = 0.95 (stratified)

Why does stratification give a smaller interval width? Because we replaced one sample with a very large variance (684349) with 3 samples, each with a smaller variance (121, 1089 and 211828). This reduced the margin of error from 190,765 to 35,479. The heterogeneous variances should not have been pooled in the first place.

You will recall that we mentioned that one way of increasing power is to reduce the variance. We did not dwell on this previously because the only mechanism I could suggest at the time for reducing variance was "improving measurement error". Now we have another method of reducing variance: stratification. This involves sampling smaller homogeneous units instead of one large heterogeneous unit.

Is the fact that the 3 variances were not similar a problem? No, nowhere in working with linear combinations did we state that the variances had to be similar.
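The unstratified and stratified duck calculations can be compared side by side. This is a minimal Python sketch, not part of the original notes. It follows the arithmetic of the notes' summary table: the t-value 2.306 is taken from the notes, and the division of the combined variance by the total sample size (n = 9) is inferred from the reported standard error of 15385.347 rather than stated explicitly. Dividing each stratum variance by its own sample size of 3 instead would give a larger standard error (about 26,648), so treat the divisor as an assumption.

```python
import math
import statistics

# Duck counts per transect, grouped by stratum (from the table above).
counts = {
    "saline": [8, 19, 30],
    "brackish": [23, 56, 89],
    "fresh": [2024, 1732, 1122],
}
acres_per_stratum = 100
total_acres = 300
t_crit = 2.306  # tabled t-value used in the notes

# --- Unstratified: treat all 9 transects as one sample --------------------
all_counts = [c for stratum in counts.values() for c in stratum]
n_total = len(all_counts)
se_all = statistics.stdev(all_counts) / math.sqrt(n_total)
total_unstrat = total_acres * statistics.mean(all_counts)
moe_unstrat = t_crit * total_acres * se_all

# --- Stratified: linear combination of three independent stratum means ----
total_strat = sum(acres_per_stratum * statistics.mean(s) for s in counts.values())
# Squared coefficients times stratum variances; no covariance terms
# because the strata are assumed independent.
var_combo = sum(acres_per_stratum ** 2 * statistics.variance(s)
                for s in counts.values())
# Assumption: divide by the total n = 9, matching the notes' reported value.
se_strat = math.sqrt(var_combo / n_total)
moe_strat = t_crit * se_strat

print(f"unstratified: {total_unstrat:.0f} +/- {moe_unstrat:.0f}")
print(f"stratified:   {total_strat:.0f} +/- {moe_strat:.0f}")
```

Both approaches give the same point estimate here because the strata have equal areas and equal sample sizes; only the margin of error changes.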
Later we will find that similar variances can be advantageous, but they are not necessary for this type of analysis. Similar variances are advantageous in analyses where we may want to pool variances into a single, improved estimate.

Summary

The use of "Linear Combinations" is a rather generic technique with many applications throughout statistics. We will see them again in the two sample t-test, regression, and ANOVA. Sampling is another example of the application.

It is important to determine whether the variables in the linear combination are independent. If they are, the covariance values can be considered to be zero.

The linear combination and its variance are calculated as

$$\text{Linear combination} = \sum_{i=1}^{k} a_i Y_i$$

$$\text{Variance} = \sum_i a_i^2 S_i^2 + 2 \sum_i \sum_{j > i} a_i a_j S_{ij}$$

where, if we can assume the variables are independent, the covariances can be assumed to be zero.

This will be very important later. We will assume independence in the t-test and ANOVA, and in parts of regression, but not all of regression!

A final note on linear combinations

We have been assuming "independence" for a while. We sample at random to obtain independence, and to get a good representative sample. But do "linear combinations" have anything to do with the simpler calculations we talked about earlier when we assumed independence, say a test of the mean against a hypothesized value?

I'm glad you asked. As a matter of fact the mean is calculated as $\bar{Y} = (Y_1 + Y_2 + Y_3 + ... + Y_n)/n$, and that is a linear combination in which every coefficient is 1/n. Fortunately for us we do not have to consider the covariances of individual observations because they are sampled independently!

Two-sample t-tests

Recall the derived population of sample means. Imagine you had two of these derived populations, as we had for variances when we developed the F-test.

Equality of Population Means

            Population 1    Population 2
Mean        $\mu_1$         $\mu_2$
Variance    $\sigma_1^2$    $\sigma_2^2$

Given two populations, test for equality.
Actually, we test whether the difference between two means is equal to some value, usually zero.

$H_0: \mu_1 - \mu_2 = \delta$, where often $\delta = 0$

Derived populations

From population 1 draw all possible samples of size $n_1$ to get $\bar{Y}_1$.
• Mean $\mu_1$ yields $\mu_1 = \mu_{\bar{Y}_1}$
• Variance $\sigma_1^2$ yields $\sigma^2_{\bar{Y}_1} = \sigma_1^2 / n_1$
• Derived population size is $N_1^{n_1}$

Likewise population 2 gives $\bar{Y}_2$.
• Mean $\mu_2$ yields $\mu_2 = \mu_{\bar{Y}_2}$
• Variance $\sigma_2^2$ yields $\sigma^2_{\bar{Y}_2} = \sigma_2^2 / n_2$
• Derived population size is $N_2^{n_2}$

Draw a sample from each population

                   Population 1    Population 2
Sample size        $n_1$           $n_2$
d.f.               $\gamma_1$      $\gamma_2$
Sample mean        $\bar{Y}_1$     $\bar{Y}_2$
Sample variance    $S_1^2$         $S_2^2$

Take all possible differences $(\bar{Y}_1 - \bar{Y}_2) = d$. That is, take all possible means from the first population ($N_1^{n_1}$ means) and all possible means from the second population ($N_2^{n_2}$ means), and get the difference between each pair. There are a total of $(N_1^{n_1})(N_2^{n_2})$ values.

$$(\bar{Y}_{1i} - \bar{Y}_{2j}) = d_k \quad\text{for } i = 1,...,N_1^{n_1},\; j = 1,...,N_2^{n_2} \text{ and } k = 1,...,(N_1^{n_1})(N_2^{n_2})$$

The population of differences $(\bar{Y}_{1i} - \bar{Y}_{2j}) = d_k$ has a mean equal to $\mu_d$ or $\mu_{\bar{Y}_1 - \bar{Y}_2}$, a variance equal to $\sigma^2_d$ or $\sigma^2_{\bar{Y}_1 - \bar{Y}_2}$, and a standard error equal to $\sigma_d$ or $\sigma_{\bar{Y}_1 - \bar{Y}_2}$ (the standard deviation of the mean difference).

Characteristics of the derived population

1) As $n_1$ and $n_2$ increase, the distribution of d approaches a normal distribution. If the two original populations are normal, the distribution of d is normal regardless of $n_1$ and $n_2$.

2) The mean of the differences ($d_k$) is equal to the difference between the means of the two parent populations.

$$\mu_d = \mu_{\bar{Y}_1} - \mu_{\bar{Y}_2} = \mu_{\bar{Y}_1 - \bar{Y}_2} = \delta$$

3) The variance of the differences ($d_k$) is equal to the variance for the difference between the means of the two parent populations.

$$\sigma^2_d = \sigma^2_{\bar{Y}_1 - \bar{Y}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

$$\sigma_d = \sigma_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

This is a variance for a linear combination. The two samples are assumed independent, so covariance considerations are not needed.
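The variance rule for a difference of two independent means is a special case of the general linear-combination variance. The sketch below is illustrative Python, not from the original notes; the function name `linear_combination_variance` is hypothetical, and the variances and sample sizes are made-up numbers chosen only to show the arithmetic.

```python
import math


def linear_combination_variance(coeffs, cov):
    """Variance of sum(a_i * Y_i) given a covariance matrix.

    Summing a_i * a_j * cov[i][j] over every (i, j) pair is the same as
    sum(a_i^2 * S_i^2) plus twice the covariance terms for each i < j,
    provided cov is symmetric.
    """
    k = len(coeffs)
    return sum(coeffs[i] * coeffs[j] * cov[i][j]
               for i in range(k) for j in range(k))


# Hypothetical inputs (not from the notes): two independent samples.
var1, n1 = 4.0, 10   # variance and sample size, sample 1
var2, n2 = 9.0, 15   # variance and sample size, sample 2

# Covariance matrix of the two sample MEANS; the off-diagonal zeros
# encode the independence assumption.
cov_means = [[var1 / n1, 0.0],
             [0.0, var2 / n2]]

# Coefficients (1, -1) give the usual difference of means.
var_d = linear_combination_variance([1, -1], cov_means)
se_d = math.sqrt(var_d)

# Coefficients (0.8, -1) give the other combination mentioned in the notes.
var_other = linear_combination_variance([0.8, -1], cov_means)

print(f"Var(Y1bar - Y2bar)       = {var_d:.3f}, std error = {se_d:.3f}")
print(f"Var(0.8*Y1bar - Y2bar)   = {var_other:.3f}")
```

Because the coefficients are squared, the -1 on the second mean still adds its variance, which is why the variance of a difference is a sum.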
The variance comes from linear combinations. The variance of the sum is the sum of the variances (no covariance if independent).

Linear combination: $\bar{Y}_1 - \bar{Y}_2$

The coefficients are 1, -1 (i.e. $1\bar{Y}_1 + (-1)\bar{Y}_2$).

The variance for this linear combination is

$$1^2 \hat{\sigma}^2_{\bar{Y}_1} + (-1)^2 \hat{\sigma}^2_{\bar{Y}_2} = \frac{\hat{\sigma}_1^2}{n_1} + \frac{\hat{\sigma}_2^2}{n_2}$$

if the two variables are independent. Since they are each separately sampled at random this is an easy assumption.

The two-sample t-test

$H_0: \mu_1 - \mu_2 = \delta$

$H_1: \mu_1 - \mu_2 \ne \delta$, a non-directional alternative. A directional alternative (one tailed test) would specify a difference, either > δ or < δ.

Commonly, δ is 0 (zero).

If $H_0$ is true, then

$$E(d) = \mu_1 - \mu_2 = \delta, \qquad \sigma^2_d = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

If values of $\sigma_1^2$ and $\sigma_2^2$ were KNOWN, we could use a Z-test,

$$Z = \frac{d - \delta}{\sigma_d} = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

If values of $\sigma_1^2$ and $\sigma_2^2$ were NOT KNOWN, and had to be estimated from the samples, we would use a t-test,

$$t = \frac{d - \delta}{S_d} = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

Since the hypothesized difference is usually 0 (zero), the term $(\mu_1 - \mu_2)$ is usually zero, and the equation is often simplified to $t = d / S_d$.

Et voila, a two sample t-test! This is a very common test, and it is the basis for many calculations used in regression and analysis of variance (ANOVA). It will crop up repeatedly as we progress in the course. It is very important!

Unfortunately, this is not the end of the story. It turns out that there is some ambiguity about the degrees of freedom for the error variance, $\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}$. Is it $n_1 - 1$, or $n_2 - 1$, or somewhere in between, or maybe the sum?

Power considerations

POWER! We want the greatest possible power. It turns out that we get the greatest power (and our problems with degrees of freedom go away) if we can combine the two variance ...

This note was uploaded on 12/29/2011 for the course EXST 7005 taught by Professor Geaghan,j during the Fall '08 term at LSU.
