2/11/12
PADP 8130: Linear Models
Hypothesis Testing
Angela Fertig, Ph.D.

• Up until now, we've been getting estimates for parameters.
– We're convinced that OLS is a good way of getting an estimate.
– We think multiple regression is a great way to control for confounding factors.
• Now, we want to be able to say how confident we are that our specific OLS estimates are the same as, or different from, hypothesized parameter values (mostly 0).

Plan
• Standard error of the OLS estimator
• Confidence intervals
• Hypothesis testing procedure
– t test
– F test
– Chi-square test (Wald)

Measures of dispersion of the population distribution
• Variance:

  σ² = (1/n) Σ_{i=1}^{n} (X_i − X̄)²

• Standard deviation (square root of variance):

  σ = √[ (1/n) Σ_{i=1}^{n} (X_i − X̄)² ]
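The two dispersion measures above can be sketched in pure Python (the data values here are invented for illustration):

```python
import math

def pop_variance(xs):
    """Population variance: mean squared deviation from the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def pop_std(xs):
    """Population standard deviation: square root of the variance."""
    return math.sqrt(pop_variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # invented example values
print(pop_variance(data))  # 4.0
print(pop_std(data))       # 2.0
```

Note that this is the population (divide-by-n) version shown on the slide; a sample variance would divide by n − 1.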
Population vs. Sampling Distribution
• If we took lots of samples of a population, and calculated some statistic (a mean or an OLS estimate) from each sample, we would have a distribution of sample statistics: the sampling distribution.
• Because of averaging, the sample statistic does not vary as widely as the individual observations.
• Moreover, if we took lots of samples, the distribution of the sample means would be centered around the population mean (unbiased).
• As the size of each sample increases, the sampling distribution looks more and more like a normal distribution (central limit theorem).

[Figure: population distribution and sampling distribution; the mean of all sample means coincides with the population mean]

Dispersion of the sampling distribution
• Sampling distributions that are tightly clustered will give us a more accurate estimate on average than those that are more dispersed.
• We need to estimate the dispersion of our sampling distribution so that we know how good our statistic is.
→ The standard error is the standard deviation of the sampling distribution.

Standard error of b
• If we took lots of separate samples and then calculated lots of separate regression lines, we would get a distribution of slope coefficients b.
• The sampling distribution of b is normal if the sample size is large, and the mean of all the possible b's is β.
• The formula for the standard error of b is:

  SE(b) = s / √[ Σ_i (X_i − X̄)² ]   (simple regression),  or in matrix form  Var(b) = s²(X'X)⁻¹

  where s² = Σ_i (Y_i − Ŷ_i)² / (n − k) = Σ_i e_i² / (n − k) = e'e / (n − k)  and  Ŷ = Xb.

• s is the square root of the estimated error variance: we don't know the parameter σ, so we must estimate s instead.

Interpretation of SE
The standard error of a point estimate gives you the variation in the sampling distribution around the point so that you can:
– Give a confidence interval
– Conduct tests of hypotheses

Confidence Interval
• A confidence interval for an estimate is a range of numbers within which the parameter is likely to fall.
• We can use the standard error to produce such a range:

  estimate ± (z × standard error)
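As a concrete sketch of the last two slides for the simple-regression case (one regressor, so k = 2): estimate the slope, its standard error, and the interval estimate ± z × SE. The data values are invented for illustration:

```python
import math

def slope_and_se(x, y):
    """Simple-regression slope b and its standard error:
    SE(b) = s / sqrt(sum((x_i - xbar)^2)), with s^2 = e'e / (n - k), k = 2."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar                                           # intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))   # e'e
    s2 = sse / (n - 2)                                            # estimate of sigma^2
    return b, math.sqrt(s2 / sxx)

x = [1, 2, 3, 4, 5]          # invented data
y = [2, 4, 5, 4, 5]
b, se = slope_and_se(x, y)
ci = (b - 1.96 * se, b + 1.96 * se)   # estimate ± z × SE, z = 1.96 for 95%
print(round(b, 3), round(se, 3))      # 0.6 0.283
```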
• z is the confidence coefficient, chosen to determine what "likely" means — i.e., the probability that the interval contains the actual parameter value (usually close to 1, like 0.95 or 0.99).
– Since the sampling distribution is normal, we know the values of z that correspond to any probability.

Normal distributions
– Have a bell shape
– Are symmetrical
– Follow the empirical rule: the probability of falling within z standard deviations of the mean is:

  Confidence   z
  68%          1.00
  95%          1.96
  99%          2.58
  99.9%        3.29

Example
With 95% confidence: β = b ± 1.96 × SE_b
Example: if b = 0.5 and the SE of b = 0.2, then β = 0.5 ± 1.96 × 0.20
  = 0.5 ± 0.39
So, with 95% confidence, the slope of our line (the parameter) will lie between 0.11 and 0.89.

Graph of confidence interval
[Figure: estimated b = 8.5 with its 95% confidence interval (z = 1.96); true β = 6]

More samples
That's just one sample. Let's imagine that we took many samples. Then, we calculated 95% confidence intervals for all of the sample means.

Graph of confidence interval
[Figure: 95% confidence intervals from 7 samples around the true β = 6]

Interpretation
• Of the 7 samples, all of the confidence intervals around the estimated coefficient included the actual true β except for one.
• If we took more samples, we would expect 95% of the confidence intervals to include the actual β.
– 95% because that's the confidence coefficient we picked.

Exact confidence coefficient for small sample sizes
• Because we don't know the population standard deviation and must use the sample standard deviation to get the estimated standard error, there is error, especially when the sample size is small.
• To account for this error, for small n we should use the t distribution, not the normal distribution, to estimate the confidence interval. The t distribution has fatter tails than the normal distribution.
• There are tables that give these scores for different confidence levels and different degrees of freedom (df = n − 1). The t distribution looks almost exactly like the normal distribution for large df.

t distribution graph
[Figure: t distributions for several df plotted against the normal curve]

t distribution table

  Confidence   t(df=1)   t(df=10)   t(df=30)   t(df=100)   z
  90%          6.31      1.81       1.70       1.66        1.65
  95%          12.71     2.23       2.04       1.98        1.96
  99%          63.66     3.17       2.75       2.63        2.58

Controlling the confidence interval
• Choose a different confidence level.
– If we picked 99% confidence instead, the interval would be larger.
– If we picked 90% confidence, the interval would be narrower, but we would be wrong more often.
• Change the sample size.
– The bigger the sample size, the lower the standard error and therefore the smaller the confidence interval for a given probability.

Now
Hypothesis Testing

What is a hypothesis?
• A hypothesis is a testable statement about the world, usually a prediction that some parameter takes a particular numerical value.
– We test hypotheses by attempting to see if they could be false, rather than "proving" them to be true.
– E.g., you cannot prove that all swans are white by counting white swans, but you can prove that not all swans are white by counting one black swan.
• We generate hypotheses from a combination of theory, past empirical work, qualitative research, common sense, and anecdotal observations about the world.

Null and alternative hypotheses
• When we're testing hypotheses, we want to choose between two conflicting statements:
– The null hypothesis (H0) is directly tested.
• This is a statement that the parameter we are interested in has a value consistent with no effect; e.g., rich and poor people are equally likely to have a regular place for medical care.
– The alternative hypothesis (Ha) contradicts the null hypothesis.
• This is a statement that the parameter falls into a different set of values than those predicted by H0; e.g., rich and poor people have different probabilities of having a regular place for care.
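Before moving on to formal tests, the confidence-level trade-off from the "Controlling the confidence interval" slide can be sketched with the z values tabulated earlier, reusing the invented b = 0.5, SE = 0.2 example:

```python
# z values from the slide's table for 90%, 95%, and 99% confidence
z_values = {"90%": 1.65, "95%": 1.96, "99%": 2.58}
b, se = 0.5, 0.2   # example estimate and standard error from the earlier slide

for level, z in z_values.items():
    lo, hi = b - z * se, b + z * se
    print(f"{level}: ({lo:.2f}, {hi:.2f})")  # interval widens as confidence rises
```

At 90% the interval is narrowest (but wrong more often); at 99% it is widest, and here it even crosses zero.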
Two-Sided vs. One-Sided Tests
• One-sided test: H0: μ = 16; Ha: μ < 16
• Two-sided test: H0: μ = 16; Ha: μ ≠ 16
• Two-sided tests are the convention because:
– It is even more difficult to find results due to chance.
– We normally don't have strong prior information about the direction of the difference.
– Two-tailed tests appear more objective (not influenced by your beliefs about the direction).

t score
• We often use the t score instead of the z score as our test statistic, because using s to estimate σ in the standard error introduces additional error.
• This is why significance tests are often called t tests.
• This is especially important if n < 30 or 40. To use this, we need to assume that the population distribution is normal.

  t = (b − β) / se_b

or more generally, if H0: Rb = q:

  t = (Rb − q) / √[ s² R(X'X)⁻¹R' ]

Interpreting hypothesis tests
• We never accept the null hypothesis.
• We either reject or fail to reject based on our p-value:
– Judge that p-values of, say, 5% and below are probably good evidence that the null hypothesis can be rejected.
• We may fail to reject the null hypothesis because the null hypothesis is true, or because of:
– Small sample size
– Inappropriate research design
– Biased sample
– Etc.

Example
• We usually want to test whether β = 0 in the population.
• So, we calculate
the t statistic (how many SEs from zero is b?) and then get the p-value.

  t = (Estimate − Null hypothesis value) / Standard error = (0.5 − 0) / 0.2 = 2.5

Rule of thumb: a t statistic > 2 is significantly different from 0.

Pr(estimate being more than 2.5 SEs higher than the null) = 0.007
– Thus, for a two-sided test, there is only about a 1.4% chance that we would get this estimate if the null hypothesis were true.
– So, we can reject the null hypothesis that β = 0. That is, the effect is significantly different from zero.

Steps for Hypothesis Test
1. Check assumptions (i.e., normality, sample size).
2. State hypotheses: null and alternative, one-sided or two-sided.
3. Calculate the appropriate test statistic (a summary of how far the estimate falls from the parameter value in H0, e.g., a t score).
4. Calculate the associated p-value (the probability of getting an estimate at least this far from the H0 parameter value if H0 is true).
5. Interpret the result.

Type I and Type II errors
• A Type I error occurs when we reject H0 even though it is true.
– This will happen 5% of the time if we choose to reject H0 when the p-value is less than 5%.
• A Type II error occurs when we do not reject H0 even though it is false.
– Sometimes there is a real difference, but we don't detect it.

Trade-offs
• There is a trade-off between the two types of error. The more stringent the significance level:
– The more difficult it is to detect a real effect (more Type II error),
– But the more confident we can be that when we find an effect it is real (less Type I error).
• Depending on what we are doing, we may be more willing to accept one sort or the other.
– Analogous to a legal trial: we don't want the guilty to go free (a Type II error), but we'd be even unhappier if we executed an innocent person (a Type I error).

F test
• Purpose: an F test is used to test a joint hypothesis. A joint hypothesis involves hypotheses on more than one coefficient at the same time (e.g.,
β1 = 0 & β2 = 2, or β2 = β3).
• How the F test works: instead of comparing b to β as in a t test, the F-test approach estimates
– an unconstrained model and
– a constrained model with the hypotheses imposed,
– and then compares the sums of squared errors from the two models.
– The errors will be smaller in the unconstrained model, but if the difference is large, then the constraints are unlikely to be true, and we reject the null hypothesis that the constraints are true.

F statistic (expressed two ways)

  F = [ (eC'eC − eUC'eUC) / J ] / [ eUC'eUC / (N − K) ]

or

  F = (Rb − q)' [ s² R(X'X)⁻¹R' ]⁻¹ (Rb − q) / J  ~  F(J, N − K)

where
eC is the error term in the constrained model
eUC is the error term in the unconstrained model
J is the number of linear restrictions
N is the number of observations
K is the number of parameters (including the intercept) in the unconstrained model
H0: Rb = q

How to use the F statistic
• F is distributed as an F random variable with (J, N − K) degrees of freedom.
• We will reject the null if F is sufficiently large, where "large" is determined by the chosen significance level (using an F table).
• If the null was that a set of coefficients are 0, and it is rejected, then we say that the variables in the hypothesis (e.g., x1, x3, x4, and x7) are jointly statistically significant.

Example
H0: The slopes of the race dummy variables will be 0.
Ha: The slopes will not be 0; race matters.
Unconstrained model: y = Za + Xb + e, where Z includes the race variables.
Constrained model: y = Xb + e.
eUC'eUC = 5473.198, eC'eC = 5573.575, J = 2, N = 8187, K = 9

  F(2, 8178) = [ (5573.575 − 5473.198) / 2 ] / [ 5473.198 / (8187 − 9) ] = 50.1885 / 0.6693 = 74.99

There is essentially zero probability that the null is true, so we reject the null: race matters.

Chi-square tests
• Purpose: chi-square tests are also used to test joint hypotheses, but they can additionally test hypotheses involving non-linear restrictions.
– Example of a non-linear restriction: H0: β_female / β_private = 1
• Three main types of chi-square tests:
– Wald test (we will focus on this one)
– Likelihood ratio test
– Lagrange multiplier (or score) test

Wald Test
How the Wald test works: the Wald approach estimates
– an unconstrained model only (but not a constrained model).
– The hypothesis is H0: g(b) = 0, where g(b) can be a non-linear set of restrictions; in the linear case, g(b) = Rb − q, where
• b includes the estimated coefficients,
• R is a matrix indicating the coefficients in the hypotheses,
• q is a vector of their hypothesized values.
– The Wald test checks whether g(b) is close to 0; if g(b) is very different from 0, then the constraints are unlikely to be true, and we reject the null hypothesis that the constraints are true.

Wald statistic (expressed two ways)
e UC'e UC or W = g(b)'(s 2 ∂g(b)
∂g(b)' 1
(X'X)1
) g(b) ~ χ 2 ( J )
∂b
∂b where
eC is the error term in the constrained model eUC is the error term in the unconstrained model
J is the number of linear restrictions
N is the number of observations
K is the number of parameters (including the intercept) in the unconstrained model
H 0 : g(b) = 0 (general case) or Rb  q=0 (linear case) 18 2/11/12 Wald and F Wald and F sta9s9cs are very similar: F= [ e C'e C  e UC'e UC ] / J
e UC'e UC / ( N − K ) W= N [ e C'e C  e UC'e UC ]
e UC'e UC F→ W
~ F ( J, N ) as N → ∞
J Note: Stata uses F sta9s9c aler regress even if restric9on is non linear; Stata will run Wald if the model is non linear (glm or logit). 19 ...
This note was uploaded on 03/28/2012 for the course PADP 8130 taught by Professor Fertig during the Spring '12 term at LSU.