2/2/12

PADP 8130: Linear Models
Multiple Regression
Angela Fertig, Ph.D.

Plan for today
• Show the following using matrix algebra:
  – Derivation of the OLS estimator
  – Assumptions of the OLS model
  – Properties of the OLS estimator
• Explain how to interpret multiple regression results

Recall
Without matrix algebra, we would write:
yᵢ = α + β₁xᵢ₁ + β₂xᵢ₂ + β₃xᵢ₃ + ⋯ + βⱼxᵢⱼ + εᵢ
for each observation i.
With matrix algebra, we would write:

y = Xb + e

⎡ y₁ ⎤   ⎡ 1  x₁₁ ⋯ x₁ⱼ ⎤ ⎡ α  ⎤   ⎡ ε₁ ⎤
⎢ y₂ ⎥ = ⎢ 1  x₂₁ ⋯ x₂ⱼ ⎥ ⎢ β₁ ⎥ + ⎢ ε₂ ⎥
⎢ ⋮  ⎥   ⎢ ⋮   ⋮      ⋮  ⎥ ⎢ ⋮  ⎥   ⎢ ⋮  ⎥
⎣ yₙ ⎦   ⎣ 1  xₙ₁ ⋯ xₙⱼ ⎦ ⎣ βⱼ ⎦   ⎣ εₙ ⎦

[n × 1] = [n × (j+1)] [(j+1) × 1] + [n × 1]

Derivation of OLS estimator
e = y − Xb
b = argmin e'e

e'e = (y − Xb)'(y − Xb)
    = y'y − y'Xb − b'X'y + b'X'Xb      (each term is a 1×1)
    = y'y − 2b'X'y + b'X'Xb

∂e'e/∂b = −2X'y + 2X'Xb = 0

b = (X'X)⁻¹X'y

The 5 assumptions of OLS
y = Xb + e
1. Linear functional form
2. Zero mean of error: E(e) = 0
3. Disturbance terms have the same variance (homoskedasticity) and are not correlated with one another (non-autocorrelation):

E(ee') = σ²I = ⎡ σ² ⋯ 0  ⎤
               ⎢ ⋮  ⋱  ⋮ ⎥
               ⎣ 0  ⋯ σ² ⎦
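The estimator derived above, b = (X'X)⁻¹X'y, can be checked numerically on simulated data. A minimal sketch, assuming NumPy is available; the sample size, true coefficients, and noise scale are illustrative choices, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate n observations with j = 2 regressors plus an intercept column.
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # [n x (j+1)]
beta_true = np.array([1.0, 2.0, -0.5])                      # [alpha, beta1, beta2]
y = X @ beta_true + rng.normal(scale=0.1, size=n)           # y = Xb + e

# OLS estimator from the derivation: b = (X'X)^(-1) X'y
b = np.linalg.inv(X.T @ X) @ (X.T @ y)

# Same answer from a library least-squares solver.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b)                        # recovers beta_true up to sampling noise
print(np.allclose(b, b_lstsq))  # both routes agree
```

In practice solvers like `lstsq` are preferred over forming (X'X)⁻¹ explicitly, since inverting X'X is numerically fragile when regressors are nearly collinear.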
The 5 assumptions of OLS, cont.
4. Uncorrelatedness of regressor and disturbance (exogeneity): Cov(xᵢ, eⱼ) = 0
5. No exact linear relationships between regressors (no multicollinearity, or identification condition): X is full rank, i.e. X is an n × k matrix with rank k

Properties of the OLS estimator
Unbiased:
b = (X'X)⁻¹X'y
  = (X'X)⁻¹X'(Xβ + ε)
  = (X'X)⁻¹X'Xβ + (X'X)⁻¹X'ε
  = β + (X'X)⁻¹X'ε
E(b) = β

Properties of the OLS estimator
Variance:
Var(b) = Var(β + (X'X)⁻¹X'ε)
       = (X'X)⁻¹X'(Var ε)X(X'X)⁻¹
       = σ²(X'X)⁻¹X'X(X'X)⁻¹
Var(b) = σ²(X'X)⁻¹

Properties of the OLS estimator
If we assume that the disturbances are distributed normally:
ε ~ N(0, σ²I)
then the OLS estimator is also distributed normally:
b ~ N(β, σ²(X'X)⁻¹)

Now, the purpose and interpretation of multiple regression

Observational data and Multiple Regression
• In social science, it is often impossible to experiment on people (we can't make one group poor, homeless, uninsured, etc., and see what effect it has on them).
• Instead, we rely on observational data and control for alternative explanations using multiple regression.
  – Multiple regression allows us to include numerous independent variables, so we can include those variables that we think might be producing spurious relationships.

Example for the day
• What predicts attitudes about abortion?
• Hypothesis: older people are less pro-choice than younger people because younger people were raised in a more socially liberal environment than their elders.
• Sample: 100 British people
• Measure of attitudes: a 10-point
scale: "Please tell me whether you think abortion can be justified, never be justified, or something in between, using this card" [respondent given a 1–10 response card, where 1 is always justified and 10 is never justified].

Scatter plot
[Figure: scatter plot of abortion attitude (10 = anti) against age (10–90), with the fitted linear regression line.]

Bivariate OLS

Variable     Coefficient value
Age          0.10
Intercept    0.46

The equation for our linear regression is:
y = 0.46 + 0.10X + e
where y is attitude towards abortion, X is age, and e is the error term.
→ So, there seems to be a relationship between attitudes and age.
→ The relationship also appears to be large in magnitude. If James is 10 years older than Jessie, then we predict that James will be more pro-life and will score about 1 point higher on our 10-point scale.

Alternative explanation
• Could it be that the relationship is spurious, with religiosity being associated with both age and abortion attitudes?
• The data says:
  – People that go to church 4+ times/month
    • have a mean of 6.95 on our abortion attitude scale
    • have a mean age of 58
  – People that go to church <1 time/month
    • have a mean of 2.48 on our abortion attitude scale
    • have a mean age of 26

Another scatter plot
[Figure: the same scatter of abortion attitude (10 = anti) against age, with the regression line; religious people (who are old and pro-life) cluster at the top, irreligious people (who are young and pro-choice) cluster at the bottom.]

Multivariate OLS
We need to include religiosity and age as independent variables in our regression.
Variable          Coefficient value
Age               0.10
Intercept         0.46

Variable          Coefficient value
Age (b₁)          0.03
Religiosity (b₂)  0.84
Intercept (a)     2.07

The equation for our multiple regression is:
y = 2.07 + 0.03X₁ + 0.84X₂ + e
Which means that as people go to church an extra time per month, their abortion attitude score goes up by 0.84 points, holding age constant. Likewise, as people age one year, their abortion attitude score goes up by 0.03 points, holding church attendance constant.

Thinking about extra predictors
• The best way of thinking about regressions with more than one independent variable is to imagine a separate regression line for age at each value of religiosity, and vice versa.
• The effect of age is the slope of parallel lines, controlling for the effect of religiosity.

Graphically
[Figure: parallel regression lines of abortion attitude on age, one for each religiosity value X₂ = 1, 2, 3, 4; higher religiosity shifts the line upward.]
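The parallel-lines picture follows directly from the fitted equation y = 2.07 + 0.03X₁ + 0.84X₂: at each religiosity value the prediction is a line in age with the same slope 0.03 and a different intercept. A small sketch, assuming NumPy; the coefficients come from the slide, while the age grid is an illustrative choice:

```python
import numpy as np

a, b1, b2 = 2.07, 0.03, 0.84   # intercept, age slope, religiosity slope (from the slide)
age = np.arange(10, 91, 10)    # age grid matching the plot's horizontal axis

# One predicted line per religiosity value X2 = 1..4; the age slope is always b1.
for x2 in range(1, 5):
    yhat = a + b1 * age + b2 * x2
    print(f"X2={x2}: intercept {a + b2 * x2:.2f}, predictions {np.round(yhat, 2)}")

# The lines are parallel: each extra unit of religiosity shifts every
# prediction up by b2 = 0.84, without changing the slope in age.
```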
Multiple regression summary
• Our example only has 2 predictors, but we can have any number of independent variables:
Y = α + β₁X₁ + β₂X₂ + β₃X₃ + … + εᵢ
• Thus, multiple regression is a really useful extension of simple linear regression.
  – Multiple regression is a way of reducing spurious relationships between variables by including the real cause.
  – Multiple regression is also a way of testing whether a relationship is actually working through another variable (as it appears to be in our example).
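The spurious-relationship point can be illustrated with simulated data in the spirit of the lecture's example: religiosity rises with age and drives attitudes, so a bivariate regression of attitude on age absorbs religiosity's effect, while the multiple regression recovers the small direct age effect. A sketch assuming NumPy; all coefficients and noise levels here are made up for illustration, not the lecture's actual data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100  # same size as the slide's sample; the data itself is simulated

# Religiosity rises with age (the confounder); attitude depends mostly on religiosity.
age = rng.uniform(18, 80, n)
religiosity = 0.05 * age + rng.normal(scale=0.5, size=n)   # church visits/month
attitude = 2.0 + 0.03 * age + 0.8 * religiosity + rng.normal(scale=0.5, size=n)

def ols(y, *regressors):
    """OLS coefficients [intercept, b1, ...] via the normal equations X'Xb = X'y."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.solve(X.T @ X, X.T @ y)

b_bivariate = ols(attitude, age)               # age slope absorbs religiosity's effect
b_multiple = ols(attitude, age, religiosity)   # age slope close to the true 0.03

print("bivariate age slope:", round(b_bivariate[1], 3))  # inflated by the confounder
print("multiple age slope: ", round(b_multiple[1], 3))   # near the true direct effect
```

Comparing the two age slopes shows the mechanism on the slide: including the confounder as a regressor strips out the part of the age–attitude association that really runs through religiosity.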
This note was uploaded on 03/28/2012 for the course PADP 8130 taught by Professor Fertig during the Spring '12 term at LSU.