day11 - 4/13/11 PADP 8120: Data Analysis and Statistical Modeling

4/13/11 PADP 8120: Data Analysis and Statistical Modeling
Logistic Regression
Spring 2011
Angela Fertig, Ph.D.

Plan
Binary dependent variables
Why can't we just use OLS?
How does logistic regression work?
How do we compare logistic models?

Example
Say we want to estimate the probability that a household head is a single mother for a given level of education. We could use OLS with a dependent variable equal to 0 if not a single mom and 1 if yes. We call this the Linear Probability Model.

What is wrong with this?
A probability takes a value between 0 and 1, and OLS may predict values outside that range. Also, when the dependent variable is binary, the distribution around the mean is not normal for each value of X (which OLS assumes). For example, in a linear probability model of food stamp receipt on log family income, people with log faminc above 11 (= $60,000) have a predicted probability < 0 of being on food stamps, and people with log faminc below about 7 (= $1100) have a predicted probability > 1.

A more realistic picture of the relationship between the probability of receiving food stamps and income
We can get this relationship with a logistic transformation. If the relationship between the probability of receiving food stamps (\pi) and income were linear, the equation would be:

    \pi = \alpha + \beta X

The relationship in the graph is described by:

    \pi = \frac{e^{\alpha + \beta X}}{1 + e^{\alpha + \beta X}},  equivalently  \mathrm{logit}(\pi) = \log\left(\frac{\pi}{1 - \pi}\right) = \alpha + \beta X

Note: logit(\pi) = log odds, and \pi / (1 - \pi) is just the odds. As the probability increases (from 0 to 1), the odds increase from 0 to infinity. The log of the odds then increases from -infinity to +infinity. So if \beta is "large," then as X increases the log of the odds will increase steeply. The steepness of the curve will therefore increase as \beta gets bigger.

So that's the equation we want to estimate, but how do we do it?
OLS minimizes the sum of squared residuals to get the best-fitting line. We can't do that here because the errors aren't normally distributed, so we use maximum likelihood to estimate \alpha and \beta. Maximum Likelihood Estimation (MLE) involves an iterative process of "tweaking" the parameters to get the best-fitting equation.

Interpretation

. logit fstamps lfaminc

Iteration 0:   log likelihood = -247.59864
Iteration 1:   log likelihood = -212.46911
Iteration 2:   log likelihood = -205.42677
Iteration 3:   log likelihood = -205.36096
Iteration 4:   log likelihood = -205.36093
Iteration 5:   log likelihood = -205.36093

Logistic regression                               Number of obs   =        645
                                                  LR chi2(1)      =      84.48
                                                  Prob > chi2     =     0.0000
Log likelihood = -205.36093                       Pseudo R2       =     0.1706

------------------------------------------------------------------------------
     fstamps |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lfaminc |  -1.182139   .1452874    -8.14   0.000    -1.466897    -.897381
       _cons |   9.653248   1.405242     6.87   0.000     6.899025    12.40747
------------------------------------------------------------------------------

Increases in family income reduce the probability of receiving food stamps, but how do we interpret the coefficient -1.18? Saying that a 1 unit increase in log family income will decrease the log odds of receiving food stamps by 1.18 does not mean much to most.

One way to interpret the coefficient

    \log\frac{\hat{\pi}}{1 - \hat{\pi}} = a + bX
    \log\frac{\hat{\pi}}{1 - \hat{\pi}} = 9.65 - 1.18X            (plug in the coefficients)
    \frac{\hat{\pi}}{1 - \hat{\pi}} = e^{9.65 - 1.18X}            (take the antilog of both sides)
    \frac{\hat{\pi}}{1 - \hat{\pi}} = e^{9.65 - 1.18 \cdot 7} = 4.0    (plug in a value for X)

The odds of receiving food stamps (as opposed to not receiving them) are 4. For every 4 food stamp recipients there should be one non-recipient when log family income is 7 (= $1100).
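As a sketch of how this calculation might be reproduced in Stata: the stored results _b[_cons] and _b[lfaminc] are standard after estimation, and the `or` option reports exponentiated coefficients directly.

* A minimal sketch: reproduce the odds calculation from the fitted model.
. logit fstamps lfaminc

* Odds of receiving food stamps at log family income = 7:
* exp(_b[_cons] + _b[lfaminc]*7) = exp(9.65 - 1.18*7), roughly 4.
. display exp(_b[_cons] + _b[lfaminc]*7)

* Alternatively, report exponentiated coefficients (odds ratios) directly:
. logit fstamps lfaminc, or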
Another way to interpret the coefficient

    \log\frac{\hat{\pi}}{1 - \hat{\pi}} = a + bX
    \log\frac{\hat{\pi}}{1 - \hat{\pi}} = 9.65 - 1.18X
    \frac{\hat{\pi}}{1 - \hat{\pi}} = e^{9.65 - 1.18X}
    \frac{\hat{\pi}}{1 - \hat{\pi}} = e^{9.65}\,(e^{-1.18})^{X}    (by the rules of exponents)
    e^{-1.18} = 0.31

Thus, a 1 unit increase in X multiplies the odds by e^b. When X increases from 7 to 8 (from $1100 to $3000), the odds go to 4 × 0.31 = 1.24. When X increases from 7 to 10 ($22,000), the odds go to 4 × (0.31)^3 = 0.12.

Interpretation #3: in terms of probabilities
Solve for \hat{\pi}:

    \log\frac{\hat{\pi}}{1 - \hat{\pi}} = a + bX
    \frac{\hat{\pi}}{1 - \hat{\pi}} = e^{a + bX}
    \hat{\pi} = e^{a + bX}(1 - \hat{\pi})
    \hat{\pi} = e^{a + bX} - \hat{\pi} e^{a + bX}
    \hat{\pi} + \hat{\pi} e^{a + bX} = e^{a + bX}
    \hat{\pi}(1 + e^{a + bX}) = e^{a + bX}
    \hat{\pi} = \frac{e^{a + bX}}{1 + e^{a + bX}}

Plug in our coefficients and a value for X:

    \hat{\pi} = \frac{e^{9.65 - 1.18X}}{1 + e^{9.65 - 1.18X}}
    \hat{\pi} = \frac{e^{9.65 - 1.18 \cdot 7}}{1 + e^{9.65 - 1.18 \cdot 7}} = 0.80

The probability that a person with log family income = 7 ($1100) receives food stamps is 80%.

[Figure: predicted probabilities plotted against log family income. When lfaminc = 7, the predicted probability = .80, as we just calculated.]

Interpretation with multiple independent variables

. logit fstamps lfaminc tanf

Iteration 0:   log likelihood = -247.46078
  ...
Iteration 6:   log likelihood = -189.66078

Logistic regression                               Number of obs   =        644
                                                  LR chi2(2)      =     115.60
                                                  Prob > chi2     =     0.0000
Log likelihood = -189.66078                       Pseudo R2       =     0.2336

------------------------------------------------------------------------------
     fstamps |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lfaminc |  -1.112006   .1510876    -7.36   0.000    -1.408132   -.8158793
        tanf |   2.878823   .5407434     5.32   0.000     1.818986    3.938661
       _cons |   8.777839   1.463384     6.00   0.000     5.909659    11.64602
------------------------------------------------------------------------------

So receiving welfare (holding log family income constant) multiplies the odds by e^2.88, or 17.8 times. The odds of welfare recipients also receiving food stamps are almost 18 times the odds of non-welfare recipients receiving food stamps. The probability that a family with log income = 7 that receives welfare receives food stamps is 98%, while the probability for a family with log income = 7 that does not receive welfare is only 73%.

Graphically
[Figure: predicted probability of receiving food stamps if a welfare recipient vs. if not a welfare recipient.]

When there are more than 2 independent variables
If you have lots of variables, set them to particular values and then examine how the predicted probability of the dependent outcome varies. E.g., if I had more independent variables (age, sex, education), I would produce a graph for women of average age with average education not receiving welfare. Then I could see how log family income alone affected the predicted probability of receiving food stamps.

Interpretation of interaction effects

. logit fstamps lfaminc age40 age40inc

Logistic regression                               Number of obs   =        645
                                                  LR chi2(3)      =      96.01
                                                  Prob > chi2     =     0.0000
Log likelihood = -199.59445                       Pseudo R2       =     0.1939

------------------------------------------------------------------------------
     fstamps |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lfaminc |  -1.012106   .1594497    -6.35   0.000    -1.324621   -.6995899
       age40 |   5.861095   3.754634     1.56   0.119    -1.497853    13.22004
    age40inc |   -.697617   .3915596    -1.78   0.075     -1.46506    .0698258
       _cons |   8.253396   1.546835     5.34   0.000     5.221655    11.28514
------------------------------------------------------------------------------
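The two predicted-probability curves pictured below can be traced out from this fitted model. A minimal sketch, assuming age40 is a 0/1 indicator for being over age 40, age40inc = age40 × lfaminc, and phat is an arbitrary variable name chosen here:

* Sketch: predicted probabilities from the interaction model.
. logit fstamps lfaminc age40 age40inc

* After logit, predict with the pr option stores each observation's
* predicted probability in a new variable (phat is a hypothetical name).
. predict phat, pr

* Plot separate curves for respondents over and under age 40:
. twoway (line phat lfaminc if age40==1, sort) (line phat lfaminc if age40==0, sort)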
[Figure: predicted probability of receiving food stamps if over age 40 vs. if under age 40.]

Comparing Models
In OLS, we are minimizing the sum of the squared errors, so we compare models by comparing how R2 (which gets bigger when the SSEs are smaller) changes across models. In logistic regression, we are maximizing a likelihood function (MLE), so we can compare models by comparing the maximum of the likelihood function for each model. We actually do a likelihood-ratio (LR) test where:

    LR = -2 \log\frac{\ell_0}{\ell_1} = -2(\log \ell_0 - \log \ell_1)

where \ell_0 is the first model's maximized value and \ell_1 is the second model's maximized value. We log the L's and multiply them by -2 so that this statistic has a chi-square distribution. This means we test whether there is a statistically significant improvement with reference to the \chi^2 distribution.

Example

    Model                               -2logL
    logit fstamps lfaminc                84.48
    logit fstamps lfaminc tanf          115.60    (diff = 31.12)
    logit fstamps lfaminc tanf black    175.79    (diff = 60.19)

In both model changes, we are adding one variable, so that is a change of 1 degree of freedom. As the table below shows, chi-square statistics of 31 and 60 with 1 df are highly significant, so the model clearly fits better with the extra terms.

Chi-square Table

    df \ area     0.10      0.05      0.025     0.01
    1             2.71      3.84      5.02      6.63
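These comparisons can be run directly with Stata's estimates store and lrtest commands. A minimal sketch (m1 and m2 are arbitrary labels chosen here, and lrtest assumes both models were fit on the same estimation sample):

* Sketch: likelihood-ratio test of nested models.
. logit fstamps lfaminc
. estimates store m1

. logit fstamps lfaminc tanf
. estimates store m2

* Reports -2*(log likelihood of m1 - log likelihood of m2) with a
* chi-square p-value; here the models differ by 1 degree of freedom.
. lrtest m1 m2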