# We estimate probability of particular outcome for


We estimate the probability of a particular outcome for each case/observation.

• Limitation of logistic regression: y must be discrete or binary
• Our interest is usually in the probability of success

## Maximum Likelihood Estimation

• Goal: find the best combination of predictor (x) variables to maximize the likelihood of obtaining the observed outcome frequencies
  o For every observation in a data set, y is 0 or 1, while each x will have a different measure
  o Given the observed frequencies of 1's and 0's, what set of predictor variables is most likely to return the observed set of outcomes?

## Odds Ratios

    . tabulate hyp preterm

               |        preterm
     hypertens |         0          1 |     Total
    -----------+----------------------+----------
             0 |       375         44 |       419
             1 |        52         19 |        71
    -----------+----------------------+----------
         Total |       427         63 |       490

• Calculating the odds of a baby being born preterm to a hypertensive mother:
  o First set of odds (hypertensive mother): probability of being born preterm divided by probability of not being born preterm: 19/52
  o Second set of odds (nonhypertensive mother): 44/375
• Odds ratio: ratio of those two odds
  o (19/52) ÷ (44/375) = 3.11
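As a minimal sketch, the odds-ratio arithmetic can be reproduced in Python from the four cell counts in the cross-tabulation. Python and the variable and function names below are illustrative choices, not part of the course's Stata workflow.

```python
import math

# Cell counts from the hyp-by-preterm cross-tabulation in the notes.
hyp_preterm, hyp_term = 19, 52          # hypertensive mothers (n = 71)
nonhyp_preterm, nonhyp_term = 44, 375   # nonhypertensive mothers (n = 419)


def odds_from_counts(events, non_events):
    """Odds = P(event) / P(no event); the group total cancels out."""
    total = events + non_events
    return (events / total) / (non_events / total)


odds_hyp = odds_from_counts(hyp_preterm, hyp_term)
odds_nonhyp = odds_from_counts(nonhyp_preterm, nonhyp_term)
odds_ratio = odds_hyp / odds_nonhyp

print(f"odds (hypertensive):    {odds_hyp:.3f}")
print(f"odds (nonhypertensive): {odds_nonhyp:.3f}")
print(f"odds ratio:             {odds_ratio:.2f}")
```

Because the group total cancels, each set of odds reduces to preterm count over not-preterm count (19/52 and 44/375), which is why the notes can work directly with the counts.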
Week 4: Impact Evaluation: Identifying and Interpreting Program Effects

Thus the odds of a baby being born preterm to a hypertensive mother are 3.11 times those for a nonhypertensive mother.

## Odds: Background

• Related to probability
  o "4 to 1 odds" against = 20% probability, or a "20% chance"
• However, probability is nonlinear
  o .10 → .20 doubles the probability; .80 → .90 is the same absolute change but only a small relative increase
• Odds: probability of winning divided by probability of losing
  o 20% probability = odds of .20/.80 = .25 (1 to 4)
• Logit function: natural log of the odds ("log odds")
  o Scale of the logit function is linear and functions much like a z-score scale

## Why Logistic Regression?

• Linear regression: $Y \mid X = \beta_0 + \beta_1 X_1 + \varepsilon$
• Assumptions:
  o Errors are normally distributed
  o Errors have mean 0
  o Error variance is constant
• Can't use linear regression when the outcome variable is discrete; the data don't meet the condition of homoscedasticity (homogeneity of variance)

## Logistic Regression in Stata: One Predictor

    . logistic preterm hyp

    Logistic regression                     Number of obs =        490
                                            LR chi2(1)    =      11.97
                                            Prob > chi2   =     0.0005
    Log likelihood = -182.00757             Pseudo R2     =     0.0318

         preterm | Odds Ratio   Std. Err.      z    P>|z|   [95% Conf. Interval]
    -------------+--------------------------------------------------------------
             hyp |   3.114073   .9711379     3.64   0.000    1.689965    5.738256
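With a single binary predictor, the figures in the one-predictor output can be recovered by hand from the 2×2 table, which also shows why the log-odds scale matters: the interval is built symmetrically on the logit scale and then exponentiated. The sketch below assumes the standard large-sample (Wald) standard error for a log odds ratio; it is an illustration in Python, not a description of Stata's internals, and the variable names are hypothetical.

```python
import math

# Cell counts from the hyp-by-preterm cross-tabulation in the notes.
a, b = 19, 52    # hypertensive: preterm, not preterm
c, d = 44, 375   # nonhypertensive: preterm, not preterm

odds_ratio = (a / b) / (c / d)
log_or = math.log(odds_ratio)                         # estimate on the logit (log-odds) scale
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Wald SE of the log odds ratio

z = log_or / se_log_or                                # test statistic for "odds ratio = 1"
z_crit = 1.959964                                     # two-sided 95% normal critical value

# The interval is symmetric on the log-odds scale, then exponentiated.
ci_low = math.exp(log_or - z_crit * se_log_or)
ci_high = math.exp(log_or + z_crit * se_log_or)

# Delta-method standard error on the odds-ratio scale.
se_or = odds_ratio * se_log_or

print(f"odds ratio = {odds_ratio:.6f}, std. err. = {se_or:.4f}, z = {z:.2f}")
print(f"95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

The interval runs from roughly 1.69 to 5.74. It is not symmetric around 3.11 on the odds-ratio scale, but it is symmetric around ln(3.11) on the log-odds scale.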
• The chi2 statistic (like F in a linear regression) helps us determine whether or not a model is significant
• For a significant predictor, the odds ratio's 95% confidence interval should not include 1; an odds ratio of 1 (1:1) indicates no difference in odds

## Logistic Regression in Stata: Two Predictors

Adding a predictor variable: birth weight

    . logistic preterm hyp bweight

    Logistic regression                     Number of obs =        490
                                            LR chi2(2)    =     158.78
                                            Prob > chi2   =     0.0000
    Log likelihood = -108.60651             Pseudo R2     =     0.4223

         preterm | Odds Ratio   Std. Err.      z    P>|z|   [95% Conf. Interval]
    -------------+--------------------------------------------------------------
             hyp |   1.239188    .556094     0.48   0.633    .5142267    2.986205
         bweight |   .9967776   .0003802    -8.46   0.000    .9960327    .9975232

• Preterm babies tend to be smaller
  o For every 1-gram increase in birth weight, the odds of being preterm decrease (multiplied by about 0.997)
  o Per gram of weight, the decrease in odds of being preterm: $(1 - 0.9968) \times 100\% \approx 0.3\%$
• Compare two babies who differ by 30 grams: $(1 - 0.9968^{30}) \times 100 \approx (1 - 0.91) \times 100 \approx 9\%$
  o A 30-gram increase in birth weight lowers the odds of being born preterm by about 9%

## Reporting Logistic Regression

• Discuss the odds ratio and its 95% confidence interval
  o More meaningful than the p value of z

## Summary

• Logistic regression helps us:
  o Estimate the probability of success with outcome variable y
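The birth-weight calculation in the two-predictor section relies on odds ratios compounding multiplicatively: a difference of k grams multiplies the odds by the per-gram odds ratio raised to the power k. The Python sketch below (function name hypothetical) applies this to the reported `bweight` odds ratio and also shows how sensitive the result is to rounding that ratio before exponentiating.

```python
def pct_change_in_odds(or_per_unit, units):
    """Percent change in the odds for a `units`-sized change in the predictor."""
    return (or_per_unit ** units - 1) * 100


or_bweight = 0.9967776   # per-gram odds ratio from the two-predictor Stata output

per_gram = pct_change_in_odds(or_bweight, 1)
per_30_grams = pct_change_in_odds(or_bweight, 30)

# Same calculation with the odds ratio rounded to 0.99 before exponentiating.
per_30_grams_rounded = pct_change_in_odds(0.99, 30)

print(f"1 gram:   {per_gram:.2f}% change in odds")
print(f"30 grams: {per_30_grams:.1f}% change in odds")
print(f"30 grams, odds ratio rounded to 0.99: {per_30_grams_rounded:.1f}% change in odds")
```

With the unrounded odds ratio, a 30-gram difference corresponds to roughly a 9% decrease in the odds of preterm birth. Rounding the ratio to 0.99 first gives roughly 26%, so it is safer to carry the full value through the exponent and round only the final percentage.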