class_10_17 - Statistical Data Mining ORIE 474 Fall 2007...

Statistical Data Mining
ORIE 474, Fall 2007
Tatiyana Apanasovich
10/17/07

Logistic Regression
Why use logistic regression?
- There are many important research topics for which the dependent variable is "limited" (it can take only a restricted set of values).
- Binary logistic regression is a type of regression analysis in which the dependent variable is a dummy variable, for example coded 0 (did not vote) or 1 (did vote).
The Linear Probability Model
In the OLS regression: Y = a + b*X + e, where Y takes only the values 0 or 1
- The error terms are heteroskedastic
- e is not normally distributed, because Y takes on only two values
- The predicted probabilities can be greater than 1 or less than 0
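The last problem is easy to reproduce. The following is a minimal sketch, assuming a made-up toy dataset (the `x` and `y` values and the helper name `ols_fit` are illustrative and not from the lecture): it fits an ordinary least-squares line to a 0/1 outcome and inspects the fitted values.

```python
# Hypothetical toy data (not from the lecture): binary outcome y, one predictor x.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]


def ols_fit(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a, b = ols_fit(x, y)

# Fitted "probabilities" from the linear probability model.
predicted = [a + b * xi for xi in x]
print("smallest fitted value:", min(predicted))
print("largest fitted value:", max(predicted))
```

On this toy data the fitted line dips below 0 at the low end of `x` and rises above 1 at the high end, so the fitted values cannot all be read as probabilities.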
An Example: Hurricane Evacuations
Q: EVAC
Did you evacuate your home to go someplace safer before Hurricane Dennis (Floyd) hit?
1 YES
2 NO
The Data

EVAC  PETS  MOBLHOME  TENURE  EDUC
   0     1         0      16    16
   0     1         0      26    12
   0     1         1      11    13
   1     1         1       1    10
   1     0         0       5    12
   0     0         0      34    12
   0     0         0       3    14
   0     1         0       3    16
   0     1         0      10    12
   0     0         0       2    18
   0     0         0       2    12
   0     1         0      25    16
   1     1         1      20    12
LS Results
Dependent Variable: EVAC

Variable          B   t-value
(Constant)    0.190     2.121
PETS         -0.137    -5.296
MOBLHOME      0.337     8.963
TENURE       -0.003    -2.973
EDUC          0.003     0.424
FLOYD         0.198     8.147

R²            0.145
F-stat       36.010
Problems: Predicted values outside the [0, 1] range

Descriptive Statistics
                                   N   Minimum   Maximum       Mean   Std. Deviation
Unstandardized Predicted Value  1070   -.08498    .76027   .2429907            .1632
Valid N (listwise)              1070
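To see how a negative fitted value can arise, the sketch below plugs one respondent into the least-squares equation. The coefficients are the rounded values reported under LS Results; the respondent's covariate values and the helper name `lpm_predict` are hypothetical, chosen only for illustration.

```python
# Coefficients as rounded in the LS Results table (dependent variable: EVAC).
coef = {
    "const": 0.190,
    "PETS": -0.137,
    "MOBLHOME": 0.337,
    "TENURE": -0.003,
    "EDUC": 0.003,
    "FLOYD": 0.198,
}


def lpm_predict(respondent):
    """Linear probability model prediction: constant plus sum of B * value."""
    return coef["const"] + sum(coef[name] * value for name, value in respondent.items())


# Hypothetical respondent (made up for illustration, not a row from the data):
# has pets, does not live in a mobile home, long tenure, 12 years of education.
respondent = {"PETS": 1, "MOBLHOME": 0, "TENURE": 80, "EDUC": 12, "FLOYD": 0}

p_hat = lpm_predict(respondent)
print("predicted 'probability' of evacuating:", round(p_hat, 3))
```

With these inputs the linear equation returns a value below zero, which cannot be interpreted as a probability. Because the coefficients are rounded, the result is only approximate.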
Heteroskedasticity
[Figure: plot of Unstandardized Residual against TENURE]
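A short derivation shows why the residual spread must depend on X in the linear probability model. This is a standard textbook argument, not taken from the slides, and it assumes the linear model Y = a + b*X + e holds with E[e | X] = 0:

```latex
\begin{aligned}
p &= P(Y = 1 \mid X) = a + bX \\
e &= Y - p =
  \begin{cases}
    1 - p & \text{with probability } p \\
    -p    & \text{with probability } 1 - p
  \end{cases} \\
E[e \mid X] &= p(1 - p) + (1 - p)(-p) = 0 \\
\operatorname{Var}(e \mid X) &= p(1 - p)^2 + (1 - p)p^2 = p(1 - p) = (a + bX)(1 - a - bX)
\end{aligned}
```

The error variance is a function of X, so it cannot be constant, and e takes only two values for any given X, so it cannot be normally distributed.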
The Logistic Regression Model
The "logit" model solves these problems:
    ln[p/(1-p)] = α + βX + e
- p is the probability that the event Y occurs, p(Y=1)
- p/(1-p) is the odds of the event (loosely, the "odds ratio")
- ln[p/(1-p)] is the log odds, or "logit"
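The three quantities defined above can be computed directly. This is a small illustrative sketch; the function names `odds` and `logit` and the example probabilities are choices made here, not part of the lecture.

```python
import math


def odds(p):
    """Odds of an event with probability p (requires 0 <= p < 1)."""
    return p / (1.0 - p)


def logit(p):
    """Log odds of an event with probability p (requires 0 < p < 1)."""
    return math.log(odds(p))


# Example probabilities chosen for illustration.
for p in (0.1, 0.5, 0.8):
    print(f"p = {p}: odds = {odds(p):.4f}, logit = {logit(p):.4f}")
```

A probability of 0.5 corresponds to even odds and a logit of 0; probabilities below 0.5 give negative logits and probabilities above 0.5 give positive ones, so the logit can take any real value even though p is confined to the interval from 0 to 1.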
More:
The logistic distribution constrains the estimated probabilities to lie between 0 and 1.
The estimated probability is:
    p = 1/[1 + exp(-α - βX)]
- if α + βX = 0, then p = .50
- as α + βX gets really big, p approaches 1
- as α + βX gets really small (large and negative), p approaches 0
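The limiting behaviour can be checked numerically. The sketch below evaluates p = 1/[1 + exp(-α - βX)] for a few values of X; the coefficient values `alpha = -2.0` and `beta = 0.5` are hypothetical, picked so that α + βX = 0 at X = 4, and the function name `logistic_prob` is likewise an illustrative choice.

```python
import math


def logistic_prob(alpha, beta, x):
    """Return p = 1 / (1 + exp(-(alpha + beta*x))), computed without overflow."""
    z = alpha + beta * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equivalent form exp(z) / (1 + exp(z))
    # so that math.exp never receives a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients, for illustration only.
alpha, beta = -2.0, 0.5

for x in (-100, 0, 4, 8, 100):
    print(f"X = {x:5d}: p = {logistic_prob(alpha, beta, x):.6f}")
```

Whatever X is supplied, the result stays between 0 and 1, which is exactly the property the linear probability model lacks.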
Maximum Likelihood Estimation (MLE)
- MLE is a statistical method for estimating the coefficients of a model.
- The likelihood function (L) measures the probability of
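As a rough illustration of how coefficients can be estimated by maximum likelihood, the sketch below fits a one-predictor logit model of EVAC on MOBLHOME using only the 13 rows shown under The Data. It assumes the standard Bernoulli log-likelihood for binary outcomes and uses Newton-Raphson iterations, which is one common way to maximize it and is not necessarily the procedure used in the course. The function names are hypothetical, and 13 rows are far too few for meaningful inference.

```python
import math

# (EVAC, MOBLHOME) pairs copied from the 13 rows shown under "The Data".
rows = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 0), (0, 0), (0, 0),
        (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (1, 1)]


def prob(alpha, beta, x):
    """Logistic probability p = 1 / (1 + exp(-(alpha + beta*x)))."""
    z = alpha + beta * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_likelihood(alpha, beta, data):
    """Bernoulli log-likelihood: sum of y*ln(p) + (1-y)*ln(1-p)."""
    total = 0.0
    for y, x in data:
        p = prob(alpha, beta, x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_newton(data, steps=25):
    """Maximize the log-likelihood over (alpha, beta) by Newton-Raphson."""
    alpha, beta = 0.0, 0.0
    for _ in range(steps):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for y, x in data:
            p = prob(alpha, beta, x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient.
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta


alpha_hat, beta_hat = fit_newton(rows)
print("alpha:", alpha_hat, "beta:", beta_hat)
print("p(EVAC=1 | MOBLHOME=0):", prob(alpha_hat, beta_hat, 0))
print("p(EVAC=1 | MOBLHOME=1):", prob(alpha_hat, beta_hat, 1))
```

Because the only predictor is a 0/1 dummy, the fitted probabilities simply reproduce the observed evacuation shares in each group of this excerpt: 1 of 10 respondents not in mobile homes and 2 of 3 in mobile homes.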