
# Logistic Regression, Prediction and ROC


2/17/2014 Logistic Regression, Prediction and ROC https://blackboard.uc.edu/bbcswebdav/pid-9566224-dt-content-rid-55868231_2/courses/14SS_BANA7046002/notes%284%29.html

The objective of this case is to help you understand logistic regression (binary classification) and some important ideas such as cross validation, the ROC curve, and the cut-off probability. Code in this case is built upon lecture slides and Shaonan Tian's sample code.

## Input and sample data

First load the credit scoring data. It is easy to load comma-separated values (CSV).

```r
credit.data <- read.csv("http://homepages.uc.edu/~maifg/7040/credit0.csv", header = TRUE)
```

Now split the data 90/10 into training and testing datasets:

```r
# sample() is random; call set.seed() beforehand if you need a reproducible split
subset <- sample(nrow(credit.data), nrow(credit.data) * 0.9)
credit.train <- credit.data[subset, ]
credit.test <- credit.data[-subset, ]
```

The training dataset has 63 variables and 4500 observations.

```r
colnames(credit.train)
```

```
##  [1] "id"     "Y"      "X2"     "X3"     "X4"     "X5"     "X6"
##  [8] "X7"     "X8"     "X9"     "X10_2"  "X11_2"  "X12_2"  "X13_2"
## [15] "X14_2"  "X15_2"  "X15_3"  "X15_4"  "X15_5"  "X15_6"  "X16_2"
## [22] "X16_3"  "X16_4"  "X16_5"  "X16_6"  "X17_2"  "X17_3"  "X17_4"
## [29] "X17_5"  "X17_6"  "X18_2"  "X18_3"  "X18_4"  "X18_5"  "X18_6"
## [36] "X18_7"  "X19_2"  "X19_3"  "X19_4"  "X19_5"  "X19_6"  "X19_7"
## [43] "X19_8"  "X19_9"  "X19_10" "X20_2"  "X20_3"  "X20_4"  "X21_2"
## [50] "X21_3"  "X22_2"  "X22_3"  "X22_4"  "X22_5"  "X22_6"  "X22_7"
## [57] "X22_8"  "X22_9"  "X22_10" "X22_11" "X23_2"  "X23_3"  "X24_2"
```

## Logistic Regression

Let's build a logistic regression model based on all X variables. Note that `id` is excluded from the model.

```r
credit.glm0 <- glm(Y ~ . - id, family = binomial, credit.train)
```

You can view the result of the estimation:

```r
summary(credit.glm0)
```

Note that there might be a warning message: "Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred". This happens because of a problem called quasi-complete separation (the original notes link to further reading here). The offending variable is `X9`. We will choose to ignore the warning for now. You can try fitting your model without `X9`.

The usual stepwise variable selection still works for logistic regression. Caution: this will take a very long time.

```r
credit.glm.step <- step(credit.glm0)
```

Or you can try model selection with BIC:

```r
credit.glm.step <- step(credit.glm0, k = log(nrow(credit.train)))
```
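To see how the `k` argument changes stepwise selection without waiting on the full credit model, here is a minimal sketch on simulated data. The data frame `sim`, its variables `x1`, `x2`, `x3`, and the coefficients used to generate `y` are all made up for illustration and are not part of the credit data. With no `scope` supplied, `step()` performs backward elimination from the fitted model.

```r
# Simulated data (hypothetical; not the credit data from the notes).
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)   # pure noise, unrelated to y by construction
y  <- rbinom(n, 1, plogis(-0.5 + 1.5 * x1 - 1 * x2))
sim <- data.frame(y, x1, x2, x3)

# Full logistic regression on all predictors.
full <- glm(y ~ ., family = binomial, data = sim)

# Default penalty k = 2 per coefficient: AIC-based selection.
step.aic <- step(full, trace = 0)

# Penalty k = log(n) per coefficient: BIC-based selection,
# which charges more for each extra term than AIC when n > 7.
step.bic <- step(full, k = log(nrow(sim)), trace = 0)

# Compare the formulas each criterion ends up with.
formula(step.aic)
formula(step.bic)
```

Because the BIC penalty per coefficient is larger, the BIC-selected model can never keep more terms than the AIC-selected one in this backward search. Setting `trace = 0` only suppresses the step-by-step printout.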
## Prediction and Cross Validation Using Logistic Regression

Now suppose there are two models we want to test: one with all X variables (`credit.glm0`), and one with only `X3`, `X8` and `X11_2` (`credit.glm1`).

```r
credit.glm1 <- glm(Y ~ X3 + X8 + X11_2, family = binomial, credit.train)
AIC(credit.glm0)
```

```
## [1] 1713
```

```r
AIC(credit.glm1)
```

```
## [1] 1891
```

```r
BIC(credit.glm0)
```

```
## [1] 2110
```

```r
BIC(credit.glm1)
```

```
## [1] 1916
```

The two criteria disagree here: AIC is lower for the full model `credit.glm0` (1713 vs. 1891), while BIC is lower for the small model `credit.glm1` (1916 vs. 2110), because BIC penalizes each additional coefficient more heavily.

## Understanding classification decision making using logistic regression

To get predictions from a logistic regression model, there are several steps you need to understand. Refer to the textbook and slides for the detailed math.
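As a rough illustration of those steps, the sketch below goes from a fitted model to a class prediction on simulated data. The data frame `sim`, the generating coefficients, and the 0.5 cut-off are assumptions chosen for the example; the notes do not specify them at this point, and a different cut-off may suit the credit data better.

```r
# Simulated data (hypothetical; not the credit data from the notes).
set.seed(2)
n <- 400
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.3 + 1.2 * x))
sim <- data.frame(y, x)

fit <- glm(y ~ x, family = binomial, data = sim)

# Step 1: linear predictor (log-odds), eta = b0 + b1 * x.
eta <- predict(fit, newdata = sim, type = "link")

# Step 2: the inverse logit maps log-odds to a probability in (0, 1).
prob <- plogis(eta)

# predict(..., type = "response") performs steps 1 and 2 in one call.
prob.direct <- predict(fit, newdata = sim, type = "response")
all.equal(unname(prob), unname(prob.direct))

# Step 3: compare the probability with a cut-off to get a 0/1 class.
cutoff <- 0.5   # assumed value, for illustration only
pred.class <- as.integer(prob > cutoff)

# Cross-tabulate observed vs. predicted classes.
table(truth = sim$y, predicted = pred.class)
```

Raising or lowering `cutoff` trades one kind of misclassification for the other, which is why the cut-off probability is treated as a separate choice from fitting the model.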
