Application of Decision Trees in Credit Score Analysis
A project submitted
In partial fulfillment for the
Requirements for the Degree of
Master of Science in Business Analytics
in the department of
Operations, Business Analytics and Information Systems
of

3. IRIS DATA
# Random sample a data set that contains 90% of original data points:
iris_sample_90 = iris[sample(x = nrow(iris), size = nrow(iris) * 0.9), ]
# 1. Summary of the data:
summary(iris)
Min
1st Qu
Median
Mean
3rd Qu
Max
Std. Dev.
Sepal.Length
4.

Forecasting Default: A Comparison between Merton Model
and Logistic Model
A project submitted in partial fulfillment for the requirements for the master
degree of science in business administration with a concentration in quantitative
analysis
By
Man Xu
(

Application of Decision Trees in
Credit Score Analysis
Agenda
Introduction
Overview
Literature Review
Case Study
Introduction
Model Analysis
Model Validation
Conclusion and Discussion
Model Comparison
Conclusion
Questions
Introduction
Cred

A Study of Unsupervised Learning
A Project Submitted in Partial Fulfillment for the Requirements for
the Degree of Masters of Science in Quantitative Analysis
2006
Andrew R. Remington
Bachelor of Arts in Mathematics, University of Cincinnati, 2003
Committ

Successful Data Mining
in Practice:
Where do we Start?
Richard D. De Veaux
Department of
Mathematics and Statistics
Williams College
Williamstown MA, 01267
[email protected]
http://www.williams.edu/Mathematics/rdeveaux
JSM 2-day Course SF August 2-3, 20

22-BANA 7046 Data Mining I (2 cr.)
22-BANA 7047 Data Mining II (2 cr.)
Section 002 Day Class
Spring Semester, 2013-2014
Instructor:
Professor Yan YU
Office:
Office Hours:
Class Time:
Email:
Phone:
Web Page:
527 Lindner Hall
M, (1st 7 weeks) 4pm-5pm and by

BANA 7046 Prof. Yan Yu
Homework #4
Guidelines of Homework Reports:
Please include one-page executive summary for each problem at the beginning of your
HW report, clearly stating:
Goal and Background - What is the problem?
Approach - What have you done?

Tree Models
In this lab we will go through the model building, validation, and interpretation of tree models. The focus will
be on rpart package.
Regression Tree vs. Classification Tree
CART stands for classification and regression tree.
Regression tree:

Dr. Yan Yu
Lindner 527
(513)556-7147
[email protected]
http://business.uc.edu/departments/obais/faculty/yan-yu.html
Office Hours:
TH 2:15pm-3:15pm and by appointment
Lindner 527
Professor, OBAIS
UC 2000-Present
Ph.D. Cornell University 2000
M.S.
Texas A&M

Logistic Regression
Credit Scoring Case Study
Credit risk needs to be quantified
Credit scoring used to grant credit
Credit market grows
Consumer credit market
[Figure: Consumer Credit Outstanding, in billion $; values shown: 1550.2, 1726.5, 1865.2, 1942.6, 2025.5, 2005.3, 2018]

Introduction to Data Mining
by
Tan, Steinbach, Kumar
Chapter 2: Data
What is Data?
An attribute is a property or
characteristic of an object
Examples: eye color of a
person, temperature, etc.
Objects
Attributes
Tid Refund Marital
Status
Taxable
Income C

Lecture:
Classification and Regression Trees
Tree (Type)
Regression: response variable Y is numeric
Classification: response variable Y is categorical
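The distinction between the two tree types can be sketched in R with the rpart package. This is a minimal illustration, not the lecture's own example: the built-in iris data and the choice of predictors are assumptions made here for demonstration.

```r
library(rpart)  # recommended package distributed with R

# Regression tree: the response (Sepal.Length) is numeric, so method = "anova".
reg_tree <- rpart(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
                  data = iris, method = "anova")

# Classification tree: the response (Species) is a factor, so method = "class".
cls_tree <- rpart(Species ~ ., data = iris, method = "class")

# Predictions follow the response type: numbers for regression,
# class labels for classification.
reg_pred <- predict(reg_tree, newdata = iris)
cls_pred <- predict(cls_tree, newdata = iris, type = "class")

head(reg_pred)
table(predicted = cls_pred, actual = iris$Species)
```

If `method` is omitted, rpart infers it from the response type. Setting it explicitly simply makes the intent visible.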
Motivating Example: Beer Preference
Hacker Pschorr
One of the oldest beer brewing
companies in Munich

Data Mining: Exploring Data
Adapted from
Introduction to Data Mining
by
Tan, Steinbach, Kumar
Tan, Steinbach, Kumar, Introduction to Data Mining, 8/05/2005
What is data exploration?
A preliminary exploration of the data to
better understand its characteri

Lecture:
Logistic Regression
Regression for Classification
We want to model p = P(Y=1) which should be between 0 and 1
But a regression line can go from $-\infty$ to $+\infty$
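A small sketch of why an ordinary regression line is a poor model for a probability. The mtcars data and the `am ~ wt` model are assumptions chosen here for illustration only; they are not part of the lecture.

```r
# Illustrative data: mtcars ships with base R; am is a 0/1 outcome.
linear_fit   <- lm(am ~ wt, data = mtcars)                     # straight line
logistic_fit <- glm(am ~ wt, data = mtcars, family = binomial) # logistic model

linear_pred   <- fitted(linear_fit)
logistic_pred <- fitted(logistic_fit)

# The straight line is unbounded, so some fitted "probabilities"
# leave the [0, 1] interval; the logistic fit cannot.
range(linear_pred)
range(logistic_pred)
```

Comparing the two ranges shows the linear fit leaving the unit interval on this data, while the logistic fit stays inside it.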
Step 1: Probabilities → Odds
If an event has probability
p then the odds of the
event are:
p/

Generalized Linear Models (GLM)
$g(E(Y \mid X)) = g(\mu) = X\beta$
Response variable from non-continuous distribution, e.g.
binomial or Poisson
Gaussian additive assumption is severely violated
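As a hedged illustration of the link function, the sketch below fits a Poisson GLM in R and checks that the link applied to the fitted mean equals the linear predictor. The InsectSprays data set (bundled with R) and the log link are assumptions chosen for this example.

```r
# Count response, so a Poisson family with a log link is a natural choice.
pois_fit <- glm(count ~ spray, data = InsectSprays,
                family = poisson(link = "log"))

eta <- predict(pois_fit, type = "link")      # linear predictor
mu  <- predict(pois_fit, type = "response")  # fitted mean E(Y | X)

# With a log link, g(mu) = log(mu) should reproduce the linear predictor.
all.equal(log(mu), eta)
```

Swapping in `family = binomial` with a 0/1 response gives logistic regression under the same framework.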
GLM Reference:
Hosmer, D. and Lemeshow, S. (2000), Applied Logistic
R

Linear Regression
Quick Overview
and a Case Study
Introduction
y : Dependent variable
x : Independent variable
Multiple Regression Model
$y_i$ = Value of the Dependent Variable in ith case
$x_{i1}, \ldots, x_{ip}$ = Values of the p Independent Variables in ith cas

BANA 7046 Data Mining I
(Section 002 - Day)
Assignment 4
Group #3
For the correlation of the continuous variables with each other, we present the correlation
matrix and scatterplot matrix below.
Table-2: Correlation Matrix of the Continuous Variables in t

2/17/2014
Explore Iris Dataset
Outline of this lab
1. Install Rattle
2. Exploration and basic visualization using R/Rattle
3. Start with Case #1
Iris dataset - Introduction
The dataset consists of 50 samples from each of three speci

2/17/2014
Tree Models
In this lab we will go through the model building, validation, and interpretation of tree models.
The focus will be on rpart package.
Regression Tree vs. Classification Tree
CART stands for classification and regressio

2/17/2014
Introduction to R
Outline of this lab
1. Install R and RStudio
2. Learn basic commands and data types in R
3. Get started with Rattle
Before You Start: Install R, RStudio
R
Windows: http://cran.r-project.org/bin/windows/base/
M

2/17/2014
Logistic Regression, Prediction and
ROC
The objective of this case is to help you understand logistic regression (binary classification)
and some important ideas such as cross validation, ROC curve, cut

2/17/2014
Regression and Variable Selection
The objective of this case is to get you started with regression model building, variable
selection, and model evaluation in R.
Code in this file is not the only correct way

Introduction to R package
Why R?
Free!
available for Unix, Linux and Windows
Nice plots
What is R?
A programming language for statistical data analysis
called S developed at Bell Labs. Later extended to S+.
R is a free, open source version of S/S+.
U

Dr. Yan Yu
Lindner 527
(513)556-7147
[email protected]
http://business.uc.edu/departments/obais/faculty/yan-yu.html
Office Hours:
M, (1st 7 weeks) 4pm-5pm and by appointment
H, (2nd 7 weeks) 4pm-5pm and by appointment
Professor, OBAIS
UC 2000-Present
Ph.D.

Application of Multivariate Adaptive Regression Splines (MARS)
in
Direct Marketing
A Project Submitted in Partial Fulfillment
for the Requirement of the Degree of
Master of Science in Business Administration
with a Concentration in Quantitative Analysis
Y

Linear Regression
Quick Overview
and a Case Study
Introduction
What is Linear Regression?
$y = X\beta + \text{error}$
y : Dependent variable
x : Independent variable
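A minimal R sketch of fitting such a linear model. The mtcars data and the predictors `wt` and `hp` are assumptions chosen for illustration and do not come from the case study. The second half recomputes the least-squares coefficients from the normal equations to show what `lm()` estimates.

```r
# Dependent variable: mpg; independent variables: wt and hp (illustrative choice).
fit <- lm(mpg ~ wt + hp, data = mtcars)
coef(fit)

# Same estimate by hand: solve (X'X) beta = X'y for beta.
X <- model.matrix(~ wt + hp, data = mtcars)  # includes the intercept column
y <- mtcars$mpg
beta_hat <- solve(crossprod(X), crossprod(X, y))

# The two sets of coefficients agree up to numerical precision.
all.equal(unname(coef(fit)), unname(drop(beta_hat)))
```

`lm()` itself uses a QR decomposition rather than the normal equations, which is numerically more stable. The hand computation is shown only to make the estimate concrete.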
Why the funny name?
Sir Francis Galton
Regression towards Mediocrity in Hereditary Stature
Journal of the