Application of Decision Trees in Credit Score Analysis
A project submitted
in partial fulfillment of the
requirements for the degree of
Master of Science in Business Analytics
in the Department of
Operations, Business Analytics and Information Systems
of

Cross-validation for Generalized Linear Models in R
Description
This function calculates the estimated K-fold cross-validation prediction error for generalized
linear models.
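The following is a minimal sketch of calling cv.glm() from the boot package. It assumes the built-in mtcars data as a stand-in, because the original data is not shown. The first model is a Gaussian GLM (ordinary linear regression) scored with the default squared-error cost. The second is a logistic GLM scored with a hypothetical misclassification cost (misclass_cost) at a 0.5 cutoff. In the returned list, delta holds the raw K-fold estimate followed by a bias-adjusted estimate.

```r
library(boot)  # cv.glm() is provided by the recommended 'boot' package

# Gaussian GLM (linear regression) on an illustrative stand-in data set
fit_lm <- glm(mpg ~ wt + hp, data = mtcars)
set.seed(2014)  # fold assignment is random
cv_lm <- cv.glm(data = mtcars, glmfit = fit_lm, K = 10)
cv_lm$delta     # raw and bias-adjusted estimates of mean squared prediction error

# Logistic GLM with a hypothetical cost: misclassification rate at cutoff 0.5.
# cv.glm passes the observed responses first and the fitted values second.
misclass_cost <- function(y, prob) mean(abs(y - prob) > 0.5)
fit_logit <- glm(am ~ wt, family = binomial, data = mtcars)
cv_logit <- cv.glm(data = mtcars, glmfit = fit_logit, cost = misclass_cost, K = 10)
cv_logit$delta
```

If K is omitted, cv.glm() uses K = n, which is leave-one-out cross-validation.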
Usage
cv.glm(data, glmfit, cost, K)
Arguments
data
A matrix or data frame contai

Cross-validation
Note to other teachers and users of
these slides. Andrew would be delighted
if you found this source material useful in
giving your own lectures. Feel free to use
these slides verbatim, or to modify them
to fit your own needs. PowerPoint

Linear Regression
Quick Overview
and a Case Study
Introduction
y : Dependent variable
x : Independent variable
Multiple Regression Model
y_i = value of the dependent variable in the ith case
x_i1, ..., x_ip = values of the p independent variables in the ith case
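The following is an illustration only. It fits a multiple regression with p = 3 predictors using lm(), with mtcars as an assumed stand-in data set (mpg plays the role of y). The last lines check that the least-squares coefficients agree with the normal-equation solution (X'X)^{-1} X'y.

```r
# Multiple regression: y_i = b0 + b1*x_i1 + ... + bp*x_ip + e_i
# mtcars is an illustrative stand-in; mpg is y, and wt/hp/qsec are the p = 3 x's
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
summary(fit)$coefficients

# Same estimates from the normal equations: beta_hat = (X'X)^{-1} X'y
X <- model.matrix(fit)            # n x (p + 1) design matrix; first column is all 1s
y <- mtcars$mpg
beta_hat <- solve(crossprod(X), crossprod(X, y))
all.equal(as.vector(beta_hat), unname(coef(fit)))
```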

Generalized Linear Models (GLM)
$g(E(Y \mid X)) = g(\mu) = X\beta$
Response variable from a non-continuous distribution, e.g., binomial or Poisson
The Gaussian additive-error assumption is severely violated
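Here is a sketch of the two non-Gaussian cases named above, using mtcars purely as an assumed example. It treats transmission type am (0/1) as a binomial response and number of carburetors carb as a count response. The final lines check the defining relation: applying the link g to the fitted means recovers the linear predictor Xβ.

```r
# Logistic GLM: binomial response, logit link g(mu) = log(mu / (1 - mu))
fit_bin <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)
# Poisson GLM: count response, log link g(mu) = log(mu)
fit_pois <- glm(carb ~ wt, family = poisson(link = "log"), data = mtcars)

# Linear predictor X %*% beta for each model
eta_bin  <- as.vector(model.matrix(fit_bin)  %*% coef(fit_bin))
eta_pois <- as.vector(model.matrix(fit_pois) %*% coef(fit_pois))

# g(fitted mean) should equal the linear predictor
all.equal(eta_bin,  unname(qlogis(fitted(fit_bin))))   # qlogis is the logit
all.equal(eta_pois, unname(log(fitted(fit_pois))))
```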
GLM Reference:
Hosmer, D. and Lemeshow, S. (2000), Applied Logistic
R

Forecasting Default: A Comparison between Merton Model
and Logistic Model
A project submitted in partial fulfillment of the requirements for the degree of
Master of Science in Business Administration with a concentration in Quantitative
Analysis
By
Man Xu

Application of Decision Trees in
Credit Score Analysis
Agenda
Introduction
  Overview
  Literature Review
Case Study
  Introduction
  Model Analysis
  Model Validation
Conclusion and Discussion
  Model Comparison
  Conclusion
Questions
Introduction
Cred

A Study of Unsupervised Learning
A Project Submitted in Partial Fulfillment for the Requirements for
the Degree of Master of Science in Quantitative Analysis
2006
Andrew R. Remington
Bachelor of Arts in Mathematics, University of Cincinnati, 2003
Committ

Successful Data Mining
in Practice:
Where do we Start?
Richard D. De Veaux
Department of
Mathematics and Statistics
Williams College
Williamstown MA, 01267
deveaux@williams.edu
http://www.williams.edu/Mathematics/rdeveaux
JSM 2-day Course SF August 2-3, 20

22-BANA 7046 Data Mining I (2 cr.)
22-BANA 7047 Data Mining II (2 cr.)
Section 002 Day Class
Spring Semester, 2013-2014
Instructor: Professor Yan Yu
Office: 527 Lindner Hall
Office Hours: M, (1st 7 weeks) 4pm-5pm and by
Class Time:
Email:
Phone:
Web Page:

My Experiences in Data Mining
Case Study, Practical Considerations, and Resources
Kristofer Still, Data Scientist and Manager, Analytics Unifund
Agenda
Introduction
Case Study
Top 10 Data Mining Mistakes
Data Mining Trends
Da

The ROC Curve
Cutoff Value
When using a continuous measure to assign entities to (discrete) classes, we need a cutoff value.
With two categories and one measure, we assign an entity to:
Category 0 if measure < cutoff
Category 1 if measure >= cutoff
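Below is a base-R sketch of the cutoff rule above. The scores and labels are made up purely for illustration. Sweeping the cutoff over every observed score traces the ROC curve as (false positive rate, true positive rate) pairs. The sweep starts above the largest score, where everything is classified 0, and ends at the smallest, where everything is classified 1.

```r
# Hypothetical continuous scores and true 0/1 classes, for illustration only
score <- c(0.10, 0.35, 0.40, 0.55, 0.62, 0.70, 0.80, 0.90)
truth <- c(0,    0,    1,    0,    1,    1,    0,    1)

# Category 1 if score >= cutoff, otherwise category 0
classify <- function(score, cutoff) as.integer(score >= cutoff)

# True positive rate and false positive rate at a single cutoff
roc_point <- function(score, truth, cutoff) {
  pred <- classify(score, cutoff)
  c(cutoff = cutoff,
    tpr = sum(pred == 1 & truth == 1) / sum(truth == 1),
    fpr = sum(pred == 1 & truth == 0) / sum(truth == 0))
}

# One row per cutoff: from "everything is 0" (Inf) down to "everything is 1"
cutoffs <- c(Inf, sort(unique(score), decreasing = TRUE))
roc <- t(sapply(cutoffs, function(k) roc_point(score, truth, k)))
roc

plot(roc[, "fpr"], roc[, "tpr"], type = "b", xlab = "False positive rate",
     ylab = "True positive rate", main = "ROC curve")
abline(0, 1, lty = 2)   # chance line
```

Lowering the cutoff can only move an entity from category 0 to category 1. Both rates therefore never decrease along the sweep, which is why the curve runs from (0, 0) to (1, 1).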
Mis

Application of Multivariate Adaptive Regression Splines (MARS)
in
Direct Marketing
A Project Submitted in Partial Fulfillment
for the Requirement of the Degree of
Master of Science in Business Administration
with a Concentration in Quantitative Analysis
Y

BANA 7046 Data Mining I
(Section 002 - Day)
Assignment 4
Group #3
To examine how the continuous variables correlate with each other, we present the correlation
matrix and scatterplot matrix below.
Table-2: Correlation Matrix of the Continuous Variables in t

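The assignment's data set is not included here, so this sketch assumes five continuous columns of mtcars as a stand-in. In it, cor() returns the Pearson correlation matrix and pairs() draws the scatterplot matrix described in the assignment.

```r
# Assumed stand-in for the assignment's data: five continuous columns of mtcars
cont_vars <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]

corr_mat <- round(cor(cont_vars), 2)   # Pearson correlations, rounded to 2 decimals
corr_mat

pairs(cont_vars, main = "Scatterplot matrix of continuous variables")
```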
2/17/2014
Explore Iris Dataset
Outline of this lab
1. Install Rattle
2. Exploration and basic visualization using R/Rattle
3. Start with Case #1
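Rattle performs these steps through its GUI. The following is a plain-R sketch of the same first look at the data, offered as an assumed equivalent rather than the lab's exact procedure. It covers the structure, the class counts, and two basic plots.

```r
data(iris)                    # built-in copy of the iris data
str(iris)                     # 150 rows: four numeric measurements plus Species
head(iris)
table(iris$Species)           # class counts per species

# Basic visualization: boxplots by species and a scatterplot matrix coloured by species
boxplot(Sepal.Length ~ Species, data = iris, main = "Sepal length by species")
pairs(iris[, 1:4], col = as.integer(iris$Species))
```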
Iris dataset - Introduction
The dataset consists of 50 samples from each of three speci

2/17/2014
Tree Models
In this lab we will go through model building, validation, and interpretation of tree models.
The focus will be on the rpart package.
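Here is a minimal classification-tree sketch covering the three stages named above. It assumes the iris data as the example. Building uses rpart(). Validation reads the cross-validated error (xerror) in the complexity-parameter table and prunes to the cp with the smallest xerror. Interpretation comes from the printed splits and a confusion matrix. The pruning rule is one common choice, not necessarily the lab's.

```r
library(rpart)  # rpart ships with R as a recommended package

set.seed(2014)  # xerror in the cp table comes from random cross-validation folds
iris_tree <- rpart(Species ~ ., data = iris, method = "class")
print(iris_tree)    # text view of the splits (interpretation)
printcp(iris_tree)  # cp table with cross-validated error (validation)

# Prune back to the cp value with the smallest cross-validated error
cp_tab  <- iris_tree$cptable
best_cp <- cp_tab[which.min(cp_tab[, "xerror"]), "CP"]
iris_pruned <- prune(iris_tree, cp = best_cp)

# Confusion matrix on the training data (optimistic; a holdout set is preferable)
pred <- predict(iris_pruned, newdata = iris, type = "class")
table(predicted = pred, actual = iris$Species)
```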
Regression Tree vs. Classification Tree
CART stands for classification and regressio

2/17/2014
Introduction to R
Outline of this lab
1. Install R and RStudio
2. Learn basic commands and data types in R
3. Get started with Rattle
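Below is a short sketch of the basic data types and commands referred to in step 2, as they might be typed at the RStudio console. The values are arbitrary and chosen only for illustration.

```r
x   <- c(1.5, 2, 3.5)                 # numeric vector
s   <- c("a", "b", "c")               # character vector
f   <- factor(c("lo", "hi", "lo"))    # factor; levels are sorted: "hi", "lo"
m   <- matrix(1:6, nrow = 2)          # 2 x 3 integer matrix, filled column by column
df  <- data.frame(x = x, s = s)       # data frame: equal-length columns of any type
lst <- list(vec = x, mat = m)         # list: a container for mixed objects

str(df)     # compact structure of an object
mean(x)     # summary functions work on vectors
m[1, 2]     # indexing is [row, column]
lst$mat     # list elements can be extracted by name with $
```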
Before You Start: Install R, RStudio
R
Windows: http://cran.r-project.org/bin/windows/base/
M

2/17/2014
Logistic Regression, Prediction and
ROC
The objective of this case is to help you understand logistic regression (binary classification)
and some important ideas such as cross-validation, ROC curve, cut

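The following sketch fits a logistic regression, turns predicted probabilities into classes with a cutoff, and summarizes ranking quality with the area under the ROC curve. It assumes mtcars as stand-in data, with am (0 = automatic, 1 = manual) as the binary response. The 0.5 cutoff is an assumption. The AUC is computed with the Mann-Whitney rank formula, which is one standard way to obtain it. All results here are in-sample and therefore optimistic; cross-validation gives a fairer estimate.

```r
# Illustrative stand-in data: mtcars, with am as the binary response
fit  <- glm(am ~ hp + wt, family = binomial, data = mtcars)
prob <- predict(fit, type = "response")   # fitted P(am = 1 | hp, wt)

cutoff <- 0.5                             # assumed cutoff; a cost-based choice may differ
pred <- as.integer(prob >= cutoff)
table(predicted = pred, actual = mtcars$am)

# Area under the ROC curve via the Mann-Whitney rank formula
# (rank() assigns average ranks to ties)
auc <- function(prob, y) {
  r  <- rank(prob)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc(prob, mtcars$am)
```

The rank formula gives the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. That probability equals the area under the ROC curve traced by sweeping the cutoff.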
2/17/2014
Regression and Variable Selection
The objective of this case is to get you started with regression model building, variable
selection, and model evaluation in R.
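Here is one possible sketch of stepwise variable selection with base R's step(), assuming mtcars as stand-in data with mpg as the response. It runs backward elimination and forward selection by AIC, then backward elimination by BIC, which means setting the penalty k = log(n). It ends with a simple in-sample evaluation. As the case notes, this is not the only correct way.

```r
# Illustrative stand-in: mtcars, with mpg as the response and all other columns as candidates
full_model <- lm(mpg ~ ., data = mtcars)
null_model <- lm(mpg ~ 1, data = mtcars)

# Backward elimination by AIC (default penalty k = 2)
back_aic <- step(full_model, direction = "backward", trace = 0)

# Forward selection by AIC, searching up to the full set of predictors
fwd_aic <- step(null_model, scope = formula(full_model),
                direction = "forward", trace = 0)

# Backward elimination by BIC: penalty k = log(n)
back_bic <- step(full_model, direction = "backward",
                 k = log(nrow(mtcars)), trace = 0)

formula(back_aic)
formula(fwd_aic)
formula(back_bic)

# In-sample evaluation of the AIC-selected model
mean(residuals(back_aic)^2)        # mean squared error
summary(back_aic)$adj.r.squared    # adjusted R^2
```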
Code in this file is not the only correct way

Introduction to R package
Why R?
Free!
Available for Unix, Linux and Windows
Nice plots
What is R?
S is a programming language for statistical data analysis
developed at Bell Labs and later extended to S+.
R is a free, open source version of S/S+.
U

Dr. Yan Yu
Lindner 527
(513)556-7147
Yan.Yu@uc.edu
http://business.uc.edu/departments/obais/faculty/yan-yu.html
Office Hours:
M, (1st 7 weeks) 4pm-5pm and by appointment
H, (2nd 7 weeks) 4pm-5pm and by appointment
Professor, OBAIS
UC 2000-Present
Ph.D.

3. IRIS DATA
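# Randomly sample a data set containing 90% of the original data points
# (set.seed is an added assumption so the sample is reproducible; any seed works)
set.seed(2014)
iris_sample_90 <- iris[sample(x = nrow(iris), size = nrow(iris) * 0.9), ]
# 1. Summary of the data:
summary(iris)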
Variable | Min | 1st Qu | Median | Mean | 3rd Qu | Max | Std. Dev.
Sepal.Length | 4.
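summary() reports the minimum, quartiles, mean and maximum but not the standard deviation. A table with a Std. Dev. column, like the one above, can be assembled by appending sd(). The following is a sketch under that assumption; the original may have computed the column differently.

```r
# Append the standard deviation to summary()'s six statistics
summary_with_sd <- function(x) c(summary(x), Std.Dev = sd(x))

# One row per numeric iris variable, one column per statistic
iris_stats <- t(sapply(iris[, 1:4], summary_with_sd))
round(iris_stats, 2)
```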