Numerical Example
Objective of the marketing company:
Send mailers to customers who are likely to respond (high response rate).
Use a decision tree model to decide which groups of customers have a high response rate.
Chapter 5 Linear and Logistic Regression
Chapter Outcomes
After completing this chapter, you will be able to
Understand how to build linear and logistic regression models
Chapter 3 Decision Tree
Chapter Outcomes
After completing this chapter, you will be able to
Understand how a decision tree works
Understand how to build a decision tree via recursive partitioning
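Recursive partitioning can be sketched in a few lines of Python. This is a minimal illustration, not the SAS EM implementation: it assumes numeric inputs, binary splits of the form `x <= t`, and the Gini index as the splitting criterion. The function names (`gini`, `best_split`, `grow_tree`) and the toy age/response data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 1 - sum over classes of p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels, features):
    """Try every (feature, threshold) pair and return the one whose two child
    nodes have the lowest size-weighted Gini impurity, as (score, feature, t)."""
    best = None
    n = len(labels)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send cases to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def grow_tree(rows, labels, features, depth=0, max_depth=2, min_size=2):
    """Recursive partitioning: split the current node, then apply the same
    procedure to each child until a stopping rule is met."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(labels) < min_size or gini(labels) == 0.0:
        return {"leaf": majority, "n": len(labels)}
    split = best_split(rows, labels, features)
    if split is None:
        return {"leaf": majority, "n": len(labels)}
    _, f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]

    def child(idx):
        return grow_tree([rows[i] for i in idx], [labels[i] for i in idx],
                         features, depth + 1, max_depth, min_size)

    return {"feature": f, "threshold": t, "left": child(left), "right": child(right)}

# Hypothetical toy data: responded to a mailer (1) or not (0), by age.
rows = [{"age": a} for a in [20, 25, 30, 40, 45, 50]]
labels = [1, 1, 1, 0, 0, 0]
tree = grow_tree(rows, labels, ["age"])
print(tree)
```

Here the best first split is `age <= 30`, which gives two pure children, so recursion stops at depth 1. The `max_depth` and `min_size` stopping rules are illustrative choices; real tree software typically adds pruning.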
Chapter 1 Introduction to Data Mining
Chapter Outcomes
After completing this chapter, you will be able to
Know what data mining (DM) is.
Know the driving forces of DM.
Understand what DM emphasizes.
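Appendix: ANOVA-type Statistics in Hierarchical Clustering
Notations
Denote $x_{ijc}$ as the $i$th observation of input variable $j$ in cluster $c$, where $c = 1, \dots, K$ indexes the clusters, $i = 1, \dots, n_c$ the observations in cluster $c$, and $j = 1, \dots, p$ the input variables.
The trace of the total CSSP matrix $T$ for $x$ is given by
$$\operatorname{trace}(T) = \sum_{c=1}^{K} \sum_{i=1}^{n_c} \sum_{j=1}^{p} \left( x_{ijc} - \bar{x}_j \right)^2,$$
where $\bar{x}_j$ is the overall mean of input variable $j$.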
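Since trace(T) is just a sum of squared deviations from the overall means, it is easy to compute directly. The NumPy sketch below does so on a small invented data set. It also computes the within-cluster counterpart trace(W), which uses deviations from each cluster's own mean, and takes the between-cluster part as B = T - W; this relies on the standard decomposition of the total sum of squares, and the ratio B/T is the R-squared-type statistic commonly reported for a clustering. The function names are hypothetical.

```python
import numpy as np

def trace_total(X):
    """trace(T): squared deviations of every x_ijc from the overall mean of
    variable j, summed over clusters, observations and variables."""
    X = np.asarray(X, dtype=float)
    return float(((X - X.mean(axis=0)) ** 2).sum())

def trace_within(X, labels):
    """trace(W): the same sum, but with deviations from each cluster's own mean."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    return float(sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
                     for c in np.unique(labels)))

# Hypothetical data: n = 4 observations, p = 2 variables, K = 2 clusters.
X = np.array([[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]])
labels = np.array([0, 0, 1, 1])

T = trace_total(X)            # 100.0
W = trace_within(X, labels)   # 2.0
B = T - W                     # between-cluster part
R2 = B / T                    # share of total variation explained by the clusters
```

A clustering that separates the groups well leaves little variation within clusters, so W is small relative to T and R2 is close to 1.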
Lab 3: Build a Decision Tree without using SAS EM
DATA
Consider the following training data set.
Sex: M, M, F, F, F, M, F, M, F, M, M, F, F, F
Age: 25, 45, 40, 35, 25, 45, 33, 20, 30, 35, 25
Chapter 6 Credit Scoring
Credit Scoring and Credit Scorecards
What is Credit Scoring?
A statistical model that assigns a risk value to prospective or existing credit accounts.
What is a Credit Scorecard?
Chapter 8 Neural Networks
Chapter Outcomes
After completing this chapter, you will be able to
Understand the relation between a generalized linear model and a neural network
Train a neural network
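A minimal NumPy sketch of the GLM connection: a network with no hidden layer and a logistic (sigmoid) output unit computes exactly the logistic regression model, and training it by gradient descent on cross-entropy minimises the negative Bernoulli log-likelihood, the same objective as fitting the GLM by maximum likelihood. Adding a hidden layer (tanh activation here, an assumption) keeps the logistic output unit but feeds it learned nonlinear features. Function names and data are hypothetical, and plain gradient descent is used only for illustration; GLM software normally uses iteratively reweighted least squares.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: the inverse of the logit link."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, w, b):
    """No hidden layer, logistic output unit: identical to logistic regression."""
    return sigmoid(X @ w + b)

def train(X, y, lr=0.5, epochs=1000):
    """Batch gradient descent on the mean cross-entropy."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(epochs):
        err = forward(X, w, b) - y      # d(loss)/d(linear predictor) for sigmoid + cross-entropy
        w -= lr * X.T @ err / n
        b -= lr * err.mean()
    return w, b

def forward_hidden(X, W1, b1, w2, b2):
    """One hidden layer: the output unit is still a logistic 'GLM', but its
    inputs are learned nonlinear features H instead of the raw inputs X."""
    H = np.tanh(X @ W1 + b1)
    return sigmoid(H @ w2 + b2)

# Hypothetical one-input data set.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = train(X, y)
probs = forward(X, w, b)
```

These toy data are perfectly separable, so the fitted weight would keep growing with more epochs; on real data the maximum likelihood estimate is finite.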
Lab 2: Data Preparation Example: Transformation, Imputation and Regression
DATA
A data set is obtained from a national veterans organization on people who have made

Principal Components Analysis (PCA)
Widely used in dimension reduction, feature extraction and data
visualization
Appendix: Principal Components Analysis (PCA)
Two equivalent formulations
Maximum variance formulation
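The maximum variance formulation leads directly to an eigen-decomposition of the sample covariance matrix. Below is a minimal NumPy sketch. It assumes the data are centred but not standardised (standardising first gives PCA on the correlation matrix instead). The function name and data are hypothetical.

```python
import numpy as np

def pca_max_variance(X, k):
    """First k principal components via the maximum variance formulation.

    The first component is the unit vector u1 that maximises the variance of
    the projected data, u1' S u1, where S is the sample covariance matrix.
    With the constraint u1' u1 = 1, a Lagrange multiplier argument gives
    S u1 = lambda1 u1, so u1 is the eigenvector of S with the largest
    eigenvalue, and the projected variance equals that eigenvalue. Later
    components repeat this subject to orthogonality with earlier ones.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # centre each variable
    S = np.cov(Xc, rowvar=False)              # sample covariance matrix (divisor n - 1)
    eigvals, eigvecs = np.linalg.eigh(S)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest eigenvalues
    components = eigvecs[:, order]            # columns are the principal directions
    scores = Xc @ components                  # principal component scores
    return eigvals[order], components, scores

# Hypothetical data: most of the spread is along the first axis.
X = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
variances, components, scores = pca_max_variance(X, k=2)
```

The variance of the first column of `scores` equals the largest eigenvalue, which is the maximised quantity in this formulation.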
Lab 6: Build a Logistic Regression in SAS EM
INTRODUCTION
We will study the Excel data set pva97nk.xls (pva97nk.csv) that was used in earlier
Lab 5: Build a Linear Regression in SAS EM
DATA
Here, we will study the Excel data set pva97nk.xls (pva97nk.csv) that was used in earlier
Lab 8: Perform Cluster Analysis in SAS EM
INTRODUCTION
A catalog company periodically purchases lists of prospects from outside sources. They want to
Lab 7: Build a Credit Scoring Model in SAS EM
DATA
Here, the SAS data set accepts.sas7bdat contains 5,837 credit applicant observations and 22
Calculation of Predicted Probability Under a Decision Tree Model
Numerical Example
Consider the following example of taxpayers again.
Columns: Tid, Refund
Tid: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
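Under a decision tree model, the predicted probability of the target class for an observation is usually taken as the proportion of training cases of that class in the leaf the observation falls into. The Python sketch below computes that proportion for each leaf. The leaf assignments and Yes/No labels for ten records are invented for illustration and are not the taxpayer data; the function name is hypothetical.

```python
from collections import Counter

def leaf_probabilities(leaf_ids, labels, positive):
    """Predicted probability of the positive class in each leaf:
    (number of positive training cases in the leaf) / (number of cases in the leaf)."""
    totals = Counter(leaf_ids)
    positives = Counter(leaf for leaf, y in zip(leaf_ids, labels) if y == positive)
    return {leaf: positives[leaf] / n for leaf, n in totals.items()}

# Hypothetical leaf assignments for 10 training records (Tid 1 to 10).
leaf_ids = ["A", "A", "A", "B", "B", "B", "B", "C", "C", "C"]
labels = ["No", "No", "No", "No", "Yes", "No", "Yes", "Yes", "Yes", "No"]
probs = leaf_probabilities(leaf_ids, labels, positive="Yes")
```

Every new case routed to leaf B would receive the same predicted probability, 2/4 = 0.5.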
Lab 1: Import Data into SAS and Create a SAS EM Project
1.1 Create a SAS data set
There are 3 ways to create a SAS data set:
1. Data Step
2. Viewtable
STAT2312/STAT3612 Data Mining (2015-16 Semester 2)
Suggested Solution of Assignment 3
Question 1
(a) Four clusters are formed using Ward's clustering method, with the standard deviation used for standardization.
Alternatively, when the Euclidean (centroid) clusterin

Chapter 2 Data Preparation
Chapter Outcomes
After completing this chapter, you will be able to
Understand why we need data preparation
Understand different types of data
Know the major tasks in data preparation
STAT2312/STAT3612 Data Mining (2015-16 Semester 2)
Suggested Solution of Assignment 2
Question 1
(a) From the Output Window, the variables INCOME, CCAVG, CD_ACCOUNT, MORTGAGE, EDUCATION
and FAMILY showed significant associations with the target variable P

Chapter 7B Cluster Analysis II
Chapter Outcomes
After completing this chapter, you will be able to
Apply two additional clustering methods
Density-Based Clustering Methods
Model-Based Clustering Methods
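As an illustration of a density-based method, here is a minimal DBSCAN sketch in NumPy. DBSCAN is one well-known density-based algorithm; whether it is the one covered in this chapter is an assumption, and model-based methods (such as Gaussian mixture models) are not shown. The parameters `eps` (neighbourhood radius) and `min_pts` (minimum neighbourhood size, counting the point itself) follow the usual DBSCAN definitions; the data are invented.

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """A point is a core point if at least min_pts points (itself included)
    lie within distance eps. Clusters grow by linking core points to their
    eps-neighbours; points reachable from no core point are noise (-1)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    neighbours = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_pts for nb in neighbours])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue                      # already assigned, or cannot start a cluster
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            if not is_core[j]:
                continue                  # border point: joins the cluster but does not expand it
            for q in neighbours[j]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels

# Hypothetical data: two dense groups and one isolated point.
X = np.array([[0, 0], [0, 0.5], [0.5, 0], [0.5, 0.5],
              [5, 5], [5, 5.5], [5.5, 5], [5.5, 5.5],
              [20, 20]])
labels = dbscan(X, eps=1.0, min_pts=3)
```

Unlike partitional methods, DBSCAN does not need the number of clusters in advance and labels sparse points as noise rather than forcing them into a cluster.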
Construction of Empirical ROC Curve
Numerical Example 1
Obs: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
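A minimal Python sketch of the construction, using ten invented observations (predicted scores and 0/1 labels), not the values of Numerical Example 1: take each distinct predicted score in turn as a cutoff, classify an observation as positive when its score is at least the cutoff, and record the false positive rate and true positive rate. The area under the resulting curve is computed by the trapezoidal rule. The function names are hypothetical.

```python
def empirical_roc(scores, labels):
    """Return ROC points (FPR, TPR), starting at (0, 0), as the cutoff moves
    down through every distinct score; 'positive' means score >= cutoff."""
    P = sum(labels)                 # number of actual positives (label 1)
    N = len(labels) - P             # number of actual negatives (label 0)
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        points.append((fp / N, tp / P))
    return points

def auc_trapezoid(points):
    """Area under the empirical ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical scores and actual classes for 10 observations.
scores = [0.95, 0.85, 0.80, 0.70, 0.60, 0.55, 0.45, 0.35, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
points = empirical_roc(scores, labels)
area = auc_trapezoid(points)
```

For these data the area is 0.8, which matches the fraction of (positive, negative) pairs in which the positive case has the higher score (20 of 25).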
STAT2312/3612 Data Mining (2015-16 Semester 2)
Suggested Solution of Assignment 1
Question 1
(a) This is supervised data mining, because the bankrupt or non-bankrupt status of the other firms is known.
(b) This is unsupervised data mining because there is

STAT2312/STAT3612 Data Mining (2015-16 Semester 2)
Assignment 3
You are required to submit the Assignment on or before 6 May, 2016.
1. (25%) The EXCEL file EastWestAirlines.xls contains information on nearly 4,000
passengers who belong to an airlines freq

STAT2312/STAT3612 Data Mining (2015-16 Semester 2)
Assignment 1
You are required to submit the Assignment on or before 4 March, 2016.
1. (15%) Assume that data mining techniques are to be used in the following cases. Identify whether
the task required is

Autoencoder Neural Networks
Unsupervised data mining technique
[Figure: autoencoder network with input units x1 to x6 and output units x1 to x6. Number of output units = number of input units (the xs).]
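The idea that an autoencoder is unsupervised because its outputs reconstruct its own inputs can be sketched in NumPy. This sketch assumes a single tanh hidden layer with two units, a linear output layer as wide as the input (six units for x1 to x6), mean squared reconstruction error, and plain batch gradient descent. The architecture, learning rate, function names and data are illustrative choices, not taken from the course material.

```python
import numpy as np

def encode_decode(X, W_enc, W_dec):
    """Forward pass: p inputs -> small hidden (bottleneck) layer -> p outputs."""
    H = np.tanh(X @ W_enc)        # hidden layer with tanh activation
    return H, H @ W_dec           # linear output layer, same width as the input

def train_autoencoder(X, n_hidden=2, lr=0.05, epochs=500, seed=0):
    """Batch gradient descent on the mean squared reconstruction error.
    The inputs are also the targets, so no target variable is needed."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W_enc = rng.normal(scale=0.1, size=(p, n_hidden))
    W_dec = rng.normal(scale=0.1, size=(n_hidden, p))
    history = []
    for _ in range(epochs):
        H, X_hat = encode_decode(X, W_enc, W_dec)
        history.append(float(((X_hat - X) ** 2).mean()))
        G = 2.0 * (X_hat - X) / X.size                      # d(loss)/d(X_hat)
        grad_dec = H.T @ G
        grad_enc = X.T @ ((G @ W_dec.T) * (1.0 - H ** 2))   # tanh'(z) = 1 - tanh(z)^2
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec
    _, X_hat = encode_decode(X, W_enc, W_dec)
    history.append(float(((X_hat - X) ** 2).mean()))
    return W_enc, W_dec, history

# Hypothetical data: 100 cases on six inputs x1..x6 driven by two latent factors.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 6))
W_enc, W_dec, history = train_autoencoder(X)
```

After training, the two hidden-unit values `np.tanh(X @ W_enc)` can serve as a compressed representation of the six inputs.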
STAT2312/STAT3612 Data Mining (2015-16 Semester 2)
Assignment 2
You are required to submit the Assignment on or before 5 April, 2016.
1. (50%) Consider the data file BankLoan.csv which contains information on 5,000 loan applications.
The target is whether

Lab 10: Perform Classification using a Neural Network in SAS EM
INTRODUCTION
The buy.sas7bdat data set consists of 10,000 customers and whether or not they responde

HKU Department of Statistics and Actuarial Science (2015-16)
STAT2312/STAT3612 Data Mining
Lab 4: Build a Decision Tree in SAS EM
DATA
Here, we will use the Excel data file pva97nk.xls (pva97nk.csv) used in Lab 2. Recall that this is a
data set obtained f

Chapter 7A Cluster Analysis I
Chapter Outcomes
After completing this chapter, you will be able to
Apply two popular clustering methods
Partitional Clustering Methods
Hierarchical Clustering Methods
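For partitional clustering, k-means is the standard example; treating it as one of the methods meant here is an assumption. The sketch below is Lloyd's algorithm in NumPy with randomly chosen initial seeds. The function name and data are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means: alternate between assigning each point to its nearest
    centroid and moving each centroid to the mean of its assigned points."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # k random points as seeds

    def assign(C):
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # squared Euclidean distances
        return d2.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])   # an empty cluster keeps its old centroid
        if np.allclose(new, centroids):
            break                             # converged: centroids stopped moving
        centroids = new
    return assign(centroids), centroids

# Hypothetical data with two well-separated groups.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels, centroids = kmeans(X, k=2)
```

The result can depend on the initial seeds, which is why implementations usually run several starts and keep the solution with the smallest within-cluster sum of squares.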