STAT2312/STAT3612 Data Mining (2015-16 Semester 2)
Assignment 3
You are required to submit the Assignment on or before 6 May, 2016.
1. (25%) The EXCEL file EastWestAirlines.xls contains information on nearly 4,000
passengers who belong to an airlines freq

STAT2312/STAT3612 Data Mining
Chapter 5 Linear and Logistic Regressions
Dr. Gilbert C.S. Lui
Chapter Outcomes
After completing this chapter, you are able to
Understand how to build linear and logistic regression models
Understand how to do variable select

Chapter Outcomes
After completing this chapter, you are able to
Understand how a decision tree works
Understand how to build a decision tree via recursive partitioning
STAT2312/STAT3612 Data Mining
Chapter 3 Decision Tree
Dr. Gilbert C.S. Lui
Department o

Chapter Outcomes
After completing this chapter, you are able to
Know what data mining (DM) is.
Know the driving forces of DM.
Understand what DM emphasizes.
STAT2312/STAT3612 Data Mining
Chapter 1 Introduction to Data Mining
Dr. Gilbert C.S. Lui
Departmen

STAT2312/STAT3612 Data Mining
Appendix: ANOVA-type Statistics in Hierarchical Clustering
Notations
Denote $x_{ijc}$ as the $i$th observation of input variable $j$ in cluster $c$.
The trace of the total CSSP matrix $T$ for $x$ is given by
$$\operatorname{trace}(T) = \sum_{c=1}^{K} \sum_{i=1}^{n_c} \sum_{j=1}^{p} (x_{ijc} - \bar{x}_{j})^2$$
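The trace of the total CSSP matrix can be checked numerically. The following is a minimal sketch, assuming that CSSP stands for the corrected sums of squares and products matrix and that each variable is centred on its overall mean across all clusters (the source formula is truncated, so both points are assumptions). The function name `total_cssp_trace` and the data are hypothetical.

```python
import numpy as np

def total_cssp_trace(X):
    """Trace of the total CSSP matrix T = Xc' Xc, where Xc is X with
    each column centred on its overall mean (assumed definition)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)      # subtract the overall mean of each variable
    T = centered.T @ centered          # p x p total CSSP matrix
    return float(np.trace(T))

# Hypothetical data: 3 observations, 2 input variables.
# Cluster membership does not affect trace(T), so no labels are needed here.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 9.0]])

# The trace equals the sum of squared deviations over all observations and variables.
direct_sum = float(((X - X.mean(axis=0)) ** 2).sum())
print(total_cssp_trace(X), direct_sum)
```

Because the centring uses the overall mean, the value is the same however the observations are assigned to clusters.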

HKU Department of Statistics and Actuarial Science (2015-16)
STAT2312/STAT3612 Data Mining
Lab 3: Build a Decision Tree without using SAS EM
DATA
Given the following training data set.
Sex: M, M, F, F, F, M, F, M, F, M, M, F, F, F
Age: 25, 45, 40, 35, 25, 45, 33, 20, 30, 35, 25, 3

Credit Scoring and Credit Scorecards
What is Credit Scoring?
A statistical model that assigns a risk value to prospective or existing credit accounts.
What is a Credit Scorecard?
STAT2312/STAT3612 Data Mining
Chapter 6 Credit Scoring
Dr. Gilbert C.S. Lui

Chapter Outcomes
After completing this chapter, you are able to
Understand the relation between a generalized linear model and a neural network
Train a neural network
STAT2312/STAT3612 Data Mining
Chapter 8 Neural Networks
Dr. Gilbert C.S. Lui
Department

HKU Department of Statistics and Actuarial Science (2015-16)
STAT2312/STAT3612 Data Mining
Lab 2: Data Preparation Example: Transformation, Imputation and Regression
DATA
A data set is obtained from a national veterans organization on people who have made

Principal Components Analysis (PCA)
Widely used in dimension reduction, feature extraction and data visualization
Two equivalent formulations
Maximum variance formulation
STAT2312/STAT3612 Data Mining
Appendix: Principal Components Analysis (PCA)
Dr. Gilb
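The maximum variance formulation named above can be illustrated with a short sketch. It assumes the standard result that the first principal component is the unit eigenvector of the sample covariance matrix with the largest eigenvalue, and that the variance of the data projected onto it equals that eigenvalue. The function name `first_principal_component` and the simulated data are hypothetical.

```python
import numpy as np

def first_principal_component(X):
    """Return (direction, variance) of the first principal component,
    taken as the top eigenpair of the sample covariance matrix."""
    X = np.asarray(X, dtype=float)
    cov = np.cov(X, rowvar=False)            # sample covariance (divisor n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return eigvecs[:, -1], float(eigvals[-1])

# Hypothetical data: 200 observations of 3 correlated variables.
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing

w, lam = first_principal_component(X)

# Variance of the data projected onto w; it should match the top eigenvalue.
projected_var = float(np.var((X - X.mean(axis=0)) @ w, ddof=1))
print(lam, projected_var)
```

No other unit direction gives a larger projected variance, which is what the maximum variance formulation asks for.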

HKU Department of Statistics and Actuarial Science (2015-16)
STAT2312/STAT3612 Data Mining
Lab 6: Build a Logistic Regression in SAS EM
INTRODUCTION
We will study the Excel data set pva97nk.xls (pva97nk.csv) that was used in earlier
demonstration. Recall

HKU Department of Statistics and Actuarial Science (2015-16)
STAT2312/STAT3612 Data Mining
Lab 5: Build a Linear Regression in SAS EM
DATA
Here, we will study the Excel data set pva97nk.xls (pva97nk.csv) that was used in earlier
demonstration. Recall that

HKU Department of Statistics and Actuarial Science (2015-16)
STAT2312/STAT3612 Data Mining
Lab 8: Perform Cluster Analysis in SAS EM
INTRODUCTION
A catalog company periodically purchases lists of prospects from outside sources. They want to
design a test

HKU Department of Statistics and Actuarial Science (2015-16)
STAT2312/STAT3612 Data Mining
Lab 7: Build a Credit Scoring Model in SAS EM
DATA
Here, the SAS data set accepts.sas7bdat contains 5,837 credit applicant observations and 22
characteristic variab

Numerical Example
Objective of marketing company
Send mailers to customers who are likely to respond (High response rate!)
Decide which group of customers has a high response rate using a decision tree model
STAT2312/STAT3612 Data Mining
Numerical Illustrati

Numerical Example
Consider the following example of taxpayers again.
Tid: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Refund
STAT2312/STAT3612 Data Mining
Calculation of Predicted Probability Under a Decision Tree Model
Dr. Gilbert C.S. Lui
Department of Statistics and Actuarial

HKU Department of Statistics and Actuarial Science (2015-16)
STAT2312/STAT3612 Data Mining
Lab 1: Import Data into SAS and Create a SAS EM Project
1.1 Create a SAS data set
There are 3 ways to create a SAS data set:
1. Data Step
2. Viewtable
3. SAS Import

Chapter Outcomes
After completing this chapter, you are able to
Understand why we need data preparation
Understand different types of data
Know major tasks in data preparation
STAT2312/STAT3612 Data Mining
Chapter 2 Data Preparation
Dr. Gilbert C.S. Lui
Dep

STAT2312/STAT3612 Data Mining (2015-16 Semester 2)
Suggested Solution of Assignment 2
Question 1
(a) From the Output Window, the variables INCOME, CCAVG, CD_ACCOUNT, MORTGAGE, EDUCATION
and FAMILY showed significant associations with the target variable P

Chapter Outcomes
After completing this chapter, you are able to
Apply two additional clustering methods
Density-Based Clustering Methods
Model-Based Clustering Methods
STAT2312/STAT3612 Data Mining
Chapter 7B Cluster Analysis II
Dr. Gilbert C.S. Lui
Depar

Numerical Example 1
Obs: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
STAT2312/STAT3612 Data Mining
Construction of Empirical ROC Curve
Dr. Gilbert C.S. Lui
Department of Statistics and Actuarial Science,
The University of Hong Kong
2015-2016 Semester 2
Dr. Gilbert C.S. Lui (HKU,
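As a rough illustration of how an empirical ROC curve can be constructed, the sketch below sweeps a threshold over the predicted scores from high to low and records the false positive rate and true positive rate at each step. The ten scores and labels are hypothetical and are not the values from the numerical example, which are not reproduced here. The function name `empirical_roc` is also an assumption.

```python
def empirical_roc(scores, labels):
    """Return a list of (FPR, TPR) points for an empirical ROC curve.

    An observation is classified as positive when its score is at or
    above the current threshold; labels are 1 (event) or 0 (non-event).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]                      # threshold above every score
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical predicted probabilities and true classes for 10 observations.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]

roc = empirical_roc(scores, labels)
for fpr, tpr in roc:
    print(fpr, tpr)
```

Plotting TPR against FPR for these points, joined in order, gives the empirical curve running from (0, 0) to (1, 1).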

STAT2312/3612 Data Mining (2015-16 Semester 2)
Suggested Solution of Assignment 1
Question 1
(a) This is supervised data mining, because the bankrupt or non-bankrupt status of the other firms
is known.
(b) This is unsupervised data mining because there is

STAT2312/STAT3612 Data Mining (2015-16 Semester 2)
Assignment 1
You are required to submit the Assignment on or before 4 March, 2016.
1. (15%) Assume that data mining techniques are to be used in the following cases. Identify whether
the task required is

Autoencoder Neural Networks
Unsupervised data mining technique
Number of output units = Number of input units, xs
[Figure: autoencoder network; units labelled x1 to x6 on both the input and output sides]
Dr. Gilbert C.S. Lui
Department of Statistics and Actuarial Science,
The University of Hong Kong
STAT2312

STAT2312/STAT3612 Data Mining (2015-16 Semester 2)
Assignment 2
You are required to submit the Assignment on or before 5 April, 2016.
1. (50%) Consider the data file BankLoan.csv which contains information on 5,000 loan applications.
The target is whether

HKU Department of Statistics and Actuarial Science (2015-16)
STAT2312/STAT3612 Data Mining
Lab 10: Perform Classification using a Neural Network in SAS EM
INTRODUCTION
The buy.sas7bdat data set consists of 10,000 customers and whether or not they responde

HKU Department of Statistics and Actuarial Science (2015-16)
STAT2312/STAT3612 Data Mining
Lab 4: Build a Decision Tree in SAS EM
DATA
Here, we will use the Excel data file pva97nk.xls (pva97nk.csv) used in Lab 2. Recall that this is a
data set obtained f

Chapter Outcomes
After completing this chapter, you are able to
Apply two popular clustering methods
Partitional Clustering Methods
Hierarchical Clustering Methods
STAT2312/STAT3612 Data Mining
Chapter 7A Cluster Analysis I
Dr. Gilbert C.S. Lui
Department

HKU Department of Statistics and Actuarial Science (2015-16)
STAT2312/STAT3612 Data Mining
Lab 11: Forecast Time Series using a Neural Network in SAS EM
INTRODUCTION
The time series data wine1.sas7bdat consists of monthly sales of white wine and red wine