Data Mining: Exploring Data
Lecture Notes for Chapter 3
Introduction to Data Mining
by
Tan, Steinbach, Kumar
Tan,Steinbach, Kumar
Introduction to Data Mining
8/05/2005
1
What is data exploration?
A preliminary exploration of the data to
better understand
Data Mining
Anomaly Detection
Lecture Notes for Chapter 10
Introduction to Data Mining
by
Tan, Steinbach, Kumar
Tan,Steinbach, Kumar
Introduction to Data Mining
4/18/2004
1
Anomaly/Outlier Detection
What are anomalies/outliers?
The set of data points th
Data Mining
Cluster Analysis: Advanced Concepts
and Algorithms
Lecture Notes for Chapter 9
Introduction to Data Mining
by
Tan, Steinbach, Kumar
Hierarchical Clustering: Revisited
Creates nested
Data Mining
Cluster Analysis: Basic Concepts
and Algorithms
Lecture Notes for Chapter 8
Introduction to Data Mining
by
Tan, Steinbach, Kumar
What is Cluster Analysis?
Finding groups of objects
Data Mining
Association Rules: Advanced Concepts
and Algorithms
Lecture Notes for Chapter 7
Introduction to Data Mining
by
Tan, Steinbach, Kumar
Continuous and Categorical Attributes
How to app
Data Mining
Association Analysis: Basic Concepts
and Algorithms
Lecture Notes for Chapter 6
Introduction to Data Mining
by
Tan, Steinbach, Kumar
Association Rule Mining
Given a set of transacti
Data Mining
Classification: Alternative Techniques
Lecture Notes for Chapter 5
Introduction to Data Mining
by
Tan, Steinbach, Kumar
Rule-Based Classifier
Classify records by using a collection
Data Mining
Classification: Basic Concepts, Decision
Trees, and Model Evaluation
Lecture Notes for Chapter 4
Introduction to Data Mining
by
Tan, Steinbach, Kumar
Classification: Definition
Give
Data Mining: Data
Lecture Notes for Chapter 2
Introduction to Data Mining
by
Tan, Steinbach, Kumar
What is Data?
Collection of data objects and
their attributes
Attributes
Tid Refund Marital
St
Data Mining: Introduction
Lecture Notes for Chapter 1
Introduction to Data Mining
by
Tan, Steinbach, Kumar
Why Mine Data? Commercial Viewpoint
Lots of data is being collected
and warehoused
We
4
Classification:
Basic Concepts,
Decision Trees, and
Model Evaluation
Classification, which is the task of assigning objects to one of several predefined
categories, is a pervasive problem that encompasses many diverse applications.
Examples include detecting
ONLINE CHAPTER 6
Neural Networks for Data Mining
Learning Objectives
- Understand the concept and different types of artificial neural networks (ANN)
- Learn the advantages and limitations of ANN
- Understand how ba
Data Mining
Case 1 : Improving
Direct Mail Responses
Mellon Bank Corporation is a major financial
services company headquartered in Pittsburgh,
Pennsylvania. Its two core businesses are
investment services and banking services.
Objective : to predict th
Data Mining
C. Decision Tree Models
ID3 (by Quinlan in 1979)
C4.5 (by Quinlan in 1993)
CART (by Breiman 1984)
First term, 11/12
CSCI5180 Lecture Slides, Laiwan Chan,
CUHK
57
Data Mining
Iterative Dichotomiser 3 - ID3
ID3 is a very basic decision tree
Data Mining
Decision Tree
Data Mining
Outline
A. Introduction to Decision Tree
B. Measurement of impurity
C. Decision Tree Models and their variations
(Tan, Steinbach and Kumar, Chapter 4.3)
Data Mining
First term, 11/12
Tree Pruning
CSC5180 Lecture Slides, written by
Laiwan Chan, CUHK
1
Data Mining
Overfit
A decision tree, d, is said to overfit the
training data if there exists some tree ds
which is a simplification of d, such that d
has sm
Data Mining
Neural Networks
Data Mining
Outline
(Chapter 5.4)
A. What is a neural network?
B. Model and operation of a neuron
C. Single layer perceptron Multi-layer
perceptron
First term, 11/
Data Mining
Customer
Ranking Model
Objective :
In a business with a large customer base, we
want to rank the existing customer set based
on a set of parameters that defines what a
good customer means
First term, 11/12
CSCI5180 Lecture Slides, written by
Data Mining
Model Evaluation
Metrics for Performance Evaluation (Chapter 5.7)
How to evaluate the performance of a model?
Methods for Performance Evaluation (Chapter 4.5)
How to obtain reliable estimates?
Methods for Model Comparison (Chapter 4.6)
H
Data Mining
Linear Discriminant
Classifiers
The discriminant functions are linear in the
components of x or linear in some given set
of functions of x.
g(x) = W^T x + W_0
where W = weight vector,
W_0 = threshold value
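As a minimal sketch of the discriminant function defined above, the following Python snippet evaluates g(x) = W^T x + W_0 for a single observation. The function name `discriminant` and the numeric values for the weight vector, threshold, and observation are hypothetical and chosen only for illustration.

```python
def discriminant(w, x, w0):
    """Evaluate the linear discriminant g(x) = W^T x + W_0.

    w  : weight vector (sequence of numbers)
    x  : observation, same length as w
    w0 : threshold value (scalar)
    """
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    # W^T x is the dot product of the weight vector and the observation.
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


# Hypothetical weights, threshold, and observation (not from the slides).
w = [2.0, -1.0]
w0 = -0.5
x = [1.0, 0.5]
print(discriminant(w, x, w0))
```

Because g is linear in the components of x, doubling every weight and the threshold doubles g(x) without changing its sign.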
The decision rule :
x is assigned to
Data Mining
Classification
Data Mining
Classification
Definition :
Classification is the process of assigning
classes, or categories, to observations.
An observation has a set of attributes,
Data Mining
Market Segmentation
This example appears in Chapter 9 of Data Mining
with Neural Networks by Joseph P. Bigus.
Objective : To understand the customers, to find
out the typical customers and to know what
features/products are customers most int
Mutual Fund Selection
Data Mining
A mutual fund is a form of collective investment that
pools money from many investors and invests their money
with a predetermined investment objective. The fund
manager of the mutual fund trades the fund's underlying
se
Data Mining
4.3. Hierarchical Clustering
Methods
The partition of data is not done at a
single step.
Produces a set of nested clusters
organized as a hierarchical tree
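The idea of building the partition over several steps, with each level nested inside the next, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the agglomerative (bottom-up) variant with single-link distance on 1-D points, and the function name `agglomerate` and the sample data are hypothetical.

```python
def agglomerate(points):
    """Single-link agglomerative clustering on 1-D points.

    Returns every level of the hierarchy, from all singletons
    down to one cluster containing every point.
    """
    clusters = [[p] for p in sorted(points)]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-link distance,
        # i.e. the smallest gap between any two of their members.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        levels.append([list(c) for c in clusters])
    return levels


# Hypothetical sample data; each printed line is one level of the tree.
for level in agglomerate([1, 2, 6, 7, 15]):
    print(level)
```

Each merge removes exactly one cluster, so n points yield n levels, and every cluster at one level is contained in a cluster at the next level.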
4.3. Hierarchical Cl
Data Mining
3. Types of Clusters
Well-separated clusters
Center-based clusters
Graph-Based clusters
Density-based clusters
Property or Conceptual
Described by an Objective Function
Data
Data Mining
Clustering
Data Mining
Summary
1. Introduction
2. Types of Data and Similarity Measurements
- criteria used in clustering (Chapters 2.1
and 2.4)
3. Types of clusters (Chapter 8.1)