Data Mining
Anomaly Detection
Lecture Notes for Chapter 10
Introduction to Data Mining
by
Tan, Steinbach, Kumar
Tan,Steinbach, Kumar
Introduction to Data Mining
4/18/2004
Anomaly/Outlier Detection
What are anomalies/outliers?
The set of data points th
Data Mining
Cluster Analysis: Advanced Concepts
and Algorithms
Lecture Notes for Chapter 9
Introduction to Data Mining
by
Tan, Steinbach, Kumar
Hierarchical Clustering: Revisited
Creates nested
Data Mining
Cluster Analysis: Basic Concepts
and Algorithms
Lecture Notes for Chapter 8
Introduction to Data Mining
by
Tan, Steinbach, Kumar
What is Cluster Analysis?
Finding groups of objects
Data Mining
Association Rules: Advanced Concepts
and Algorithms
Lecture Notes for Chapter 7
Introduction to Data Mining
by
Tan, Steinbach, Kumar
Continuous and Categorical Attributes
How to app
Data Mining
Association Analysis: Basic Concepts
and Algorithms
Lecture Notes for Chapter 6
Introduction to Data Mining
by
Tan, Steinbach, Kumar
Association Rule Mining
Given a set of transacti
Data Mining
Classification: Alternative Techniques
Lecture Notes for Chapter 5
Introduction to Data Mining
by
Tan, Steinbach, Kumar
Rule-Based Classifier
Classify records by using a collection
Data Mining
Classification: Basic Concepts, Decision
Trees, and Model Evaluation
Lecture Notes for Chapter 4
Introduction to Data Mining
by
Tan, Steinbach, Kumar
Classification: Definition
Give
Data Mining: Exploring Data
Lecture Notes for Chapter 3
Introduction to Data Mining
by
Tan, Steinbach, Kumar
8/05/2005
What is data exploration?
A preliminary exploration of the data to
better understand
Data Mining: Data
Lecture Notes for Chapter 2
Introduction to Data Mining
by
Tan, Steinbach, Kumar
What is Data?
Collection of data objects and
their attributes
Attributes
Tid Refund Marital
St
Data Mining: Introduction
Lecture Notes for Chapter 1
Introduction to Data Mining
by
Tan, Steinbach, Kumar
Why Mine Data? Commercial Viewpoint
Lots of data is being collected
and warehoused
We
4
Classification:
Basic Concepts,
Decision Trees, and
Model Evaluation
Classification, which is the task of assigning objects to one of several predefined
categories, is a pervasive problem that encompasses many diverse applications.
Examples include detecting
ONLINE CHAPTER 6
Neural Networks for Data Mining
Learning Objectives
Understand the concept and different types of artificial neural networks (ANN)
Learn the advantages and limitations of ANN
Understand how ba
Data Mining
Case 1: Improving
Direct Mail Responses
Mellon Bank Corporation is a major financial
services company headquartered in Pittsburgh,
Pennsylvania. Its two core businesses are
investment services and banking services.
Objective: to predict th
Data Mining
C. Decision Tree Models
ID3 (by Quinlan in 1979)
C4.5 (by Quinlan in 1993)
CART (by Breiman et al. in 1984)
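The three models differ mainly in how they choose splits. Here is a minimal Python sketch of the criteria usually associated with each one: information gain for ID3, gain ratio for C4.5, and the Gini index for CART. It assumes categorical attributes and multiway splits. Real CART grows binary trees, so `weighted_gini` is a simplification. The function names and the small `outlook`/`play` data set are illustrative and do not come from the slides.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def partition(values, labels):
    """Group the class labels by the value of one categorical attribute."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())

def information_gain(values, labels):
    """ID3 criterion: parent entropy minus the weighted entropy of the children."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in partition(values, labels))

def gain_ratio(values, labels):
    """C4.5 criterion: information gain divided by the split information."""
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    if split_info == 0:
        return 0.0  # a single-valued attribute cannot split the node
    return information_gain(values, labels) / split_info

def weighted_gini(values, labels):
    """Weighted Gini of the children (multiway here; CART itself uses binary splits)."""
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in partition(values, labels))

# Hypothetical toy data: one categorical attribute and a two-class target.
outlook = ["sunny", "sunny", "rain", "rain", "overcast", "overcast"]
play = ["no", "no", "yes", "yes", "yes", "no"]
print(information_gain(outlook, play), gain_ratio(outlook, play), weighted_gini(outlook, play))
```

A higher information gain or gain ratio, or a lower weighted Gini, marks a better split. Gain ratio divides by the split information, which offsets plain information gain's bias toward attributes with many distinct values.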
First term, 11/12
CSCI5180 Lecture Slides, Laiwan Chan,
CUHK
Data Mining
Iterative Dichotomiser 3 - ID3
ID3 is a very basic decision tree
Data Mining
Decision Tree
Data Mining
Outline
A. Introduction to Decision Trees
B. Measurement of impurity
C. Decision Tree Models and their variations
(Tan, Steinbach and Kumar, Chapter 4.3)
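Here is a sketch of the impurity measures usually covered under item B: entropy, the Gini index, and classification error. Each is computed from the class counts at a node t. This is an illustrative Python version, not code from the lecture. The helper `class_proportions` and the example counts are assumptions made for the demonstration.

```python
from math import log2

def class_proportions(counts):
    """Turn the per-class record counts at node t into p(i|t)."""
    total = sum(counts)
    return [c / total for c in counts]

def entropy(counts):
    """Entropy(t) = -sum_i p(i|t) log2 p(i|t); classes with p = 0 contribute 0."""
    return sum(-p * log2(p) for p in class_proportions(counts) if p > 0)

def gini(counts):
    """Gini(t) = 1 - sum_i p(i|t)^2."""
    return 1.0 - sum(p * p for p in class_proportions(counts))

def classification_error(counts):
    """Error(t) = 1 - max_i p(i|t)."""
    return 1.0 - max(class_proportions(counts))

# Two-class nodes with counts (C1, C2): pure, skewed, and evenly mixed.
for counts in ([0, 6], [1, 5], [3, 3]):
    print(counts, entropy(counts), gini(counts), classification_error(counts))
```

All three measures are 0 for a pure node and peak when the classes are evenly mixed. For two classes, the peaks are entropy 1, Gini 0.5, and error 0.5.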
Data Mining
Tree Pruning
Data Mining
Overfit
A decision tree, d, is said to overfit the
training data if there exists some tree ds
which is a simplification of d, such that d
has sm
Data Mining
Neural Networks
Data Mining
Outline
(Chapter 5.4)
A. What is a neural network?
B. Model and operation of a neuron
C. Single-layer and multi-layer perceptrons
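The neuron model and the single-layer perceptron fit in a few lines of Python. This sketch makes several assumptions: a hard-limiting (step) activation, a bias term in place of an explicit threshold, zero initial weights, and the classic perceptron learning rule. The function names and the AND example are illustrative choices.

```python
def step(z):
    """Hard-limiting activation: output 1 when the net input is >= 0, else 0."""
    return 1 if z >= 0 else 0

def neuron_output(weights, bias, x):
    """Neuron model: weighted sum of the inputs plus a bias, passed through the activation."""
    net = sum(w * xi for w, xi in zip(weights, x)) + bias
    return step(net)

def train_perceptron(samples, targets, lr=1.0, max_epochs=100):
    """Perceptron rule: w <- w + lr * (t - y) * x and b <- b + lr * (t - y) on each mistake."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(samples, targets):
            error = t - neuron_output(weights, bias, x)
            if error:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # a full pass with no mistakes: training has converged
            break
    return weights, bias

# Logical AND is linearly separable, so a single-layer perceptron can learn it.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
T = [0, 0, 0, 1]
w, b = train_perceptron(X, T)
print(w, b, [neuron_output(w, b, x) for x in X])
```

For linearly separable data such as AND, the perceptron convergence theorem guarantees that the updates stop after finitely many mistakes. XOR is not linearly separable, so no single-layer perceptron can classify all four of its points. This limitation is one motivation for multi-layer perceptrons.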
Data Mining
Customer
Ranking Model
Objective:
In a business with a large customer base, we
want to rank the existing customers based
on a set of parameters that define what a
good customer is
Data Mining
Model Evaluation
Metrics for Performance Evaluation (Chapter 5.7)
How to evaluate the performance of a model?
Methods for Performance Evaluation (Chapter 4.5)
How to obtain reliable estimates?
Methods for Model Comparison (Chapter 4.6)
H
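As a rough illustration of the metrics and methods listed above, the Python sketch below does two things. It computes common two-class metrics from a confusion matrix: accuracy, precision, recall, and F-measure. It also generates train/test index splits for k-fold cross-validation. Several choices here are assumptions made for brevity: the positive-class label, the function names, returning 0.0 when a ratio is undefined, and contiguous, unshuffled folds. In practice the records are usually shuffled (or stratified) before splitting.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, FN, TN) for a two-class problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def evaluate(actual, predicted, positive=1):
    """Accuracy, precision, recall and F-measure (0.0 when undefined, by assumption)."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds (no shuffling)."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size

actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
print(evaluate(actual, predicted))
for train_idx, test_idx in k_fold_indices(10, 3):
    print(len(train_idx), test_idx)
```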
Data Mining
Linear Discriminant
Classifiers
The discriminant functions are linear in the
components of x or linear in some given set
of functions of x.
$g(\mathbf{x}) = \mathbf{W}^{T}\mathbf{x} + W_0$
where $\mathbf{W}$ = weight vector,
$W_0$ = threshold value
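Here is a minimal Python sketch of evaluating g(x). The decision rule in these notes is truncated, so the two-class rule used below is an assumption, as are the function names. Under that rule, x goes to class 1 when g(x) > 0 and to class 2 when g(x) < 0, and it lies on the boundary when g(x) = 0. The optional `phi` argument covers the case where g is linear in a given set of functions of x rather than in x itself.

```python
def g(x, W, W0, phi=None):
    """Linear discriminant g(x) = W^T phi(x) + W0; phi defaults to the identity."""
    features = phi(x) if phi is not None else x
    return sum(w * f for w, f in zip(W, features)) + W0

def classify(x, W, W0, phi=None):
    """Assumed two-class rule: class 1 if g(x) > 0, class 2 if g(x) < 0."""
    value = g(x, W, W0, phi)
    if value > 0:
        return "class 1"
    if value < 0:
        return "class 2"
    return None  # g(x) == 0: x lies on the decision boundary

def quadratic(v):
    """Hypothetical basis functions phi(x) = (x, x^2) for a one-dimensional x."""
    return [v[0], v[0] ** 2]

# Linear in the components of x: with W = (1, -1) and W0 = 0, the boundary is x1 = x2.
print(classify([2.0, 1.0], W=[1.0, -1.0], W0=0.0))
# Linear in functions of x: with W = (0, 1) and W0 = -1, the boundary is x^2 = 1.
print(classify([0.5], W=[0.0, 1.0], W0=-1.0, phi=quadratic))
```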
The decision rule:
x is assigned to
Data Mining
Classification
Data Mining
Classification
Definition:
Classification is the process of assigning
classes, or categories, to observations.
An observation has a set of attributes,
Data Mining
Market Segmentation
This example appears in Chapter 9 of Data Mining
with Neural Networks by Joseph P. Bigus.
Objective: To understand the customers, to find
out who the typical customers are, and to learn which
features/products customers are most int
Mutual Fund Selection
Data Mining
A mutual fund is a form of collective investment that
pools money from many investors and invests their money
with a predetermined investment objective. The fund
manager of the mutual fund trades the fund's underlying
se
Data Mining
4.3. Hierarchical Clustering
Methods
The data are not partitioned in a
single step.
Produces a set of nested clusters
organized as a hierarchical tree
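The bottom-up (agglomerative) form of this idea can be sketched as follows. Start with every point as its own cluster, then repeatedly merge the two closest clusters. Recording each merge lets you read off the nested clusters, that is, the dendrogram. To keep the example short, it assumes single link (MIN) as the cluster distance, Euclidean distance between points, and a naive search over all pairs. It is not meant as the lecture's implementation. `math.dist` requires Python 3.8+.

```python
from itertools import combinations
from math import dist  # Python 3.8+

def single_link_clustering(points):
    """Agglomerative clustering with single link (MIN).

    Returns the merge history as (cluster_a, cluster_b, distance, new_id) tuples;
    ids 0..n-1 are the original points, and new clusters get ids n, n+1, ...
    """
    clusters = {i: [i] for i in range(len(points))}  # cluster id -> member point indices
    merges = []
    next_id = len(points)

    def cluster_distance(a, b):
        # Single link: distance of the closest pair of points across the two clusters.
        return min(dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        d = cluster_distance(a, b)
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d, next_id))
        next_id += 1
    return merges

points = [(0, 0), (0, 1), (4, 0), (4, 3), (10, 0)]
for merge in single_link_clustering(points):
    print(merge)
```

Each tuple holds three pieces of information: the two cluster ids that were merged, the single-link distance at which they merged, and the id of the new cluster. Cutting the merge sequence at a chosen distance gives a flat clustering. Replacing the `min` inside `cluster_distance` with `max` would give complete link (MAX) instead.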
4.3. Hierarchical Cl
Data Mining
3. Types of Clusters
Well-separated clusters
Center-based clusters
Graph-based clusters
Density-based clusters
Property or Conceptual
Described by an Objective Function
Data
Data Mining
Clustering
Data Mining
Summary
1. Introduction
2. Types of Data and Similarity Measurements
- criteria used in clustering (Chapters 2.1
and 2.4)
3. Types of clusters (Chapter 8.1)
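Several of the similarity and distance measures typically covered under item 2 can be written directly in Python. This sketch uses numeric vectors for Euclidean distance and cosine similarity. It uses 0/1 vectors for the simple matching coefficient (SMC) and the Jaccard coefficient. The function names and example vectors are illustrative, and all-zero vectors are not handled.

```python
from math import sqrt

def euclidean(x, y):
    """Euclidean distance between two numeric vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine(x, y):
    """Cosine similarity x.y / (||x|| ||y||); undefined for an all-zero vector."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))

def binary_counts(x, y):
    """Counts (f11, f10, f01, f00) of attribute-value pairs for two 0/1 vectors."""
    pairs = list(zip(x, y))
    return pairs.count((1, 1)), pairs.count((1, 0)), pairs.count((0, 1)), pairs.count((0, 0))

def smc(x, y):
    """Simple matching coefficient: (f11 + f00) / number of attributes."""
    f11, f10, f01, f00 = binary_counts(x, y)
    return (f11 + f00) / (f11 + f10 + f01 + f00)

def jaccard(x, y):
    """Jaccard coefficient: f11 / (f11 + f10 + f01); 0-0 matches are ignored."""
    f11, f10, f01, _ = binary_counts(x, y)
    return f11 / (f11 + f10 + f01)

p = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
q = [0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
print(smc(p, q), jaccard(p, q))

d1 = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
d2 = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
print(cosine(d1, d2), euclidean([0, 0], [3, 4]))
```

For the binary pair `p`, `q`, SMC is 0.7 but Jaccard is 0. The difference arises because SMC counts the seven shared 0s as matches while Jaccard ignores them. For that reason, Jaccard is usually preferred for asymmetric binary attributes such as market-basket data.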