Reducing Number of Candidates
Apriori principle:
If an itemset is frequent, then all of its subsets must
also be frequent
Apriori principle holds due to the following
property of the support measure:
$\forall X, Y : (X \subseteq Y) \Rightarrow s(X) \geq s(Y)$
Support of an itemset n
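The anti-monotone property above is what lets Apriori discard candidates without counting them. The following is a minimal sketch of that idea, not a full Apriori implementation. The toy transactions, the threshold `min_count`, and the helper names `support_count` and `has_infrequent_subset` are assumptions made for illustration.

```python
from itertools import combinations

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)

def has_infrequent_subset(candidate, frequent_prev):
    """Pruning test: True if some (k-1)-subset of the k-item candidate
    is missing from the set of frequent (k-1)-itemsets."""
    k = len(candidate)
    return any(frozenset(s) not in frequent_prev
               for s in combinations(candidate, k - 1))

# Hypothetical toy data (not taken from the slides).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]
min_count = 3  # assumed minimum support count

# Frequent 2-itemsets, found by direct counting.
items = sorted(set().union(*transactions))
frequent_2 = {
    frozenset(pair)
    for pair in combinations(items, 2)
    if support_count(pair, transactions) >= min_count
}

# {bread, beer} is infrequent, so any 3-itemset containing it can be
# dropped without scanning the transactions again.
pruned = has_infrequent_subset(frozenset({"bread", "beer", "diapers"}), frequent_2)
kept = not has_infrequent_subset(frozenset({"bread", "milk", "diapers"}), frequent_2)
print(pruned, kept)
```

A candidate that survives the pruning test is not guaranteed to be frequent; it still has to be counted against the transactions.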
Final Review
Lei Chen
Clustering Algorithms
K-Means
Partitioning Algorithms: Basic Concept
Partitioning method: Partitioning a database D of n objects into a set of k
clusters, such that the sum of squared distances is minimized (where ci is the
centroi
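As a rough illustration of the objective described above, the sketch below computes the sum of squared distances from each object to the centroid of its cluster, $E = \sum_{i=1}^{k} \sum_{p \in C_i} \lVert p - c_i \rVert^2$, assuming Euclidean distance and the mean point as centroid. The two example partitions and the function names are hypothetical.

```python
def centroid(points):
    """Mean point of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def sse(clusters):
    """Sum over clusters of squared Euclidean distances to the cluster centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)
    return total

# Two ways to split the same four objects into k = 2 clusters (made-up data).
tight = [[(1.0, 1.0), (1.0, 3.0)], [(9.0, 1.0), (9.0, 3.0)]]
loose = [[(1.0, 1.0), (9.0, 1.0)], [(1.0, 3.0), (9.0, 3.0)]]

print(sse(tight), sse(loose))  # the partitioning method prefers the smaller value
```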
Density-Based Outlier Detection
Local outliers: Outliers relative to their local
neighborhoods, rather than to the global data
distribution
In Fig., o1 and o2 are local outliers to C1, o3 is a global
outlier, but o4 is not an outlier. However, proximitybas
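To make the local-versus-global distinction concrete, here is a simplified sketch. It is not the LOF algorithm itself: it compares an object's mean distance to its k nearest neighbours with the same quantity for those neighbours. The coordinates, the index of the suspect point, and the function names are assumptions for illustration and are not taken from the figure.

```python
import math

def knn_mean_distance(points, i, k):
    """Mean distance from points[i] to its k nearest neighbours (itself excluded)."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return sum(dists[:k]) / k

def neighbours(points, i, k):
    """Indices of the k nearest neighbours of points[i]."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

def local_score(points, i, k):
    """Ratio of the object's own k-NN distance to that of its neighbours.
    Values near 1 mean 'as dense as the neighbourhood'; larger means sparser."""
    own = knn_mean_distance(points, i, k)
    nbr = [knn_mean_distance(points, j, k) for j in neighbours(points, i, k)]
    return own / (sum(nbr) / len(nbr))

dense = [(0, 0), (0, 1), (1, 0), (1, 1)]           # tight cluster
suspect = [(3, 3)]                                  # close to the tight cluster
sparse = [(20, 20), (20, 26), (26, 20), (26, 26)]   # loose cluster
points = dense + suspect + sparse

k = 2
for i, p in enumerate(points):
    print(p, round(knn_mean_distance(points, i, k), 3), round(local_score(points, i, k), 3))
```

In this toy data the suspect point has a smaller absolute k-NN distance than the members of the loose cluster, so a single global distance threshold would flag the loose cluster first, while the local ratio singles out the suspect point.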
MSCIT 5210: Knowledge
Discovery and Data Mining
Acknowledgement: Slides modified by Dr. Lei Chen based on
the slides provided by Jiawei Han, Micheline Kamber, and
Jian Pei
© 2012 Han, Kamber & Pei. All rights reserved.
Outline of Advanced Clustering An
Chapter 10. Cluster Analysis: Basic
DECISION TREE
An internal node represents a test on an attribute.
A branch represents an outcome of the test, e.g.,
Color=red.
A leaf node represents a class label or class label
distribution.
At each node, one attribute is chosen to split training
exampl
Correlation Analysis: ,
Covariance Analysis:
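Min-Max Normalization: $v' = \frac{v - \min_A}{\max_A - \min_A}(new\_max_A - new\_min_A) + new\_min_A$
Z-Score Normalization: $v' = \frac{v - \mu_A}{\sigma_A}$
Normalization by Decimal Scaling: $v' = \frac{v}{10^j}$, where j is the smallest integer such that $\max(|v'|) < 1$.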
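The three normalization methods listed above can be sketched as follows, using their usual textbook definitions. The function names and sample values are assumptions, and the z-score version assumes the population standard deviation.

```python
import statistics

def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map [min, max] of the data onto [new_min, new_max].
    Assumes the values are not all identical (otherwise the range is zero)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer such that max(|v'|) < 1."""
    j = 0
    while max(abs(v) for v in values) / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]

incomes = [12000, 73600, 98000]            # made-up attribute values
print(min_max(incomes))
print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))   # mean 5, population std 2
print(decimal_scaling([-986, 217, 917]))   # j = 3 for this data
```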
by Information Gain (biased to multivalued attributes):
by Gain Ratio (preferring unbalanc
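The bias of information gain toward multivalued attributes can be seen on a small example. The sketch below uses the standard definitions (gain = entropy before the split minus weighted entropy after it; gain ratio = gain divided by the entropy of the split itself). The data set, the attribute names `id` and `color`, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(rows, labels, attr):
    """Information gain normalised by the split information of attr."""
    split_info = entropy([row[attr] for row in rows])
    return information_gain(rows, labels, attr) / split_info if split_info > 0 else 0.0

# Eight made-up training examples: `id` is unique per row, `color` has two values.
colors = ["red"] * 4 + ["blue"] * 4
rows = [{"id": i, "color": c} for i, c in enumerate(colors)]
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "yes"]

for attr in ("id", "color"):
    print(attr, round(information_gain(rows, labels, attr), 4),
          round(gain_ratio(rows, labels, attr), 4))
```

On this data information gain ranks the useless `id` attribute first because every partition it creates is pure, while gain ratio ranks `color` first because the eight-way split is penalised by its large split information.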
The K-Means Clustering Method: for
numerical attributes
Given k, the k-means algorithm is implemented in four
steps:
Partition objects into k nonempty subsets
Compute seed points as the centroids of the
clusters of the current partition (the centroid is the
center, i.e., mean point,
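The steps above can be sketched as a short loop. Only the first two steps are visible in the text, so the remaining two (assign each object to the nearest seed point, then repeat until assignments stop changing) follow the usual formulation of k-means and should be read as an assumption. The round-robin initial partition, the `max_iter` cap, and the sample points are also illustrative choices.

```python
import math

def mean_point(points):
    """Centroid (mean point) of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def kmeans(points, k, max_iter=100):
    assert len(points) >= k, "need at least k objects"
    # Step 1: partition objects into k nonempty subsets (round-robin, an assumed choice).
    assign = [i % k for i in range(len(points))]
    centroids = [None] * k
    for _ in range(max_iter):
        # Step 2: compute the centroid of each cluster of the current partition.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = mean_point(members)
        # Step 3 (assumed): assign each object to the cluster with the nearest centroid.
        new_assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Step 4 (assumed): stop once no assignment changes, otherwise repeat from step 2.
        if new_assign == assign:
            break
        assign = new_assign
    return assign, centroids

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
assign, centroids = kmeans(points, k=2)
print(assign)
print(centroids)
```

The result depends on the initial partition; a different starting assignment can converge to a different local optimum.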
Chapter 4: Data Warehousing, On-l
Chapter 2: Getting to Know Your Dat