Lecture 14 - Data Mining: Principles and Algorithms

January 5, 2010

Data Mining: Principles and Algorithms
Jianyong Wang
Database Lab, Institute of Software
Department of Computer Science and Technology
Tsinghua University
jianyong@tsinghua.edu.cn
Chapter 7. Cluster Analysis
- What is Cluster Analysis?
- Types of Data in Cluster Analysis
- A Categorization of Major Clustering Methods
- Partitioning Methods
- Hierarchical Methods
- Density-Based Methods
- Grid-Based Methods
- Model-Based Methods
- Clustering High-Dimensional Data
- Constraint-Based Clustering
- Outlier Analysis
- Summary
Self-Organizing Feature Map (SOM)
- SOMs are also called topologically ordered maps, or Kohonen Self-Organizing Feature Maps (KSOMs)
- A SOM maps all the points in a high-dimensional source space onto a 2- or 3-dimensional target space, such that the distance and proximity relationships (i.e., the topology) are preserved as much as possible
- Clustering is performed by having several units compete for the current object
  - The unit whose weight (a.k.a. reference) vector is closest to the current object wins (see the sketch below)
  - The winner and its neighbors learn by having their weights adjusted
- SOMs are believed to resemble processing that can occur in the brain
- Useful for visualizing high-dimensional data in 2- or 3-D space
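As a minimal sketch of the competition step, assuming the units' weight vectors are stored as rows of a NumPy array (the function name find_winner and this representation are illustrative assumptions, not part of the slides), the winning unit is simply the one whose reference vector lies closest to the presented point:

```python
import numpy as np

def find_winner(weights, x):
    """Return the index of the unit whose weight (reference) vector is
    closest to the input point x, i.e., the 'winning' unit.

    weights: array of shape (n_units, dim); x: array of shape (dim,).
    """
    distances = np.linalg.norm(weights - x, axis=1)
    return int(np.argmin(distances))
```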
Self-Organizing Feature Map (SOM)
The algorithm:
- Initialize the reference vectors
- Repeat
  - Select the next training sample (point) p(t)
  - Determine the neuron whose reference vector is closest to the current point
  - Update this reference vector and the reference vectors of all neurons which are close, i.e., in a specified neighborhood
- Until the training is complete
- When the training is complete, assign all points to the closest reference vectors and return the reference vectors and clusters

Reference vector update:
  m_j(t+1) = m_j(t) + h_j(t) (p(t) - m_j(t))

Neighborhood function h_j(t), where r_k is the closest reference to p(t) and r_j is the grid position of neuron j:
- Gaussian function: h_j(t) = alpha(t) * exp( -||r_j - r_k||^2 / (2 * sigma^2(t)) )
- Step function: h_j(t) = alpha(t) if ||r_j - r_k|| <= threshold, and 0 otherwise

[Figure: a 3x3 SOM grid of neurons labeled by their grid coordinates (0,0) through (2,2)]
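The update rule above translates directly into a short training loop. The following is a minimal sketch, assuming a NumPy-based implementation with a square grid, a fixed number of epochs, and exponentially decaying alpha(t) and sigma(t); the function name train_som, the parameters grid_shape, epochs, alpha0, sigma0, and decay, and the decay schedules are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def train_som(points, grid_shape=(3, 3), epochs=20,
              alpha0=0.5, sigma0=1.0, decay=0.05, seed=0):
    """Minimal SOM training sketch: competitive winner selection plus
    Gaussian-neighborhood updates of the reference vectors m_j."""
    rng = np.random.default_rng(seed)
    n_units = grid_shape[0] * grid_shape[1]
    dim = points.shape[1]

    # Grid coordinates r_j of each neuron, e.g. (0,0) ... (2,2) for a 3x3 map.
    grid = np.array([(i, j) for i in range(grid_shape[0])
                            for j in range(grid_shape[1])], dtype=float)
    # Initialize the reference vectors m_j randomly.
    m = rng.normal(size=(n_units, dim))

    t = 0
    for _ in range(epochs):
        for p in points:
            alpha = alpha0 * np.exp(-decay * t)   # learning rate alpha(t)
            sigma = sigma0 * np.exp(-decay * t)   # neighborhood width sigma(t)
            # Winner k: neuron whose reference vector is closest to p(t).
            k = np.argmin(np.linalg.norm(m - p, axis=1))
            # Gaussian neighborhood h_j(t) over grid distances ||r_j - r_k||.
            grid_dist2 = np.sum((grid - grid[k]) ** 2, axis=1)
            h = alpha * np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Reference vector update: m_j(t+1) = m_j(t) + h_j(t) (p(t) - m_j(t)).
            m += h[:, None] * (p - m)
            t += 1

    # When training is complete, assign each point to its closest reference vector.
    labels = np.argmin(np.linalg.norm(points[:, None, :] - m[None, :, :],
                                      axis=2), axis=1)
    return m, labels
```

Calling train_som(X) on an (n, d) array X returns the grid's reference vectors and one cluster label per point; shrinking alpha(t) and sigma(t) over time is one common design choice for narrowing the neighborhood as training proceeds.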
Web Document Clustering Using SOM
[Figure: the result of SOM clustering of 12,088 Web articles, based on the websom.hut.fi Web page]
- Each white dot marks a map node
- Color denotes the density or the clustering tendency of the documents
- White areas are clusters; dark areas are empty space, "ravines", between the clusters
- Labels are generated automatically