Lect17 Clustering

An Introduction to Bioinformatics Algorithms (Computational Molecular Biology)

CSE182-L17 Clustering Population Genetics: Basics

Unsupervised Clustering

Given a set of points (in n dimensions) and a number k, compute the k "best" clusters. In k-means, clustering is done by choosing k centers (means). Each point is assigned to the closest center. The notion of "best" is defined by distances to the center. Question: How can we compute the k best centers?
Distance

Given a data point v and a set of points X, define the distance from v to X, d(v, X), as the (Euclidean) distance from v to the closest point in X. Given a set of n data points $V = \{v_1, \dots, v_n\}$ and a set of k points X, define the squared error distortion

$$d(V, X) = \frac{1}{n} \sum_{i=1}^{n} d(v_i, X)^2$$
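The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture; the function names `dist_to_set` and `squared_error_distortion` and the sample points are assumptions made for the example.

```python
import math

def dist_to_set(v, X):
    """Euclidean distance from point v to the closest point in X."""
    return min(math.dist(v, x) for x in X)

def squared_error_distortion(V, X):
    """Mean of the squared distances from each point in V to its closest point in X."""
    return sum(dist_to_set(v, X) ** 2 for v in V) / len(V)

# Hypothetical data: four points and two candidate centers.
V = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
X = [(0.0, 1.0), (4.0, 1.0)]
print(squared_error_distortion(V, X))  # every point is at distance 1 from its center, so 1.0
```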

K-Means Clustering Problem: Formulation

Input: A set V consisting of n points, and a parameter k.
Output: A set X consisting of k points (cluster centers) that minimizes the squared error distortion d(V, X) over all possible choices of X.

This problem is NP-hard in general.
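To make the formulation concrete, the sketch below solves the problem exactly by trying every assignment of points to k labels, which is only feasible for a handful of points (there are k^n assignments). It relies on the fact that, for a fixed partition, the best center of each cluster is its centroid. The names `centroid` and `brute_force_kmeans` are hypothetical, and the approach is an illustration rather than a method from the lecture.

```python
from itertools import product

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def brute_force_kmeans(V, k):
    """Exhaustive search over all k^n labelings; returns (best distortion, centers)."""
    best_cost, best_centers = float("inf"), None
    for labels in product(range(k), repeat=len(V)):
        clusters = [[v for v, l in zip(V, labels) if l == j] for j in range(k)]
        if any(not c for c in clusters):
            continue  # skip labelings that leave a cluster empty
        centers = [centroid(c) for c in clusters]
        cost = sum(
            sum((a - b) ** 2 for a, b in zip(v, centers[l]))
            for v, l in zip(V, labels)
        ) / len(V)
        if cost < best_cost:
            best_cost, best_centers = cost, centers
    return best_cost, best_centers

# Hypothetical data: two well-separated pairs of points.
cost, centers = brute_force_kmeans([(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)], 2)
print(cost, sorted(centers))
```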
1-Means Clustering Problem: an Easy Case

Input: A set V consisting of n points.
Output: A single point x that minimizes d(V, {x}) over all possible choices of x.

This problem is easy: the optimal point is the centroid (mean) of V. However, the problem becomes very difficult for more than one center. An efficient heuristic for k-means clustering is the Lloyd algorithm.
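One way to see why the single-center case is easy is the short derivation below. It is a sketch that writes the single center as x and uses the squared error distortion with Euclidean distance; setting the gradient to zero gives the mean of the data points.

```latex
d(V, \{x\}) = \frac{1}{n} \sum_{i=1}^{n} \lVert v_i - x \rVert^2

\nabla_x \, d(V, \{x\}) = -\frac{2}{n} \sum_{i=1}^{n} (v_i - x) = 0
\quad\Longrightarrow\quad
x = \frac{1}{n} \sum_{i=1}^{n} v_i
```

Because the objective is a convex quadratic in x, this stationary point is the global minimum.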

K-means: Lloyd's algorithm

Choose k centers at random: X' = {x_1, x_2, x_3, …, x_k}
Repeat
    X = X'
    Assign each v ∈ V to the closest cluster j, i.e. the j with d(v, x_j) = d(v, X):
        C_j = C_j ∪ {v}
    Recompute X':
        x'_j ← (Σ_{v ∈ C_j} v) / |C_j|
until (X' = X)
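The loop above can be sketched in plain Python as follows. Several details are assumptions not stated on the slide: the initial centers are drawn from the data points themselves, clusters are rebuilt from scratch in each pass, an empty cluster keeps its previous center, and a `max_iter` cap guards against long runs. The names `lloyd` and `centroid` are hypothetical.

```python
import math
import random

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def lloyd(V, k, seed=0, max_iter=100):
    """Heuristic k-means: alternate assignment and center recomputation."""
    rng = random.Random(seed)
    centers = rng.sample(V, k)  # assumption: start from k distinct data points
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest current center.
        clusters = [[] for _ in range(k)]
        for v in V:
            j = min(range(k), key=lambda i: math.dist(v, centers[i]))
            clusters[j].append(v)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [centroid(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:  # X' = X, so the algorithm has converged
            break
        centers = new_centers
    return centers, clusters

# Hypothetical data: two groups of points on a line.
centers, clusters = lloyd([(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)], k=2)
print(sorted(centers))
```

Lloyd's algorithm only reaches a local optimum, so on other inputs the result can depend on the random starting centers; running it from several seeds and keeping the lowest distortion is a common remedy.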
[Figure, repeated over four slides: scatter plot with axes ticked 0 to 5, axis label "expression in condition 1", and three cluster centers x_1, x_2, x_3.]
Conservative K-Means Algorithm

The Lloyd algorithm is fast, but in each iteration it moves many data points, which does not necessarily lead to better convergence. A more conservative method would be to move one point at a time, and only if the move improves the overall clustering cost.
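A minimal sketch of such a one-point-at-a-time scheme is given below. It assumes that "improves" means strictly lowering the squared error distortion of the partition, with each cluster's centroid as its center, and that a cluster is never emptied. The function names `distortion` and `conservative_kmeans` and the starting partition are hypothetical.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def distortion(clusters):
    """Squared error distortion of a partition, using centroids as centers."""
    n = sum(len(c) for c in clusters)
    total = 0.0
    for c in clusters:
        m = centroid(c)
        total += sum(sum((a - b) ** 2 for a, b in zip(v, m)) for v in c)
    return total / n

def conservative_kmeans(initial_clusters):
    """Move a single point between clusters only when the distortion drops."""
    clusters = [list(c) for c in initial_clusters]
    current = distortion(clusters)
    improved = True
    while improved:
        improved = False
        for i in range(len(clusters)):
            for v in list(clusters[i]):
                if len(clusters[i]) == 1:
                    break  # assumption: never empty a cluster
                for j in range(len(clusters)):
                    if j == i:
                        continue
                    clusters[i].remove(v)
                    clusters[j].append(v)
                    new = distortion(clusters)
                    if new < current - 1e-12:
                        current, improved = new, True
                        break  # keep the move and go on to the next point
                    clusters[j].pop()       # otherwise undo the move
                    clusters[i].append(v)
    return clusters, current

# Hypothetical starting partition that mixes two natural groups.
start = [[(0.0, 0.0), (10.0, 0.0)], [(1.0, 0.0), (11.0, 0.0)]]
clusters, cost = conservative_kmeans(start)
print(sorted(sorted(c) for c in clusters), cost)
```

Each accepted move strictly lowers the distortion and there are finitely many partitions, so the loop terminates. Recomputing the full distortion for every trial move keeps the sketch short but is slow; an incremental update of the cost would be the natural refinement.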

This note was uploaded on 02/14/2008 for the course CSE 182 taught by Professor Bafna during the Fall '06 term at UCSD.
