Lect17 Clustering

An Introduction to Bioinformatics Algorithms (Computational Molecular Biology)

CSE182-L17 Clustering Population Genetics: Basics
Unsupervised Clustering
Given a set of points (in n dimensions) and a number k, compute the k "best" clusters. In k-means clustering, this is done by choosing k centers (means); each point is assigned to its closest center, and "best" is defined by the points' distances to their centers. Question: how can we compute the k best centers?
Distance
Given a data point v and a set of points X, define the distance from v to X, d(v, X), as the (Euclidean) distance from v to the closest point in X. Given a set of n data points V = {v_1, ..., v_n} and a set of k points X, define the squared error distortion:

  d(V, X) = (1/n) * sum over 1 <= i <= n of d(v_i, X)^2
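The squared error distortion can be computed directly from this definition. A minimal sketch in Python (the example points are made up for illustration):

```python
import math

def dist_to_set(v, X):
    """d(v, X): Euclidean distance from point v to the closest point in X."""
    return min(math.dist(v, x) for x in X)

def squared_error_distortion(V, X):
    """d(V, X) = (1/n) * sum over i of d(v_i, X)^2."""
    return sum(dist_to_set(v, X) ** 2 for v in V) / len(V)

# Three points scored against a single center at the origin:
V = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
X = [(0.0, 0.0)]
print(squared_error_distortion(V, X))  # (0 + 4 + 4) / 3
```

Note that the distortion averages the squared distances, so it does not grow just because the data set has more points.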
K-Means Clustering Problem: Formulation
Input: a set V of n points and a parameter k.
Output: a set X of k points (cluster centers) that minimizes the squared error distortion d(V, X) over all possible choices of X.
This problem is NP-hard in general.
1-Means Clustering Problem: an Easy Case
Input: a set V of n points.
Output: a single point x that minimizes d(V, x) over all possible choices of x.
This problem is easy: the optimal center is simply the centroid (coordinate-wise mean) of V. However, the problem becomes very difficult for more than one center. An efficient heuristic for k-means clustering is Lloyd's algorithm.
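Why the one-center case is easy: the total squared distance to a single center is minimized by the coordinate-wise mean. A small sketch (the data values are illustrative, not from the slides):

```python
def centroid(V):
    """Coordinate-wise mean of a list of points (tuples)."""
    n = len(V)
    return tuple(sum(v[d] for v in V) / n for d in range(len(V[0])))

def cost(V, x):
    """Total squared Euclidean distance from each point in V to one center x."""
    return sum(sum((vi - xi) ** 2 for vi, xi in zip(v, x)) for v in V)

V = [(1.0, 1.0), (3.0, 1.0), (2.0, 4.0)]
c = centroid(V)                            # (2.0, 2.0)
# Nudging the center away from the centroid can only increase the cost:
print(cost(V, c) <= cost(V, (2.1, 2.0)))   # True
```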
K-Means: Lloyd's Algorithm
Choose k centers at random: X' = {x_1, x_2, x_3, ..., x_k}
Repeat:
  X = X'
  Assignment: assign each v in V to the closest center, i.e. to the cluster j with d(v, x_j) = d(v, X); set C_j = C_j u {v}
  Recompute X': x'_j = (sum over v in C_j of v) / |C_j|
until X' = X
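The loop above can be sketched in Python. This version takes the initial centers as an argument instead of sampling them at random, so runs are deterministic; it is a sketch of the idea, not the course's reference implementation:

```python
import math

def lloyd(V, centers):
    """Run Lloyd's algorithm from the given initial centers until no center moves."""
    dim = len(V[0])
    while True:
        # Assignment step: each point joins the cluster of its closest center.
        clusters = [[] for _ in centers]
        for v in V:
            j = min(range(len(centers)), key=lambda j: math.dist(v, centers[j]))
            clusters[j].append(v)
        # Update step: each center moves to the centroid of its cluster
        # (an empty cluster keeps its old center).
        new_centers = [
            tuple(sum(p[d] for p in C) / len(C) for d in range(dim)) if C else c
            for C, c in zip(clusters, centers)
        ]
        if new_centers == centers:   # no center moved: converged
            return centers
        centers = new_centers
```

For example, on two well-separated groups of points, starting from one center in each group, the centers converge to the two group centroids:

```python
V = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
     (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(lloyd(V, [(0.0, 0.0), (10.0, 10.0)]))
```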
[Figure: successive iterations of Lloyd's algorithm on 2-D expression data (axis: expression in condition 1), showing centers x1, x2, x3 moving as points are reassigned.]
Conservative K-Means Algorithm
Lloyd's algorithm is fast, but each iteration moves many data points at once, which does not necessarily lead to better convergence. A more conservative method moves one point at a time, and only if the move improves the distortion.
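One way to realize this conservative variant (a hypothetical sketch; the slide leaves the details open): repeatedly try moving a single point to another cluster, and keep the move only if the total distortion drops.

```python
def distortion(clusters):
    """Total squared distance from each point to its own cluster's centroid.

    (Dividing by n, as in the distortion definition, would not change
    comparisons between clusterings of the same point set.)
    """
    total = 0.0
    for C in clusters:
        if not C:
            continue
        cen = tuple(sum(p[d] for p in C) / len(C) for d in range(len(C[0])))
        total += sum(sum((pi - ci) ** 2 for pi, ci in zip(p, cen)) for p in C)
    return total

def conservative_kmeans(clusters):
    """Move one point at a time, keeping a move only if distortion decreases."""
    improved = True
    while improved:
        improved = False
        for i in range(len(clusters)):
            for p in list(clusters[i]):      # snapshot: the list mutates below
                if len(clusters[i]) == 1:
                    break                    # never empty a cluster
                for j in range(len(clusters)):
                    if j == i:
                        continue
                    before = distortion(clusters)
                    clusters[i].remove(p)
                    clusters[j].append(p)
                    if distortion(clusters) < before:
                        improved = True
                        break                # keep the move
                    clusters[j].remove(p)    # undo the move
                    clusters[i].append(p)
    return clusters
```

Starting from a deliberately bad split of two well-separated groups, single-point moves are enough to recover the natural clustering:

```python
cl = [[(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)],
      [(1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]]
conservative_kmeans(cl)
```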
This note was uploaded on 02/14/2008 for the course CSE 182 taught by Professor Bafna during the Fall '06 term at UCSD.