
Lect17 Clustering

# An Introduction to Bioinformatics Algorithms (Computational Molecular Biology)

CSE182-L17 Clustering Population Genetics: Basics

Unsupervised Clustering: Given a set of points (in n dimensions) and an integer k, compute the k “best” clusters. In k-means, clustering is done by choosing k centers (means), and each point is assigned to the closest center. The notion of “best” is defined by the distances of the points to their centers. Question: How can we compute the k best centers?
Distance: Given a data point v and a set of points X, define the distance from v to X, d(v, X), as the (Euclidean) distance from v to the closest point in X. Given a set of n data points V = {v_1, …, v_n} and a set of k points X, define the Squared Error Distortion

$$d(V, X) = \frac{1}{n} \sum_{1 \le i \le n} d(v_i, X)^2$$
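A minimal Python sketch of the two definitions above, assuming points are stored as tuples of floats. The function names `dist_to_set` and `squared_error_distortion` are illustrative, not taken from the lecture.

```python
import math

def dist_to_set(v, X):
    """d(v, X): Euclidean distance from v to the closest point of X."""
    return min(math.dist(v, x) for x in X)

def squared_error_distortion(V, X):
    """d(V, X): mean of the squared distances d(v_i, X)^2 over all n points."""
    return sum(dist_to_set(v, X) ** 2 for v in V) / len(V)

V = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0), (5.0, 4.0)]
X = [(0.5, 0.0), (4.5, 4.0)]
print(squared_error_distortion(V, X))  # 0.25 (every point is 0.5 from its closest center)
```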

K-Means Clustering Problem: Formulation. Input: A set V consisting of n points, and a parameter k. Output: A set X consisting of k points (cluster centers) that minimizes the squared error distortion d(V, X) over all possible choices of X. This problem is NP-hard in general (its decision version is NP-complete).
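To make the formulation concrete, here is a hedged brute-force sketch (all helper names are hypothetical) that tries every assignment of the n points to k nonempty clusters and places each center at its cluster's mean. It relies on the fact that, for a fixed partition, each cluster's mean is its best center, so the optimum over all X is among these candidates. It is exact but examines k^n assignments, so it is only usable on tiny inputs.

```python
import itertools
import math

def centroid(points):
    """Coordinate-wise mean of a nonempty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def distortion(V, X):
    """Squared error distortion d(V, X)."""
    return sum(min(math.dist(v, x) for x in X) ** 2 for v in V) / len(V)

def exact_k_means(V, k):
    """Exhaustive search over all k**n labelings (assumes k <= len(V))."""
    best_X, best_cost = None, math.inf
    for labels in itertools.product(range(k), repeat=len(V)):
        clusters = [[v for v, lab in zip(V, labels) if lab == j] for j in range(k)]
        if any(not C for C in clusters):
            continue  # skip labelings that leave some cluster empty
        X = [centroid(C) for C in clusters]
        cost = distortion(V, X)
        if cost < best_cost:
            best_X, best_cost = X, cost
    return best_X, best_cost

V = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0), (5.0, 4.0)]
X, cost = exact_k_means(V, 2)
print(sorted(X), cost)
```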
1-Means Clustering Problem: an Easy Case. Input: A set V consisting of n points. Output: A single point X that minimizes d(V, X) over all possible choices of X. This problem is easy: the minimizing point is the centroid (mean) of V. However, it becomes very difficult for more than one center. An efficient heuristic method for k-means clustering is the Lloyd algorithm.
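The easy case can be checked numerically. For squared Euclidean distances, the distortion of any single point x satisfies (1/n) Σ ||v_i − x||² = (1/n) Σ ||v_i − c||² + ||c − x||², where c is the centroid of V, so c is the unique minimizer. The sketch below (illustrative names and data) computes c and confirms that moving away from it increases the distortion.

```python
import math

def centroid(V):
    """Coordinate-wise mean of the points in V."""
    return tuple(sum(coord) / len(V) for coord in zip(*V))

def distortion_one(V, x):
    """d(V, {x}): mean squared distance from the points of V to the single point x."""
    return sum(math.dist(v, x) ** 2 for v in V) / len(V)

V = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
c = centroid(V)
print(c)  # (1.0, 1.0)

# Shifting the candidate center away from the centroid always raises the cost.
for dx, dy in [(0.1, 0.0), (0.0, -0.1), (0.3, 0.2)]:
    assert distortion_one(V, (c[0] + dx, c[1] + dy)) > distortion_one(V, c)
```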

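K-means: Lloyd’s algorithm. Choose k centers at random: X' = {x_1, x_2, x_3, …, x_k}. Then repeat the following steps until X' = X:
1. Set X = X' and start with empty clusters C_1, …, C_k.
2. Assign each v ∈ V to the closest cluster j, i.e. the j with d(v, x_j) = d(v, X), by setting C_j = C_j ∪ {v}.
3. Recompute X': for each j, set $x'_j = \frac{1}{|C_j|} \sum_{v \in C_j} v$.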
[Figure: data points plotted by expression in condition 1 (axes 0 to 5), with cluster centers x_1, x_2, x_3]
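A runnable Python sketch of Lloyd's algorithm as described above. Assumptions not stated in the slide: the k initial centers are k distinct data points drawn with a seeded `random.Random`, a center whose cluster becomes empty stays where it is, and `max_iter` caps the loop as a safety net. The names `lloyd_k_means`, `seed`, and `max_iter` are hypothetical.

```python
import math
import random

def lloyd_k_means(V, k, seed=0, max_iter=100):
    """Return (centers, clusters) after Lloyd iterations on points V."""
    rng = random.Random(seed)
    X_new = rng.sample(V, k)  # "choose k centers at random" from the data points
    clusters = []
    for _ in range(max_iter):
        X = X_new
        clusters = [[] for _ in range(k)]  # clusters start empty every round
        for v in V:
            # index of the closest center (ties go to the lowest index)
            j = min(range(k), key=lambda i: math.dist(v, X[i]))
            clusters[j].append(v)
        # each new center is the mean of its cluster; an empty cluster keeps its old center
        X_new = [
            tuple(sum(coord) / len(C) for coord in zip(*C)) if C else X[j]
            for j, C in enumerate(clusters)
        ]
        if X_new == X:  # centers stopped moving: converged
            break
    return X_new, clusters

V = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0), (5.0, 4.0)]
centers, clusters = lloyd_k_means(V, 2)
print(sorted(centers))
```

On these four points, every choice of two starting points converges to the centers (0.5, 0.0) and (4.5, 4.0). In general, Lloyd's algorithm only reaches a local optimum, and the result can depend on the initial centers.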

Conservative K-Means Algorithm: The Lloyd algorithm is fast, but in each iteration it moves many data points, which does not necessarily lead to better convergence. A more conservative method would be to move one point at a time, and only if the move improves the overall clustering cost. The smaller the clustering cost of a partition of the data points, the better that clustering is. Different methods can be used to measure this clustering cost (for example, the last algorithm used the squared error distortion).
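One way to realize the conservative strategy, sketched under stated assumptions: the clustering cost is the squared error distortion with each center at its cluster's mean, the search starts from a caller-supplied label list, and the first improving move found for a point is accepted. The names `partition_cost` and `conservative_k_means` are hypothetical. Recomputing the full cost after every tentative move keeps the code short but is slow; an incremental cost update would be faster.

```python
import math

def partition_cost(V, labels, k):
    """Squared error distortion of a partition, with each center at its cluster mean."""
    total = 0.0
    for j in range(k):
        C = [v for v, lab in zip(V, labels) if lab == j]
        if not C:
            continue  # an empty cluster contributes nothing
        mu = tuple(sum(coord) / len(C) for coord in zip(*C))
        total += sum(math.dist(v, mu) ** 2 for v in C)
    return total / len(V)

def conservative_k_means(V, k, labels):
    """Move one point at a time, keeping a move only if it lowers the cost."""
    labels = list(labels)
    cost = partition_cost(V, labels, k)
    improved = True
    while improved:  # stop after a full pass with no improving move
        improved = False
        for i in range(len(V)):
            for j in range(k):
                if j == labels[i]:
                    continue
                old = labels[i]
                labels[i] = j  # tentatively move point i to cluster j
                new_cost = partition_cost(V, labels, k)
                if new_cost < cost:
                    cost, improved = new_cost, True
                    break  # keep the move and go on to the next point
                labels[i] = old  # undo: the move did not help
    return labels, cost

V = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0), (5.0, 4.0)]
labels, cost = conservative_k_means(V, 2, [0, 1, 0, 1])
print(labels, cost)
```

Starting from labels [0, 1, 0, 1], two single-point moves reach labels [1, 1, 0, 0] with cost 0.25. Because every accepted move strictly lowers the cost, the same partition is never visited twice, so the loop terminates.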
