Lect17 Clustering

An Introduction to Bioinformatics Algorithms (Computational Molecular Biology)

CSE182-L17 Clustering Population Genetics: Basics
Unsupervised Clustering
Given a set of points (in n dimensions) and a number k, compute the k "best" clusters. In k-means, clustering is done by choosing k centers (means). Each point is assigned to the closest center. The notion of "best" is defined by the distances of the points to their centers. Question: How can we compute the k best centers?
Distance
Given a data point v and a set of points X, define the distance from v to X, d(v, X), as the (Euclidean) distance from v to the closest point in X. Given a set of n data points V = {v_1, …, v_n} and a set of k points X, define the squared error distortion as
$d(V, X) = \frac{1}{n} \sum_{i=1}^{n} d(v_i, X)^2$
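The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function names `dist_to_set` and `distortion` and the sample points are assumptions made for the example.

```python
import math

def dist_to_set(v, X):
    """d(v, X): Euclidean distance from point v to the closest point in X."""
    return min(math.dist(v, x) for x in X)

def distortion(V, X):
    """d(V, X): mean of the squared distances from each data point to X."""
    return sum(dist_to_set(v, X) ** 2 for v in V) / len(V)

# Hypothetical data: four points, two candidate centers.
V = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
X = [(0.0, 1.0), (4.0, 1.0)]

# Every point lies at distance 1 from its nearest center.
print(distortion(V, X))  # prints 1.0
```

Moving either center away from the midpoint of its two points makes the printed value larger, which is what "best" means in this setting.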
K-Means Clustering Problem: Formulation
Input: A set V consisting of n points, and a parameter k.
Output: A set X consisting of k points (cluster centers) that minimizes the squared error distortion d(V, X) over all possible choices of X.
This problem is NP-complete in general.
1-Means Clustering Problem: an Easy Case
Input: A set V consisting of n points.
Output: A single point x (so X = {x}) that minimizes d(V, X) over all possible choices of x.
This problem is easy: the optimal point is the mean (centroid) of the points in V. However, the problem becomes very difficult for more than one center. An efficient heuristic method for k-means clustering is the Lloyd algorithm.
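The 1-means case is easy because the squared error distortion for a single center is minimized at the centroid of the data, a standard result. The sketch below computes the centroid and checks numerically that small shifts away from it do not lower the distortion. The helper names `centroid` and `distortion_one` and the three sample points are assumptions made for illustration.

```python
import math

def centroid(V):
    """Coordinate-wise mean of the points in V."""
    n = len(V)
    return tuple(sum(v[d] for v in V) / n for d in range(len(V[0])))

def distortion_one(V, x):
    """Squared error distortion of V with respect to the single center x."""
    return sum(math.dist(v, x) ** 2 for v in V) / len(V)

# Hypothetical data points.
V = [(1.0, 1.0), (3.0, 1.0), (2.0, 4.0)]
c = centroid(V)
base = distortion_one(V, c)

# Shifting the center in any of four directions should not reduce the cost.
for dx, dy in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    shifted = (c[0] + dx, c[1] + dy)
    assert distortion_one(V, shifted) >= base
```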
K-means: Lloyd's algorithm
Choose k centers at random: X' = {x_1, x_2, x_3, …, x_k}
Repeat
    X = X'
    Assign each v ∈ V to the closest cluster j, i.e. the j with d(v, x_j) = d(v, X):
        C_j = C_j ∪ {v}
    Recompute X':
        x'_j ← (Σ_{v ∈ C_j} v) / |C_j|
until (X' = X)
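The loop above can be sketched in plain Python as follows. This is one possible reading of the slide, not the lecture's own code. Several details are assumptions added for the example: the initial centers are sampled from the data points, an empty cluster keeps its previous center, a `max_iter` cap guards against long runs, and the names `lloyd`, `closest`, and `mean` are hypothetical.

```python
import math
import random

def closest(v, X):
    """Index j of the center in X nearest to v (ties go to the lowest index)."""
    return min(range(len(X)), key=lambda j: math.dist(v, X[j]))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def lloyd(V, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    X_new = rng.sample(V, k)          # assumption: start from k data points
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):         # assumption: cap on iterations
        X = X_new
        # Assignment step: each point joins the cluster of its closest center.
        clusters = [[] for _ in range(k)]
        for v in V:
            clusters[closest(v, X)].append(v)
        # Update step: each center moves to the mean of its cluster.
        # Assumption: an empty cluster keeps its old center.
        X_new = [mean(C) if C else X[j] for j, C in enumerate(clusters)]
        if X_new == X:                # until (X' = X)
            break
    return X_new, clusters

# Hypothetical data: two visually separate groups of points.
V = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
centers, clusters = lloyd(V, 2)
print(centers)
```

Because the result depends on the random starting centers, different seeds can yield different local optima on harder data, which is why the method is described as a heuristic.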
[Figures, pages 7-10: four plots with both axes running from 0 to 5; axis label "expression in condition 1"; cluster centers x_1, x_2, x_3.]
Conservative K-Means Algorithm
The Lloyd algorithm is fast, but in each iteration it moves many data points, not necessarily causing better convergence. A more conservative method would be to move one point at a time, and only if the move improves the overall clustering cost. The smaller the clustering cost of a partition of the data points, the better that clustering is. Different methods can be used to measure this clustering cost (for example, the last algorithm used the squared error distortion).
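A rough sketch of the one-point-at-a-time idea follows. It is an illustration under stated assumptions rather than the lecture's algorithm: the cost is the squared error distortion with each cluster's mean as its center, the first improving move found is accepted (a variant could instead search for the best move), a cluster is never emptied, and the names `cost` and `conservative_kmeans` are hypothetical.

```python
import math

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def cost(clusters):
    """Squared error distortion of a partition, using cluster means as centers."""
    total, n = 0.0, 0
    for C in clusters:
        if C:
            c = mean(C)
            total += sum(math.dist(v, c) ** 2 for v in C)
            n += len(C)
    return total / n

def conservative_kmeans(start):
    clusters = [list(C) for C in start]   # work on a copy
    best = cost(clusters)
    improved = True
    while improved:
        improved = False
        for i, C in enumerate(clusters):
            for v in list(C):             # snapshot, since C changes below
                if len(C) == 1:
                    continue              # assumption: never empty a cluster
                for j in range(len(clusters)):
                    if j == i:
                        continue
                    # Tentatively move v from cluster i to cluster j.
                    C.remove(v)
                    clusters[j].append(v)
                    new = cost(clusters)
                    if new < best - 1e-12:
                        best, improved = new, True
                        break             # keep the move
                    # Otherwise undo the move.
                    clusters[j].pop()
                    C.append(v)
    return clusters, best

# Hypothetical starting partition that mixes two separate groups.
start = [[(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)],
         [(1.0, 0.0), (5.0, 6.0), (6.0, 5.0)]]
final, final_cost = conservative_kmeans(start)
print(final, final_cost)
```

Each accepted move strictly lowers the cost, so the loop stops after finitely many moves, at a partition that no single-point move can improve.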