# FALLSEM2021-22_SWE2009_ETH_VL2021220101054_Reference_Material_I_12-10-2021_k-means.ppt


K-MEANS CLUSTERING
SWE2009 - Data Mining Techniques
INTRODUCTION - What is clustering?

Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often according to some defined distance measure.
Types of clustering:

1. Hierarchical algorithms: these find successive clusters using previously established clusters.
   1. Agglomerative ("bottom-up"): agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters.
   2. Divisive ("top-down"): divisive algorithms begin with the whole set and proceed to divide it into successively smaller clusters.
2. Partitional clustering: partitional algorithms determine all clusters at once. They include:
   - K-means and derivatives
   - Fuzzy c-means clustering
   - QT clustering algorithm
Common distance measures:

The distance measure determines how the similarity of two elements is calculated, and it influences the shape of the clusters. For two elements x and y with m attributes each, common choices include:

1. The Euclidean distance (also called 2-norm distance), given by: $d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$
2. The Manhattan distance (also called taxicab norm or 1-norm), given by: $d(x, y) = \sum_{i=1}^{m} |x_i - y_i|$
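The two distance measures named above can be computed directly from their standard definitions. The following is a minimal sketch in plain Python; the function names `euclidean` and `manhattan` and the sample points are illustrative assumptions, not part of the slides.

```python
import math


def euclidean(x, y):
    """2-norm distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """1-norm (taxicab) distance: sum of the absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Illustrative points (assumed, not from the slides).
p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```

The same pair of points is 5 units apart under the Euclidean measure and 7 under the Manhattan measure, so the choice of measure changes which points count as "near" one another.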
K-MEANS CLUSTERING

- The k-means algorithm is an algorithm to cluster n objects based on attributes into k partitions, where k < n.
- It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that they both attempt to find the centers of natural clusters in the data.
- It assumes that the object attributes form a vector space.
An algorithm for partitioning (or clustering) N data points into K disjoint subsets $S_j$ of data points so as to minimize the sum-of-squares criterion

$J = \sum_{j=1}^{K} \sum_{n \in S_j} \lVert x_n - \mu_j \rVert^2$

where $x_n$ is a vector representing the nth data point and $\mu_j$ is the geometric centroid of the data points in $S_j$.
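As a small worked illustration of the sum-of-squares criterion, the sketch below computes it for a partition that is already given. The function name `sum_of_squares` and the sample clusters are assumptions made for the example.

```python
def sum_of_squares(clusters):
    """Sum, over all clusters, of squared distances from each point to its centroid."""
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        # Geometric centroid: the per-coordinate mean of the cluster's points.
        centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        total += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in points
        )
    return total


# Illustrative partition into K = 2 subsets (assumed data).
clusters = [[(1.0, 1.0), (3.0, 1.0)], [(8.0, 8.0)]]
print(sum_of_squares(clusters))  # 2.0
```

Here the first cluster has centroid (2, 1), so each of its two points contributes a squared distance of 1, and the single-point cluster contributes 0.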
Simply speaking, k-means clustering is an algorithm to classify or group objects based on attributes/features into K groups, where K is a positive integer. The grouping is done by minimizing the sum of squares of distances between the data and the corresponding cluster centroid.
How does the K-means clustering algorithm work?
The K-Means Clustering Method

Given k, the k-means algorithm is implemented in four steps:

1. Partition objects into k nonempty subsets.
2. Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e., mean point, of the cluster).
3. Assign each object to the cluster with the nearest seed point.
4. Go back to Step 2; stop when there are no more new assignments.
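The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the round-robin initial partition, the `max_iter` cap, the use of squared Euclidean distance, and the rule of keeping the previous centroid when a cluster becomes empty are all assumptions made here, and the function name `kmeans` is hypothetical.

```python
def kmeans(points, k, max_iter=100):
    """Run the four-step procedure; returns (labels, centroids). Assumes k <= len(points)."""
    dim = len(points[0])
    # Step 1: partition the objects into k nonempty subsets
    # (round-robin assignment is an assumption of this sketch).
    labels = [i % k for i in range(len(points))]
    centroids = [None] * k
    for _ in range(max_iter):
        # Step 2: compute the centroid (mean point) of each current cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: keep the old centroid if a cluster empties
                centroids[j] = tuple(
                    sum(p[d] for p in members) / len(members) for d in range(dim)
                )
        # Step 3: assign each object to the cluster with the nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((p[d] - centroids[j][d]) ** 2 for d in range(dim)),
            )
            for p in points
        ]
        # Step 4: stop when no assignment changes; otherwise repeat from Step 2.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Illustrative data (assumed): two visibly separated groups, K = 2.
data = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = kmeans(data, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

On this data the deliberately poor round-robin start is corrected after one reassignment pass, and the second pass confirms that no assignment changes.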
The K-Means Clustering Method

[Figure: plot panels with both axes ticked from 0 to 10, illustrating the method for K=2. The first panel caption begins "Arbitrarily choose" and is cut off in the source.]
