dm5part2 - University of Florida CISE department Clustering...

University of Florida CISE department, Gator Engineering
Clustering Part 2
Dr. Sanjay Ranka
Professor, Computer and Information Science and Engineering
University of Florida, Gainesville

Data Mining, Sanjay Ranka, Spring 2011

Partitional Clustering
[Figure panels: Original Points; A Partitional Clustering]

Hierarchical Clustering
[Figure panels: Traditional Hierarchical Clustering; Traditional Dendrogram]

Characteristics of Clustering Algorithms
• Type of clustering the algorithm produces:
  – Partitional versus hierarchical
  – Overlapping versus non-overlapping
  – Fuzzy versus non-fuzzy
  – Complete versus partial
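
The distinctions in this list can be made concrete by looking at how a result might be stored. The sketch below is only an illustration: the point names, the dictionary layout, and the helper `is_valid_fuzzy` are hypothetical, and the rule that fuzzy weights sum to 1 is a common convention assumed here, not something stated on the slide.

```python
# Hypothetical result layouts for three points and two clusters (0 and 1).

# Non-overlapping, non-fuzzy, complete: each point gets exactly one label.
hard_labels = {"p1": 0, "p2": 0, "p3": 1}

# Partial: some points (for example noise) are left unassigned.
partial_labels = {"p1": 0, "p2": None, "p3": 1}

# Overlapping: a point may belong to more than one cluster.
overlapping = {"p1": {0}, "p2": {0, 1}, "p3": {1}}

# Fuzzy: each point carries a membership weight per cluster.
fuzzy = {"p1": [0.9, 0.1], "p2": [0.5, 0.5], "p3": [0.2, 0.8]}


def is_valid_fuzzy(memberships, tol=1e-9):
    """Check the assumed convention: weights in [0, 1] that sum to 1."""
    return all(
        all(0.0 <= w <= 1.0 for w in weights)
        and abs(sum(weights) - 1.0) <= tol
        for weights in memberships.values()
    )


def is_complete(labels):
    """A clustering is complete when no point is left unassigned."""
    return all(label is not None for label in labels.values())
```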

Characteristics of Clustering Algorithms
• Type of clusters the algorithm seeks:
  – Well-separated, center-based, density-based, or contiguity-based
  – Are the clusters found in the entire space or in a subspace?
  – Are the clusters relatively similar to one another, or are they of differing sizes, shapes, and densities?

Characteristics of Clustering Algorithms
• Type of data the algorithm can handle:
  – Some clustering algorithms need a data matrix
    • The K-means algorithm assumes that it is meaningful to take the mean (average) of a set of data objects.
    • This makes sense for data that has continuous attributes and for document data, but not for record data that has categorical attributes.
  – Some clustering algorithms start from a proximity matrix
    • Typically assume symmetry
  – Does the data have noise and outliers?
  – Is the data high dimensional?
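
The two input styles above can be sketched in a few lines of Python. This is a minimal illustration, not the course's code: the function names `centroid` and `proximity_matrix`, the sample points, and the choice of Euclidean distance are all assumptions made for the example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Component-wise mean of equal-length numeric tuples (a data matrix)."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def proximity_matrix(points):
    """Pairwise Euclidean distances between all points."""
    return [[dist(p, q) for q in points] for p in points]


# Continuous attributes: the mean is well defined.
points = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
center = centroid(points)

# A distance-based proximity matrix is symmetric with a zero diagonal.
matrix = proximity_matrix(points)

# Categorical attributes: averaging has no meaning, and here it fails outright.
records = [("red",), ("green",), ("blue",)]
try:
    centroid(records)
    categorical_mean_ok = True
except TypeError:
    categorical_mean_ok = False
```

The `TypeError` for the categorical records is a side effect of Python's `sum`, but it mirrors the slide's point: a mean-based method needs attributes for which an average is meaningful, whereas a method that starts from a proximity matrix only needs a pairwise measure.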

Characteristics of Clustering Algorithms
• How the algorithm operates:
  – Minimizing or maximizing a global objective function.
    • Enumerate all possible ways of dividing the points into clusters and evaluate the ‘goodness’ of each potential set of clusters by using the given objective function. (NP-hard)
    • Can have global or local objectives.
      – Hierarchical clustering algorithms typically have local objectives
      – Partitional algorithms typically have global objectives
    • A variation of the global objective function approach is to fit the data to a parameterized model.
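
The enumeration approach can be sketched for a toy data set. The code below is an illustration under stated assumptions: the objective is taken to be the sum of squared errors (SSE) to each cluster mean, which the slide does not specify, and the names `sse` and `best_partition` are hypothetical. It tries every labeling of n points with k labels, so the work grows as k^n and is only feasible for a handful of points.

```python
from itertools import product


def sse(points, labels, k):
    """Sum of squared distances from each point to its cluster mean."""
    total = 0.0
    for cluster in range(k):
        members = [p for p, label in zip(points, labels) if label == cluster]
        center = tuple(sum(col) / len(members) for col in zip(*members))
        total += sum(
            sum((a - b) ** 2 for a, b in zip(p, center)) for p in members
        )
    return total


def best_partition(points, k):
    """Exhaustively search all labelings that use exactly k clusters."""
    best_labels, best_score = None, float("inf")
    for labels in product(range(k), repeat=len(points)):
        if len(set(labels)) < k:  # skip labelings with an empty cluster
            continue
        score = sse(points, labels, k)
        if score < best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score


# Two tight pairs far apart: the search should put each pair in its own cluster.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, score = best_partition(points, 2)
```

Because every labeling is scored against the same global objective, the result is a global optimum for this toy case. Practical partitional algorithms give up that guarantee in exchange for tractable running time.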