
# dm5part2 - University of Florida CISE department Clustering...


University of Florida CISE department Gator Engineering Clustering Part 2 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville

University of Florida CISE department Gator Engineering Data Mining Sanjay Ranka Spring 2011

Partitional Clustering
[Figure panels: "Original Points" and "A Partitional Clustering"]

Hierarchical Clustering
[Figure panels: "Traditional Hierarchical Clustering" and "Traditional Dendrogram"]


Characteristics of Clustering Algorithms
• Type of clustering the algorithm produces:
  – Partitional versus hierarchical
  – Overlapping versus non-overlapping
  – Fuzzy versus non-fuzzy
  – Complete versus partial

Characteristics of Clustering Algorithms
• Type of clusters the algorithm seeks:
  – Well-separated, center-based, density-based, or contiguity-based
  – Are the clusters found in the entire space or in a subspace?
  – Are the clusters relatively similar to one another, or do they differ in size, shape, and density?


Characteristics of Clustering Algorithms
• Type of data the algorithm can handle:
  – Some clustering algorithms need a data matrix
    • The K-means algorithm assumes that it is meaningful to take the mean (average) of a set of data objects.
    • This makes sense for data that has continuous attributes and for document data, but not for record data that has categorical attributes.
  – Some clustering algorithms start from a proximity matrix
    • Typically assume symmetry
  – Does the data have noise and outliers?
  – Is the data high dimensional?
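
The two input forms named on this slide can be illustrated with a minimal sketch. It is not taken from the lecture: the toy `data` values and the helper names `centroid` and `proximity_matrix` are assumptions chosen for illustration, and Euclidean distance is just one possible proximity measure. The sketch shows why a mean needs numeric attributes and why a distance-based proximity matrix comes out symmetric.

```python
import math

# Hypothetical data matrix: one row per object, one column per
# continuous attribute (values invented for illustration).
data = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 0.0],
]

def centroid(rows):
    """Attribute-wise mean. Only meaningful for numeric attributes;
    a column of categories such as 'red'/'blue' has no average."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def proximity_matrix(rows):
    """Pairwise Euclidean distances between objects.
    The result is symmetric with zeros on the diagonal."""
    return [[math.dist(a, b) for b in rows] for a in rows]

center = centroid(data)        # input a K-means style algorithm relies on
prox = proximity_matrix(data)  # input a proximity-based algorithm starts from

print(center)
for row in prox:
    print([round(d, 3) for d in row])
```

An algorithm that works from `prox` alone never sees the original attributes, which is why it can be applied whenever a pairwise proximity can be defined, even if a mean cannot.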

Characteristics of Clustering Algorithms
• How the algorithm operates:
  – Minimizing or maximizing a global objective function.
    • Enumerate all possible ways of dividing the points into clusters and evaluate the ‘goodness’ of each potential set of clusters by using the given objective function. (NP-hard)
    • Can have global or local objectives.
      – Hierarchical clustering algorithms typically have local objectives
      – Partitional algorithms typically have global objectives
  – A variation of the global objective function approach is to fit the data to a parameterized model.
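
The exhaustive approach described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the lecture's method: the slide does not name an objective function, so the sum of squared errors (SSE) to each cluster mean is assumed here, and the function names `sse` and `best_partition` and the four sample points are hypothetical.

```python
from itertools import product

def sse(points, labels, k):
    """Assumed objective: sum of squared distances from each point
    to the mean of the cluster it is assigned to."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            return float("inf")  # reject assignments that leave a cluster empty
        center = [sum(col) / len(members) for col in zip(*members)]
        total += sum(
            sum((x - m) ** 2 for x, m in zip(p, center)) for p in members
        )
    return total

def best_partition(points, k):
    """Score every one of the k**n label assignments and keep the best."""
    best_labels, best_score = None, float("inf")
    for labels in product(range(k), repeat=len(points)):
        score = sse(points, labels, k)
        if score < best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score

# Hypothetical toy data: two obvious groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels, score = best_partition(points, k=2)
print(labels, score)
```

The loop visits k**n assignments for n points, so even a few dozen points are out of reach. That growth is the practical reason most partitional algorithms settle for heuristics that improve the objective locally instead of searching every partition.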
