clustering

Chapter 4: Unsupervised Learning
Most slides courtesy of Bing Liu
2 Road map
Basic concepts
K-means algorithm
Representation of clusters
Hierarchical clustering
Distance functions
Data standardization
Handling mixed attributes
Which clustering algorithm to use?
Cluster evaluation
Discovering holes and data regions
Summary
3 Supervised learning vs. unsupervised learning
Supervised learning: discover patterns in the data that relate data attributes to a target (class) attribute.
  These patterns are then used to predict the values of the target attribute in future data instances.
Unsupervised learning: the data have no target attribute.
  We want to explore the data to find some intrinsic structure in them.
4 Clustering
Clustering is a technique for finding similarity groups in data, called clusters. That is, it groups data instances that are similar to (near) each other into one cluster, and data instances that are very different (far away) from each other into different clusters.
Clustering is often called an unsupervised learning task because no class values denoting an a priori grouping of the data instances are given, as they are in supervised learning.
For historical reasons, clustering is often considered synonymous with unsupervised learning.
  In fact, association rule mining is also unsupervised.
This chapter focuses on clustering.
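
To make "near each other" concrete, here is a minimal sketch that groups 2-D points by a distance threshold. It is not one of the algorithms covered in this chapter: the helper `group_by_distance`, the threshold value, and the sample points are all assumptions chosen for illustration. Two points land in the same group when a chain of short hops connects them.

```python
import math

def group_by_distance(points, threshold):
    """Hypothetical helper: two points share a group if a chain of hops,
    each no longer than `threshold`, connects them."""
    labels = [None] * len(points)
    next_label = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue  # already assigned to an earlier group
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                near = math.dist(points[i], points[j]) <= threshold
                if labels[j] is None and near:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels

# Made-up data with three visibly separate groups.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
          (5.0, 5.0), (5.1, 4.8),
          (9.0, 1.0), (9.2, 1.3)]
labels = group_by_distance(points, threshold=1.0)
print(labels)  # one integer group label per point
```

No class labels are supplied anywhere; the grouping comes only from the distances between the points, which is what makes the task unsupervised.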
5 An illustration
The data set has three natural groups of data points, i.e., three natural clusters.
6 What is clustering for?
Let us see some real-life examples.
Example 1: group people of similar sizes together to make “small”, “medium” and “large” T-shirts.
  Tailor-made for each person: too expensive.
  One-size-fits-all: does not fit all.
Example 2: in marketing, segment customers according to their similarities, in order to do targeted marketing.
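
The T-shirt example can be sketched in a few lines. The code below assumes a single measurement per person (chest size in cm, with made-up values) and uses a simple one-dimensional k-means loop, the algorithm listed next in the road map. The function `kmeans_1d`, its starting centroids, and the iteration count are illustrative choices, not something the slides prescribe.

```python
def kmeans_1d(values, k, iterations=20):
    """Illustrative 1-D k-means: returns (centroids, clusters)."""
    ordered = sorted(values)
    # Arbitrary starting centroids: evenly spaced picks from the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest centroid.
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Made-up chest sizes in cm.
chest_cm = [84, 86, 88, 87, 98, 100, 102, 99, 112, 114, 116, 113]
centroids, clusters = kmeans_1d(chest_cm, k=3)
sizes = dict(zip(["small", "medium", "large"], sorted(centroids)))
print(sizes)
```

Each centroid can then serve as the reference measurement for one T-shirt size, a compromise between tailoring for every person and a single size for all.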
7 What is clustering for? (cont.)
Example 3: given a collection of text documents, we want to organize them according to their content similarities, in order to produce a topic hierarchy.
In fact, clustering is one of the most utilized data mining techniques.
  It has a long history and is used in almost every field, e.g., medicine, psychology, sociology, biology, archeology, marketing, insurance, libraries, etc.
  In recent years, due to the rapid increase of online documents, text clustering has become important.
8 Aspects of clustering
A clustering algorithm
  Partitional clustering
  Hierarchical clustering
A distance (similarity, or dissimilarity) function
Clustering quality
  Inter-cluster distance: maximized
  Intra-cluster distance: minimized
The quality of a clustering result depends on the algorithm, the distance function, and the application.
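
The two quality criteria above can be turned into numbers in several ways; the slides do not fix a formula here. The sketch below assumes Euclidean distance, measures intra-cluster distance as the mean distance from each point to its own cluster centroid, and measures inter-cluster distance as the mean distance between cluster centroids. The function names and the four sample points are hypothetical.

```python
import math
from itertools import combinations

def centroid(cluster):
    """Coordinate-wise mean of the points in one cluster."""
    dims = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dims))

def intra_cluster_distance(clusters):
    """Mean distance from each point to its own cluster centroid (lower is better)."""
    total, count = 0.0, 0
    for cluster in clusters:
        c = centroid(cluster)
        total += sum(math.dist(p, c) for p in cluster)
        count += len(cluster)
    return total / count

def inter_cluster_distance(clusters):
    """Mean distance between the centroids of every pair of clusters (higher is better)."""
    cents = [centroid(c) for c in clusters]
    pairs = list(combinations(cents, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# Made-up points: two tight pairs that lie far apart.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (8.0, 9.0)]
good = [points[:2], points[2:]]                            # groups neighbours together
bad = [[points[0], points[2]], [points[1], points[3]]]     # mixes distant points

for name, clustering in [("good", good), ("bad", bad)]:
    print(name, intra_cluster_distance(clustering), inter_cluster_distance(clustering))
```

On this toy data the clustering that keeps neighbours together has the smaller intra-cluster distance and the larger inter-cluster distance, matching the two criteria listed above.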