Clustering
Data Mining
Prof. Dawn Woodard
School of ORIE
Cornell University

Outline
1. Announcements
2. Unsupervised Learning
3. Clustering

Announcements
Questions?

Learning
Supervised Learning:
  Observe pairs (X, Y), where X can be multi-dimensional
  Learn to estimate Y from X
Unsupervised Learning:
  Observe vectors X
  Find “structure” in X
  I.e., find a low-dimensional summary of X
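The contrast between the two settings can be sketched on toy data. This is only an illustration under stated assumptions: the data are simulated, the names `fit_line` and `two_means` are hypothetical helpers, and a simple 1-D 2-means loop stands in for "finding structure" (the slides do not name a specific clustering algorithm at this point).

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical toy data: two well-separated groups of 1-D observations X.
x = ([random.gauss(0.0, 0.5) for _ in range(50)]
     + [random.gauss(5.0, 0.5) for _ in range(50)])
# Responses Y, available only in the supervised setting
# (assumed here to be linear in X plus noise).
y = [2.0 * xi + 1.0 + random.gauss(0.0, 0.1) for xi in x]


def fit_line(x, y):
    """Supervised: least-squares slope and intercept for estimating Y from X."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, ybar - slope * xbar


def two_means(x, n_iter=20):
    """Unsupervised: group the observations using X only (no Y involved)."""
    centres = [min(x), max(x)]  # start the two centres at the extremes
    labels = []
    for _ in range(n_iter):
        # Assign each point to its nearest centre.
        labels = [0 if abs(xi - centres[0]) <= abs(xi - centres[1]) else 1
                  for xi in x]
        # Move each centre to the mean of its assigned points.
        for k in (0, 1):
            members = [xi for xi, lab in zip(x, labels) if lab == k]
            if members:
                centres[k] = mean(members)
    return centres, labels


slope, intercept = fit_line(x, y)   # uses the pairs (X, Y)
centres, labels = two_means(x)      # uses X alone
```

Here `fit_line` needs the observed Y values, while `two_means` summarizes each observation by a single cluster label, which is one form of low-dimensional summary of X.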

Unsupervised Learning
Two types:
1. Find groups (clusters) with most of the observations concentrated in these clusters <draw>
2. Find a lower-dimensional manifold such that most of the probability is near this manifold
   Called “dimension reduction” <draw>
   E.g., “principal components”

Unsupervised Learning
Clustering:
  Can use the resulting clusters for, e.g., targeted marketing
  Can use the clusters for regression, i.e., fit a separate regression for each cluster
  A linear regression model may hold within each cluster, but the parameters β may be different for each cluster

Unsupervised Learning
Principal Components:
