DS502/MA543: Special Topics
Prof. Randy Paffenroth, Worcester Polytechnic Institute

Announcements: schedule for the final weeks of class
4/13 (Thursday): Lecture (today!), HW5 due
4/18 (Tuesday): Projects due and project presentations
4/20 (Thursday): Project presentations
4/25 (Tuesday): Project presentations
4/27 (Thursday): Final review
5/2 (Tuesday): Final exam
Announcements
Anything I can help with for the final project?

Clustering Review
Clustering
What are clusters?

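What is the most common clustering cost?
The total within-cluster variation (the K-means cost): for clusters $C_1, \dots, C_K$ with centroids $\mu_1, \dots, \mu_K$,
$$\sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - \mu_k \rVert^2,$$
minimized over all ways of partitioning the observations into $K$ clusters.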
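A minimal NumPy sketch of the total cost, assuming squared Euclidean distance. The helper names `within_cluster_ss` and `within_cluster_pairwise` are hypothetical. The second helper uses the pairwise form of within-cluster variation from the ISLR text cited in these slides (sum over all ordered pairs in a cluster, divided by the cluster size); it equals exactly twice the centroid form, so minimizing either gives the same clusters.

```python
import numpy as np


def within_cluster_ss(X, labels):
    """Sum over clusters of squared Euclidean distances to the cluster centroid."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        centroid = members.mean(axis=0)
        total += float(((members - centroid) ** 2).sum())
    return total


def within_cluster_pairwise(X, labels):
    """ISLR-style within-cluster variation: for each cluster, the sum of squared
    distances over all ordered pairs of members, divided by the cluster size."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        diffs = members[:, None, :] - members[None, :, :]  # all ordered pairs
        total += float((diffs ** 2).sum()) / len(members)
    return total


X = [[0, 0], [0, 2], [10, 0], [10, 2]]
labels = [0, 0, 1, 1]
print(within_cluster_ss(X, labels))        # 4.0
print(within_cluster_pairwise(X, labels))  # 8.0, twice the centroid form
```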

Lloyd's algorithm
Much better with a picture.
Source: James, G., Witten, D., Hastie, T., Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. Springer Texts in Statistics. XIV, 426 p., 150 illus. Pg 389.
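Lloyd's algorithm alternates two steps until the assignments stop changing: assign each observation to its nearest centroid, then move each centroid to the mean of its observations. The sketch below assumes squared Euclidean distance and, as a simplification for reproducibility, takes the starting centroids as an argument (the ISLR text instead starts from a random assignment of observations to clusters). `lloyd_kmeans` is a hypothetical name. Neither step can increase the cost, but the result is only a local optimum, which the second call illustrates.

```python
import numpy as np


def lloyd_kmeans(X, init_centroids, max_iter=100):
    """Lloyd's algorithm for K-means, starting from the given centroids."""
    X = np.asarray(X, dtype=float)
    centroids = np.array(init_centroids, dtype=float)  # copy; K = number of rows
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Step 1: assign every observation to its nearest centroid (squared Euclidean).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)  # ties go to the lower-numbered centroid
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Step 2: move each centroid to the mean of its assigned observations.
        for k in range(len(centroids)):
            members = X[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster becomes empty
                centroids[k] = members.mean(axis=0)
    cost = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, cost


X = [[0, 0], [0, 1], [1, 0], [1, 1],
     [10, 10], [10, 11], [11, 10], [11, 11]]

labels, _, cost = lloyd_kmeans(X, init_centroids=[[0, 0], [11, 11]])
print(labels, cost)             # [0 0 0 0 1 1 1 1] 4.0

labels, _, cost = lloyd_kmeans(X, init_centroids=[[0, 1], [1, 0]])
print(labels, round(cost, 2))   # [0 0 1 0 0 0 1 0] 402.67
```

The second start converges to a much worse partition, which is why K-means is usually run from several random starts and the lowest-cost result is kept.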

Hierarchical clustering
Dendrograms!
Source: James, G., Witten, D., Hastie, T., Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. Springer Texts in Statistics. XIV, 426 p., 150 illus. Pg 389.
Trick question: How do you know where to cut?
How to merge groups?
Complete: maximal inter-cluster dissimilarity (i.e., the largest distance between an observation in one cluster and an observation in the other).
Single: minimal inter-cluster dissimilarity (i.e., the smallest such pairwise distance).
Average: average inter-cluster dissimilarity (i.e., the mean of all such pairwise distances).
Centroid: the distance between the centroids of the two clusters.
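A naive bottom-up sketch of the four linkages, assuming Euclidean distance between observations: start with every observation in its own cluster and repeatedly merge the pair of clusters with the smallest linkage dissimilarity. `linkage_distance` and `agglomerate` are hypothetical names. Stopping once `n_clusters` clusters remain plays the role of cutting the dendrogram at that many clusters, and `heights` records the dissimilarity at which each merge happened (the fusion heights in the dendrogram). It recomputes every pairwise linkage at every step, so it is only suitable for small illustrative inputs.

```python
import numpy as np


def linkage_distance(A, B, method):
    """Dissimilarity between two clusters, given as arrays of their points."""
    if method == "centroid":
        return float(np.linalg.norm(A.mean(axis=0) - B.mean(axis=0)))
    # Euclidean distance between every point of A and every point of B
    pair = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    if method == "complete":
        return float(pair.max())
    if method == "single":
        return float(pair.min())
    if method == "average":
        return float(pair.mean())
    raise ValueError(f"unknown linkage: {method}")


def agglomerate(X, method="complete", n_clusters=1):
    """Merge the closest pair of clusters until n_clusters remain.

    Returns the clusters (lists of observation indices) and the height of each merge.
    """
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]
    heights = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage_distance(X[clusters[a]], X[clusters[b]], method)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        heights.append(d)
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return sorted(clusters), heights


# Six points on a line with slowly growing gaps (1.0, 1.1, 1.2, 1.3, 1.4)
X = [[0.0], [1.0], [2.1], [3.3], [4.6], [6.0]]
print(agglomerate(X, "single", n_clusters=2)[0])    # [[0, 1, 2, 3, 4], [5]]
print(agglomerate(X, "complete", n_clusters=2)[0])  # [[0, 1, 2, 3], [4, 5]]
```

On the same data, single linkage chains the points together and leaves the last one alone, while complete linkage favors two compact groups.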

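What can go wrong with clustering? Many choices must be made, and each can change the answer: the number of clusters K, the dissimilarity (similarity) measure, the linkage, where to cut the dendrogram, etc.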
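The choice of dissimilarity alone can change which observations count as close. This sketch compares Euclidean distance with one minus the Pearson correlation, one common correlation-based dissimilarity. The vectors are made-up illustrative data and the function names are hypothetical.

```python
import numpy as np


def euclidean(a, b):
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))


def correlation_distance(a, b):
    """1 - Pearson correlation: near 0 when the profiles rise and fall together."""
    return float(1.0 - np.corrcoef(a, b)[0, 1])


x1 = [1, 2, 3]
x2 = [10, 20, 30]  # same shape as x1, ten times larger
x3 = [3, 2, 1]     # similar magnitude to x1, opposite shape

for dist in (euclidean, correlation_distance):
    closer = "x2" if dist(x1, x2) < dist(x1, x3) else "x3"
    print(dist.__name__, closer)
# euclidean x3
# correlation_distance x2
```

Euclidean distance pairs `x1` with `x3` because their values have similar magnitudes, while correlation distance pairs `x1` with `x2` because they share the same shape.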
