CS573: Homework 4

Due date: Tuesday November 16, start of class

1 Between cluster distances (4 pts)

Let cluster $C_i$ contain $n_i$ samples, and let $d_{ij}$ be some measure of distance between two clusters $C_i$ and $C_j$. In general, one might expect that if $C_i$ and $C_j$ are merged to form a new cluster $C_k$, then the distance from $C_k$ to some other cluster $C_h$ is not simply related to $d_{hi}$ and $d_{hj}$. However, consider the equation:

$$d_{hk} = \alpha_i d_{hi} + \alpha_j d_{hj} + \beta d_{ij} + \gamma \, |d_{hi} - d_{hj}|$$

Show that the following choices for the coefficients $\alpha_i$, $\alpha_j$, $\beta$, $\gamma$ lead to the distance functions indicated.

1. Single-link: $\alpha_i = \alpha_j = 0.5$, $\beta = 0$, $\gamma = -0.5$

2. Complete-link: $\alpha_i = \alpha_j = 0.5$, $\beta = 0$, $\gamma = +0.5$

3. Average-link: $\alpha_i = \frac{n_i}{n_i + n_j}$, $\alpha_j = \frac{n_j}{n_i + n_j}$, $\beta = \gamma = 0$

4. Between-cluster distance (i.e., squared Euclidean distance between centroids): $\alpha_i = \frac{n_i}{n_i + n_j}$, $\alpha_j = \frac{n_j}{n_i + n_j}$, $\beta = -\alpha_i \alpha_j$, $\gamma = 0$

2 Clustering theorem (4 pts)

Read the paper: J. Kleinberg (2002). An Impossibility Theorem for Clustering. In Proceedings of the 16th conference on Neural Information Processing Systems. Explain its main result. Is there a hope for providing a good clustering framework/algorithm?

3 Spectral Clustering (8 pts)

In this problem we will analyze the operation of a variant of spectral clustering methods on two datasets shown in Figure 1. For each of the datasets (unless directed otherwise) please answer the following questions.

1. The first step is to build an affinity matrix. The matrix defines the degree of similarity between points.
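To make the idea of an affinity matrix concrete, here is a minimal sketch in Python with NumPy. It assumes a Gaussian (RBF) similarity, $A_{ij} = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$, which is one common choice and not necessarily the one this assignment intends. The function name `gaussian_affinity`, the bandwidth `sigma`, and the toy points are all hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_affinity(points, sigma=1.0):
    """Return the n x n matrix A with A[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    Assumption: Gaussian similarity. The assignment may define affinity differently.
    """
    points = np.asarray(points, dtype=float)
    # Pairwise differences via broadcasting: shape (n, n, d).
    diffs = points[:, None, :] - points[None, :, :]
    # Squared Euclidean distance between every pair of points: shape (n, n).
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


# Hypothetical toy data: two nearby points and one far-away point.
toy_points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
A = gaussian_affinity(toy_points, sigma=1.0)
print(np.round(A, 4))
```

Under this assumed kernel the matrix is symmetric with ones on the diagonal, and nearby points get affinities close to 1 while distant points get affinities close to 0. Some spectral clustering variants set the diagonal to zero before continuing; this sketch leaves it as is.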

