CSE 5800 Mining/Learning and the Internet: HW3
Due Oct 26, Wed, 6:30pm
Submit Server: course=cse5800, project=hw3

1. Implement these clustering algorithms:
   (a) K-means
   (b) Bisecting K-means with largest cluster to split
   (c) Bisecting K-means with least overall similarity to split
   (d) Agglomerative Hierarchical Clustering with Intra-Cluster Similarity technique (IST)
   (e) Agglomerative Hierarchical Clustering with Centroid Similarity technique (CST)
   (f) Agglomerative Hierarchical Clustering with UPGMA
   (g) Agglomerative Hierarchical Clustering with UPGMA to seed K-means

2. Each document is represented by a TF-IDF unit vector; each component is tf_i × idf_i, where:
   • tf_i is the frequency of term i in the document divided by the total number of terms in the document, and
   • idf_i = log(D/df_i), where df_i is the number of documents that contain term i and D is the total number of documents
   • to get a unit vector, divide each component by the magnitude of the vector

3. Allow these parameters:3....
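The TF-IDF unit vector defined in item 2 can be sketched as follows. This is only an illustration under stated assumptions, not the required implementation: the function name tfidf_unit_vectors, the token-list input, and the sparse dict output are all hypothetical choices, and the natural logarithm is assumed because the handout does not name a base.

```python
import math
from collections import Counter


def tfidf_unit_vectors(docs):
    """Build TF-IDF unit vectors for a list of tokenized documents.

    docs: list of token lists (hypothetical input format).
    Returns one dict per document mapping term -> normalized weight.
    """
    D = len(docs)

    # df[t] = number of documents that contain term t at least once.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        # tf_i = count / total terms in the document; idf_i = log(D / df_i).
        # Natural log is an assumption; the handout does not give a base.
        vec = {t: (c / total) * math.log(D / df[t]) for t, c in counts.items()}
        # Divide each component by the vector magnitude to get a unit vector.
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {t: w / norm for t, w in vec.items()}
        vectors.append(vec)
    return vectors
```

Because every idf value is scaled by the same constant when the log base changes, the choice of base cancels out in the final normalization. A term that appears in every document gets idf = log(1) = 0 and so contributes nothing to the vector; a document made up only of such terms is left as an all-zero vector here rather than normalized.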
This note was uploaded on 02/10/2012 for the course CSE 5800 taught by Professor Staff during the Fall '09 term at FIT.