hw3 - CSE 5800 Mining/Learning and the Internet—HW3 Due...



CSE 5800 Mining/Learning and the Internet - HW3
Due Oct 26, Wed, 6:30pm
Submit Server: course=cse5800, project=hw3

1. Implement these clustering algorithms:
   (a) K-means
   (b) Bisecting K-means with largest cluster to split
   (c) Bisecting K-means with least overall similarity to split
   (d) Agglomerative Hierarchical Clustering with Intra-Cluster Similarity technique (IST)
   (e) Agglomerative Hierarchical Clustering with Centroid Similarity technique (CST)
   (f) Agglomerative Hierarchical Clustering with UPGMA
   (g) Agglomerative Hierarchical Clustering with UPGMA to seed K-means

2. Each document is represented by a TF-IDF unit vector. Each component is tf_i × idf_i, where:
   • tf_i is the frequency of term i in the document divided by the total number of terms in the document, and
   • idf_i = log(D/df_i), where df_i is the number of documents that contain term i and D is the total number of documents.
   • To get a unit vector, divide each component by the magnitude of the vector.

3. Allow these parameters:3....
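The TF-IDF unit vector defined in item 2 can be sketched as follows. This is a minimal illustration, not the assignment's reference solution. The function name `tfidf_unit_vectors`, the input format (each document as a list of already-tokenized terms), and the sparse dict output are all assumptions made for the example. The assignment does not state the base of the logarithm; the natural log is used here, and the choice does not affect the result because normalization cancels any constant factor.

```python
import math
from collections import Counter


def tfidf_unit_vectors(docs):
    """Return one sparse TF-IDF unit vector (dict: term -> weight) per document.

    docs is a list of token lists (hypothetical input format).
    """
    num_docs = len(docs)  # D in the assignment text

    # df[t] = number of documents that contain term t at least once
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total_terms = len(tokens)
        # tf_i * idf_i, with tf_i = count / total terms and idf_i = log(D / df_i)
        vec = {
            term: (count / total_terms) * math.log(num_docs / df[term])
            for term, count in counts.items()
        }
        # Divide each component by the vector's magnitude.
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {term: w / norm for term, w in vec.items()}
        # Assumption: an all-zero vector (every term occurs in every
        # document) is left as zeros, since it cannot be normalized.
        vectors.append(vec)
    return vectors


docs = [
    ["apple", "banana", "apple"],
    ["banana", "cherry"],
    ["apple", "cherry", "cherry"],
]
vectors = tfidf_unit_vectors(docs)
```

Because every vector has unit length, the cosine similarity between two documents reduces to the dot product of their vectors, which is convenient for the similarity-based clustering criteria in item 1.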

This note was uploaded on 02/10/2012 for the course CSE 5800 taught by Professor Staff during the Fall '09 term at FIT.
