EECS 281 Project 5: More clustering!
Please note that you are responsible for any "sticky" phorum threads as they are an extension of this
document and may contain corrections or clarifications.
Due
This project is due on April 3, 2008 at 3:00PM.
You also have 4 late days per semester, but a maximum
of 2 may be used on this project.
Therefore, the autograder closes completely on April 5, 2008 at 3:00PM.
Introduction
In this project you will implement different algorithms for clustering.
You will read in files and compute a
similarity matrix as in project 4.
Then you will cluster these files into groups using a couple of different
methods.
The general objective of these algorithms will be to obtain K clusters such that the diameter
of the largest cluster is minimized.
The diameter of a group is the greatest distance between any two files in
that group.
Intuitively, we want to grow very small clusters and discourage bloating clusters that have
gotten large.
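As a rough illustration of the objective above, the following C++ sketch computes the diameter of one cluster and the largest diameter over a set of clusters. It assumes that a symmetric distance matrix dist has already been derived from the similarity matrix, and that a cluster with fewer than two files has diameter 0. The names clusterDiameter and maxClusterDiameter are hypothetical and not part of the project specification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Cluster = std::vector<std::size_t>;  // indices of the files in one cluster

// Hypothetical helper: greatest pairwise distance between files in a cluster.
// Assumes dist[i][j] holds the distance between files i and j.
double clusterDiameter(const Matrix& dist, const Cluster& members) {
    double diameter = 0.0;  // assumption: 0 or 1 files gives diameter 0
    for (std::size_t a = 0; a < members.size(); ++a) {
        for (std::size_t b = a + 1; b < members.size(); ++b) {
            const std::size_t i = members[a];
            const std::size_t j = members[b];
            assert(i < dist.size() && j < dist[i].size());
            diameter = std::max(diameter, dist[i][j]);
        }
    }
    return diameter;
}

// Hypothetical helper: the quantity the clustering tries to minimize,
// i.e. the diameter of the largest cluster.
double maxClusterDiameter(const Matrix& dist, const std::vector<Cluster>& clusters) {
    double worst = 0.0;
    for (const Cluster& c : clusters) {
        worst = std::max(worst, clusterDiameter(dist, c));
    }
    return worst;
}
```

Comparing maxClusterDiameter across candidate clusterings of the same files is one way to check which of two results better meets the stated objective.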
Program Behavior
Your Makefile should produce an executable called
p5exe
which takes exactly 4 arguments.
./p5exe N K A dirname
Most of these arguments are the same as in project 4.
N is a frequency threshold: you should
effectively ignore words appearing at least this many times in a particular file.
K is the number of
clusters that your algorithm should create.
A is either "1", "2", or "3", and dictates which clustering
algorithm you should use.
The final argument is the name of a single directory, which can be a relative
or absolute path.
You are guaranteed that the arguments we pass your program are valid.
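The argument layout described above could be read with a small helper such as the one below. This is only a sketch: the Options struct, its field names, and parseArgs are hypothetical, and the argument-count check is kept only as a safeguard, since the arguments are guaranteed to be valid.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical container for the four command-line arguments.
struct Options {
    long threshold;         // N: ignore words appearing at least this many times in a file
    long numClusters;       // K: number of clusters to create
    int algorithm;          // A: 1, 2, or 3
    std::string directory;  // dirname: relative or absolute path
};

// Hypothetical parser for "./p5exe N K A dirname".
Options parseArgs(int argc, const char* const* argv) {
    if (argc != 5) {  // program name plus exactly 4 arguments
        throw std::invalid_argument("usage: ./p5exe N K A dirname");
    }
    Options opts;
    opts.threshold = std::strtol(argv[1], nullptr, 10);
    opts.numClusters = std::strtol(argv[2], nullptr, 10);
    opts.algorithm = static_cast<int>(std::strtol(argv[3], nullptr, 10));
    opts.directory = argv[4];
    return opts;
}
```

From main, this would be called as parseArgs(argc, argv), after which opts.algorithm can select which clustering routine to run.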
Given
You will be given the test case directories and corresponding similarity matrices.
This information will
be posted on CTools to ensure that your parsing and similarity calculations from project 4 are correct.
 Winter '08
 Jag
 Greedy algorithm, similarity matrix, maximum cluster diameter, random cluster
