cs201_hw1 - CS 201, Fall 2010 Homework Assignment 1 Due:...

CS 201, Fall 2010
Homework Assignment 1
Due: 18:00, November 1, 2010

In this homework, you will implement a simple search engine system. The search engine system will consist of three parts:
- a document set;
- a dictionary file (dictionaryFileName) that includes all possible words in the document set;
- a file (fileListFileName) that contains the names of the files in the document set.

The document set is a set of text files, where each file may contain several words in each line. The dictionary file contains a single word in each line. The file listing the names of the files in the document set contains a single filename in each line.

The documents will be modeled using word histogram vectors. Each element of a document's histogram vector stores the number of times the corresponding dictionary word occurs in that document. If the number of words in the dictionary is numOfWords, the word histogram vector for each document will have numOfWords elements.

The search engine system will have the following functionalities, whose details are given below:
1. Find documents by a single word
2. Find documents by a list of words
3. Find similar documents
4. Find intersection of two documents

Find documents by word: The system will allow the user to search for a particular word in the document set. The system will determine which documents contain the input word and will display all of those documents (i.e., their filenames) on the screen. If the input word is not found in any document, the system will display the message “Not found”.

Find documents by a list of words: The system will allow the user to search for a list of words in the document set. The system will find the documents that contain all of the input words and will display those documents on the screen. If no document contains all of the input words, the system will display the message “Not found” to inform the user.

Find similar documents: The system will allow the user to find documents that are similar to a given document, which is specified by its file name. The system will compute the dissimilarity of each document in the system to the input document. It will then display the documents whose dissimilarity values are less than a given threshold. In order to compute how dissimilar two documents are, you will compute the Euclidean

This note was uploaded on 10/23/2011 for the course ENGINEERIN 102 taught by Professor Pablo during the Spring '11 term at Bilkent University.
