x d tm thus documents can be compared by use of

Info iconThis preview shows page 1. Sign up to view the full content.

View Full Document Right Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: on et al. 1975) but is now used also in several text mining approaches as well as in most of the currently available document retrieval systems. The vector space model represents documents as vectors in m-dimensional space, i.e. each document d is described by a numerical feature vector w(d) = ( x (d, t1 ), . . . , x (d, tm )). Thus, documents can be compared by use of simple vector operations and even queries can be performed by encoding the query terms similar to the documents in a query vector. The query vector can then be compared to each document and a result list can be obtained by ordering the documents according to the computed similarity (Salton et al. 1994). The main task of the vector space representation of documents is to find an appropriate encoding of the feature vector. Each element of the vector usually represents a word (or a group of words) of the document collection, i.e. the size of the vector is defined by the number of words (or groups of words) of the complete document collection. The simplest way of document encoding is to use binary term vectors, i.e. a ve...
View Full Document

This note was uploaded on 06/19/2011 for the course IT 2258 taught by Professor Aymenali during the Summer '11 term at Abu Dhabi University.

Ask a homework question - tutors are online