This preview shows page 1. Sign up to view the full content.
Unformatted text preview: on et al. 1975) but is now used also in several text mining approaches as
well as in most of the currently available document retrieval systems.
The vector space model represents documents as vectors in m-dimensional
space, i.e. each document d is described by a numerical feature vector w(d) =
( x (d, t1 ), . . . , x (d, tm )). Thus, documents can be compared by use of simple
vector operations and even queries can be performed by encoding the query
terms similar to the documents in a query vector. The query vector can then be
compared to each document and a result list can be obtained by ordering the
documents according to the computed similarity (Salton et al. 1994). The main
task of the vector space representation of documents is to ﬁnd an appropriate
encoding of the feature vector.
Each element of the vector usually represents a word (or a group of words) of
the document collection, i.e. the size of the vector is deﬁned by the number of
words (or groups of words) of the complete document collection. The simplest
way of document encoding is to use binary term vectors, i.e. a ve...
View Full Document
This note was uploaded on 06/19/2011 for the course IT 2258 taught by Professor Aymenali during the Summer '11 term at Abu Dhabi University.
- Summer '11