

Band 20 – 2005, p. 27. Hotho, Nürnberger, and Paaß

... vector element is set to one if the corresponding word is used in the document and to zero if it is not. With this encoding, a query that is itself encoded as a vector reduces to a simple Boolean comparison or search. Boolean encoding treats all terms as equally important for a given query or comparison. To improve performance, term weighting schemes are usually used instead, where the weights reflect the importance of a word in a specific document of the considered collection. Large weights are assigned to terms that are used frequently in relevant documents but rarely in the whole document collection (Salton & Buckley 1988). Thus a weight w(d, t) for a term t in document d is computed as the term frequency tf(d, t) times the inverse document frequency idf(t), which describes the term specificity within the document collection. Salton et al. (1994) proposed a weighting scheme that has since proven its usability in practice. Besides term frequency and inverse document frequency, defined as idf(t) := ...
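The Boolean encoding and the weighting w(d, t) = tf(d, t) · idf(t) described above can be illustrated with a minimal Python sketch. The excerpt breaks off before the definition of idf(t), so the form used here, log(N / n_t) with N the number of documents and n_t the number of documents containing t, is an assumption (a commonly used variant), not necessarily the authors' definition. The function names and the toy collection are likewise hypothetical.

```python
import math
from collections import Counter


def boolean_vector(doc_tokens, vocabulary):
    """Boolean encoding: 1 if the term occurs in the document, else 0."""
    present = set(doc_tokens)
    return [1 if term in present else 0 for term in vocabulary]


def tf_idf_weights(docs, vocabulary):
    """Compute w(d, t) = tf(d, t) * idf(t) for every document and term.

    ASSUMPTION: idf(t) = log(N / n_t); the source text is truncated
    before its own definition of idf(t).
    """
    n_docs = len(docs)
    doc_sets = [set(d) for d in docs]
    # n_t: number of documents in which term t occurs
    doc_freq = {t: sum(1 for s in doc_sets if t in s) for t in vocabulary}

    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        row = []
        for t in vocabulary:
            if doc_freq[t] == 0:
                row.append(0.0)  # term never occurs in the collection
            else:
                row.append(tf[t] * math.log(n_docs / doc_freq[t]))
        weights.append(row)
    return weights


# Hypothetical toy collection, already tokenized.
docs = [
    ["text", "mining", "text"],
    ["data", "mining"],
    ["mining", "data", "data"],
]
vocabulary = sorted({t for d in docs for t in d})  # ['data', 'mining', 'text']

bool_vec = boolean_vector(docs[0], vocabulary)
weights = tf_idf_weights(docs, vocabulary)
```

In this toy collection, "mining" occurs in every document, so under the assumed idf its weight is zero everywhere, while "text" occurs twice in the first document and nowhere else and therefore receives the largest weight there. The Boolean vector, by contrast, marks both terms with the same value of one.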