lecture6-tfidf-handout-6-per



[...] relevant than a document with 1 occurrence of the term.
  - But not 10 times more relevant.
  - Relevance does not increase proportionally with term frequency.
NB: frequency = count in IR

Introduction to Information Retrieval    Sec. 6.2.1

Document frequency
  - Rare terms are more informative than frequent terms
      - Recall stop words
  - Consider a term in the query that is rare in the collection (e.g., arachnocentric)
  - A document containing this term is very likely to be relevant to the query arachnocentric
  - → We want a high weight for rare terms like arachnocentric.

The score is 0 if none of the query terms is present in the document.

Document frequency, continued
  - Frequent terms are less informative than rare terms
  - Consider a query term that is frequent in the collection (e.g., high, increase, line)
  - A document containing such a term is more likely to be relevant than a document that doesn't
  - But it's not a sure indicator of relevance.
  - → For frequent terms, we want high positive weights for words like high, increase, and line
  - But lower weights than for rare terms.
  - We will use document frequency (df) to capture this.

idf weight
  - df_t is the document frequency of t: the number of documents that contain t

idf example, suppose N = 1 million
  Table columns: term, df_t
  Surviving entries, in extraction order: 1, animal, 1,000, fly [...]

Effect of idf on ranking...
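To make the document-frequency idea concrete, here is a minimal sketch in Python. It counts df_t over a tiny made-up collection and turns it into a weight. The preview cuts off before the idf formula, so the common form idf_t = log10(N / df_t) is assumed here; the four sample documents and the helper names `document_frequencies` and `idf` are hypothetical and chosen only for illustration.

```python
import math


def document_frequencies(docs):
    """Return {term: number of documents that contain the term}."""
    df = {}
    for doc in docs:
        # set() so that a term is counted once per document, not once per occurrence
        for term in set(doc.lower().split()):
            df[term] = df.get(term, 0) + 1
    return df


def idf(df_t, n_docs):
    """Inverse document frequency, assuming the log10(N / df_t) form."""
    return math.log10(n_docs / df_t)


# Hypothetical toy collection (not from the lecture).
docs = [
    "high line increase",
    "the line is high",
    "arachnocentric view of the web",
    "increase the line",
]

df = document_frequencies(docs)
weights = {term: idf(count, len(docs)) for term, count in df.items()}

# The rare term gets a larger weight than the frequent one,
# but the frequent term still gets a positive weight.
print("arachnocentric:", df["arachnocentric"], round(weights["arachnocentric"], 3))
print("line:", df["line"], round(weights["line"], 3))
```

Under the same assumed formula and N = 1 million, a term with df_t = 1,000 would get a weight of about 3, and a term with df_t = 1 a weight of about 6, which matches the intent that rare terms score higher than frequent ones.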

This document was uploaded on 02/26/2014.
