IR-part2

# 6.2.2 Final ranking of documents for a query: score(q, d)


… But it's not a sure indicator of relevance.

- → For frequent terms, we want positive weights for words like *high*, *increase*, and *line*.
- But lower weights than for rare terms.
- We will use document frequency (df) to capture this.

Introduction to Information Retrieval

## idf weight (Sec. 6.2.1)

- $\mathrm{df}_t$ is the document frequency of $t$: the number of documents that contain $t$.
- $\mathrm{df}_t$ is an inverse measure of the informativeness of $t$.
- $\mathrm{df}_t \le N$.
- We define the idf (inverse document frequency) of $t$ by
  $$\mathrm{idf}_t = \log_{10}(N/\mathrm{df}_t)$$
- We use $\log(N/\mathrm{df}_t)$ instead of $N/\mathrm{df}_t$ to "dampen" the effect of idf. It will turn out that the base of the log is immaterial.

## idf example, suppose N = 1 million (Sec. 6.2.1)

| term | $\mathrm{df}_t$ | $\mathrm{idf}_t$ |
|---|---:|---:|
| calpurnia | 1 | |
| animal | 100 | |
| sunday | 1,000 | |
| fly | 10,000 | |
| under | 100,000 | |
| the | 1,000,000 | |

$$\mathrm{idf}_t = \log_{10}(N/\mathrm{df}_t)$$

There is one idf value for each term $t$ in a collection.
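A minimal Python sketch of the idf formula applied to the example collection ($N$ = 1,000,000, with the $\mathrm{df}_t$ values from the table). The function name `idf`, the dictionary `doc_freq`, and the `base` parameter are illustrative assumptions, not part of the original slides. The sketch also checks the claim that the base of the log is immaterial. Switching from base 10 to base 2 multiplies every idf by the same constant, $\log_2 10 \approx 3.32$, so ordering the terms by idf gives the same result in either base.

```python
import math

# Example collection from the slide: N = 1 million documents.
N = 1_000_000

# Document frequencies df_t taken from the example table (dict name is hypothetical).
doc_freq = {
    "calpurnia": 1,
    "animal": 100,
    "sunday": 1_000,
    "fly": 10_000,
    "under": 100_000,
    "the": 1_000_000,
}


def idf(df_t: int, n_docs: int, base: float = 10.0) -> float:
    """Return log_base(n_docs / df_t); requires 1 <= df_t <= n_docs."""
    if not 1 <= df_t <= n_docs:
        raise ValueError("df_t must satisfy 1 <= df_t <= N")
    return math.log(n_docs / df_t) / math.log(base)


# idf with the slide's base-10 logarithm.
idf10 = {term: idf(df, N) for term, df in doc_freq.items()}

# The same computation with base 2, for comparison.
idf2 = {term: idf(df, N, base=2) for term, df in doc_freq.items()}

# Every nonzero idf is rescaled by the same constant (log2(10)),
# so sorting terms by idf gives the same order in either base.
ratios = {term: idf2[term] / idf10[term] for term in doc_freq if idf10[term] > 0}

for term in doc_freq:
    print(f"{term:10s} df={doc_freq[term]:>9,} "
          f"idf10={idf10[term]:.2f} idf2={idf2[term]:.2f}")
```

With base 10 the idf values for calpurnia, animal, sunday, fly, under, and the come out as 6, 4, 3, 2, 1, and 0. A term that occurs in every document (here *the*) gets idf 0, so it contributes nothing to a document's weight.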