IR-part2

# 622 final ranking of documents for a query scoreq

…’t

- But it’s not a sure indicator of relevance.
- → For frequent terms, we want positive weights for words like high, increase, and line
- But lower weights than for rare terms.
- We will use document frequency (df) to capture this.

Introduction to Information Retrieval, Sec. 6.2.1

## idf weight

- $\mathrm{df}_t$ is the document frequency of $t$: the number of documents that contain $t$
  - $\mathrm{df}_t$ is an inverse measure of the informativeness of $t$
  - $\mathrm{df}_t \le N$
- We define the idf (inverse document frequency) of $t$ by

  $$\mathrm{idf}_t = \log_{10}\left(N / \mathrm{df}_t\right)$$

  - We use $\log(N / \mathrm{df}_t)$ instead of $N / \mathrm{df}_t$ to “dampen” the effect of idf.

Will turn out the base of the log is immaterial.

## idf example, suppose N = 1 million

| term      | $\mathrm{df}_t$ | $\mathrm{idf}_t$ |
|-----------|----------------:|-----------------:|
| calpurnia |               1 |                  |
| animal    |             100 |                  |
| sunday    |           1,000 |                  |
| fly       |          10,000 |                  |
| under     |         100,000 |                  |
| the       |       1,000,000 |                  |

$$\mathrm{idf}_t = \log_{10}\left(N / \mathrm{df}_t\right)$$

There is one idf value for each term t in a collection.
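The idf values for the example terms follow directly from the definition. The snippet below is a minimal sketch, not part of the original slides: the function name `idf` and the dictionaries `doc_freq` and `idf_values` are illustrative, and the document frequencies are the ones listed in the example with N = 1 million.

```python
import math


def idf(df_t: int, n_docs: int) -> float:
    """Inverse document frequency: log10(N / df_t)."""
    # df_t counts documents containing the term, so it must lie in (0, N].
    if not 0 < df_t <= n_docs:
        raise ValueError("df_t must satisfy 0 < df_t <= N")
    return math.log10(n_docs / df_t)


N = 1_000_000  # collection size from the example

# Document frequencies from the example (term -> df_t).
doc_freq = {
    "calpurnia": 1,
    "animal": 100,
    "sunday": 1_000,
    "fly": 10_000,
    "under": 100_000,
    "the": 1_000_000,
}

# One idf value per term in the collection.
idf_values = {term: idf(df, N) for term, df in doc_freq.items()}

for term, value in idf_values.items():
    print(f"{term:<10} df={doc_freq[term]:>9,}  idf={value:.1f}")
```

Under these assumptions the rarest term (calpurnia) gets idf 6, and a term that occurs in every document (the) gets idf 0, so it contributes nothing to a score. Switching to a different log base would multiply every idf value by the same constant, which is why the base does not affect the relative ordering of terms.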
