• $\mathrm{df}_t$ is an inverse measure of the informativeness of t
• $\mathrm{df}_t \le N$
• We define the idf (inverse document frequency) of t by $\mathrm{idf}_t = \log_{10}(N / \mathrm{df}_t)$
• We use $\log(N / \mathrm{df}_t)$ instead of $N / \mathrm{df}_t$ to dampen the effect of idf.
• It will turn out that the base of the log is immaterial.

[Example table of $\mathrm{df}_t$ and $\mathrm{idf}_t$ for the terms calpurnia, sunday, under, and the. The values 100, 10,000, 100,000, and 1,000,000 survive, but their pairing with the terms does not.]

There is one idf value for each term t in a collection.

• Does idf have an effect on ranking for one-term queries, like iPhone?
• idf has no effect on the ranking of one-term queries.
• idf affects the ranking of documents for queries with at least two terms.
• For the query "capricious person", idf weighting makes occurrences of capricious count for much more in the final document ranking than occurrences of person.

Introduction to Information Retrieval

Sec. 6.2.1
Collection vs. Document frequency
• The collection frequency of t is the number of occurrences of t in the collection, counting multiple occurrences.
• Example:

  Word        Collection frequency    Document frequency
  insurance   10440                   3997
  try         10422                   8760

• Which word is a better search term (and should get a higher weight)?

Sec. 6.2.2
tf-idf weighting
• The tf-idf weight of a term is the product of its tf weight and its idf weight:
  $w_{t,d} = \log(1 + \mathrm{tf}_{t,d}) \times \log_{10}(N / \mathrm{df}_t)$
• Best-known weighting scheme in information retrieval
• Note: the "-" in tf-idf is a hyphen, not a minus sign!
• Alternative names: tf.idf, tf x idf
• Increases with the number of occurrences within a document
• Increases with the rarity of the term in the collection

Sec. 6.2.2
Score for a document given a query
Score(q, d) = ...
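
The idf and tf-idf definitions above can be tried out with a short Python sketch. Several things in it are assumptions rather than part of the slides: the function names `idf`, `tf_idf`, and `rank_one_term`, the collection size `N = 1_000_000`, the per-document term counts, and the use of base 10 for the tf logarithm (the slides give a base only for the idf logarithm). For a one-term query the sketch also assumes that a document's score is simply that term's tf-idf weight.

```python
import math

N = 1_000_000  # assumed collection size, for illustration only


def idf(n_docs: int, df: int) -> float:
    """Inverse document frequency: log10(N / df_t), with 1 <= df_t <= N."""
    if not 1 <= df <= n_docs:
        raise ValueError("df must satisfy 1 <= df <= N")
    return math.log10(n_docs / df)


def tf_idf(tf: int, df: int, n_docs: int = N) -> float:
    """tf-idf weight: log(1 + tf_{t,d}) * log10(N / df_t).

    Base 10 for the tf logarithm is an assumption of this sketch.
    """
    return math.log10(1 + tf) * idf(n_docs, df)


def rank_one_term(tf_by_doc: dict, df: int, n_docs: int = N) -> list:
    """Rank documents for a one-term query by that term's tf-idf weight."""
    return sorted(
        tf_by_doc,
        key=lambda doc: tf_idf(tf_by_doc[doc], df, n_docs),
        reverse=True,
    )


# A term that occurs in every document carries no weight.
print(idf(N, N))

# Document frequencies from the insurance/try example: the rarer term
# (insurance) receives the higher idf.
print(idf(N, 3997) > idf(N, 8760))

# Hypothetical term counts for a single query term in three documents.
tfs = {"d1": 3, "d2": 1, "d3": 7}

# Changing df only rescales every weight by the same positive factor,
# so the one-term ranking is the same for both values.
print(rank_one_term(tfs, df=100))
print(rank_one_term(tfs, df=10_000))
```

Because idf multiplies every document's weight by the same factor, it can only change a ranking once a query contains at least two terms with different document frequencies. The one exception in this sketch is a term with df equal to N, where every weight becomes zero and all documents tie.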
