- df_t is an inverse measure of the informativeness of t.
- df_t ≤ N
- We define the idf (inverse document frequency) of t by
    idf_t = log10(N / df_t)
- We use log(N / df_t) instead of N / df_t to dampen the effect of idf.
- It will turn out that the base of the log is immaterial.

[idf example table, columns: term, df_t, idf_t = log10(N / df_t). Surviving terms: calpurnia, sunday, under, the. Surviving df_t values: 100; 10,000; 100,000; 1,000,000.]
There is one idf value for each term t in a collection.

- Does idf have an effect on ranking for one-term queries, like iPhone?
- idf has no effect on ranking one-term queries.
- idf affects the ranking of documents for queries with at least two terms.
- For the query "capricious person", idf weighting makes occurrences of "capricious" count for much more in the final document ranking than occurrences of "person".

Introduction to Information Retrieval, Sec. 6.2.1
Collection vs. document frequency

- The collection frequency of t is the number of occurrences of t in the collection, counting multiple occurrences.
- Example:

    Word        Collection frequency    Document frequency
    insurance   10440                   3997
    try         10422                   8760

- Which word is a better search term (and should get a higher weight)?

Sec. 6.2.2
tf-idf weighting

- The tf-idf weight of a term is the product of its tf weight and its idf weight:
    w_{t,d} = log(1 + tf_{t,d}) × log10(N / df_t)
- Best known weighting scheme in information retrieval
- Note: the "-" in tf-idf is a hyphen, not a minus sign!
- Alternative names: tf.idf, tf x idf
- Increases with the number of occurrences within a document
- Increases with the rarity of the term in the collection

Sec. 6.2.2
Score for a document given a query

    Score(q, d) = ...
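
The idf and tf-idf definitions from these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (idf, tf_idf, rank_one_term) are hypothetical, the collection size N = 1,000,000 is assumed for the example, and the tf logarithm is taken base 10 because the slides do not state its base.

```python
import math


def idf(n_docs, df):
    """Inverse document frequency: idf_t = log10(N / df_t)."""
    if not 0 < df <= n_docs:
        raise ValueError("df must satisfy 0 < df <= N")
    return math.log10(n_docs / df)


def tf_idf(tf, n_docs, df):
    """tf-idf weight: w_{t,d} = log(1 + tf_{t,d}) * log10(N / df_t).

    Assumption: base 10 for the tf logarithm (the slides leave it unstated).
    """
    return math.log10(1 + tf) * idf(n_docs, df)


def rank_one_term(term_freqs, n_docs, df):
    """Rank documents for a one-term query by tf-idf weight, highest first.

    term_freqs maps a (hypothetical) document id to the term's tf in it.
    """
    return sorted(
        term_freqs,
        key=lambda doc: tf_idf(term_freqs[doc], n_docs, df),
        reverse=True,
    )


N = 1_000_000  # assumed collection size, for illustration only

# Rarer terms (smaller df) get a larger idf.
for df in (100, 10_000, 100_000, 1_000_000):
    print(f"df = {df:>9,}  idf = {idf(N, df):.1f}")

# One-term query: idf scales every document's score by the same positive
# factor, so the ranking is the same whatever the term's df is.
tfs = {"d1": 3, "d2": 10, "d3": 1}  # made-up term frequencies
print(rank_one_term(tfs, N, 100))
print(rank_one_term(tfs, N, 100_000))
```

For a one-term query the idf factor is the same for every document, so as long as it is positive the order depends only on tf. With two or more query terms, each term contributes its own idf factor, which is how the rarer term comes to dominate the ranking.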
