Introduction to Information Retrieval, Sec. 6.2.2: tf-idf



idf weight

- df_t (the document frequency of t, i.e. the number of documents that contain t) is an inverse measure of the informativeness of t.
- df_t ≤ N, where N is the number of documents in the collection.
- We define the idf (inverse document frequency) of t by $\mathrm{idf}_t = \log_{10}(N / \mathrm{df}_t)$.
- We use $\log(N / \mathrm{df}_t)$ instead of $N / \mathrm{df}_t$ to dampen the effect of idf.
- It will turn out that the base of the log is immaterial.
- There is one idf value for each term t in a collection.

[Example table of df_t and idf_t values, with $\mathrm{idf}_t = \log_{10}(N / \mathrm{df}_t)$; surviving entries: terms calpurnia, sunday, under, the; values 100, 10,000, 100,000, 1,000,000.]

Effect of idf on ranking

- Does idf have an effect on ranking for one-term queries, like iPhone?
- idf has no effect on ranking one-term queries.
- idf affects the ranking of documents for queries with at least two terms.
- For the query "capricious person", idf weighting makes occurrences of "capricious" count for much more in the final document ranking than occurrences of "person".

Sec. 6.2.1 Collection vs. document frequency

- The collection frequency of t is the number of occurrences of t in the collection, counting multiple occurrences.
- Example:

  Word        Collection frequency   Document frequency
  insurance   10440                  3997
  try         10422                  8760

- Which word is a better search term (and should get a higher weight)?

Sec. 6.2.2 tf-idf weighting

- The tf-idf weight of a term is the product of its tf weight and its idf weight:

  $w_{t,d} = \log(1 + \mathrm{tf}_{t,d}) \times \log_{10}(N / \mathrm{df}_t)$

- Best-known weighting scheme in information retrieval.
- Note: the "-" in tf-idf is a hyphen, not a minus sign!
- Alternative names: tf.idf, tf x idf.
- Increases with the number of occurrences within a document.
- Increases with the rarity of the term in the collection.

Sec. 6.2.2 Score for a document given a query

Score(q, d) = ...
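The idf and tf-idf weight definitions can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not code from the lecture: the function names `idf` and `tf_idf_weight` are made up here, the collection size N = 1,000,000 is assumed for the example, and base 10 is assumed for the tf logarithm as well, since the formula gives a base only for the idf factor. The document frequencies for "insurance" and "try" come from the collection vs. document frequency example.

```python
import math


def idf(n_docs: int, df: int) -> float:
    """Inverse document frequency: idf_t = log10(N / df_t)."""
    if not 1 <= df <= n_docs:
        raise ValueError("df must satisfy 1 <= df <= n_docs")
    return math.log10(n_docs / df)


def tf_idf_weight(tf: int, n_docs: int, df: int) -> float:
    """tf-idf weight: w_{t,d} = log10(1 + tf_{t,d}) * log10(N / df_t).

    Base 10 for the tf logarithm is an assumption of this sketch.
    """
    if tf < 0:
        raise ValueError("tf must be non-negative")
    return math.log10(1 + tf) * idf(n_docs, df)


if __name__ == "__main__":
    N = 1_000_000  # assumed collection size, for illustration only

    # Document frequencies taken from the insurance/try example.
    for word, df in [("insurance", 3997), ("try", 8760)]:
        print(f"{word}: idf = {idf(N, df):.3f}, "
              f"weight at tf=3: {tf_idf_weight(3, N, df):.3f}")
```

With these inputs "insurance" receives the higher idf because it occurs in fewer documents, even though the two words have almost the same collection frequency. A term that appears in every document gets idf 0, and a term absent from a document (tf = 0) gets weight 0.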

This document was uploaded on 02/26/2014.
