# Lecture6-tfidf-handout-6-per


... measures how well document and query match.

Introduction to Information Retrieval, Ch. 6

## Query-document matching scores

- We need a way of assigning a score to a query/document pair.
- Let's start with a one-term query.
- If the query term does not occur in the document, the score should be 0.
- The more frequent the query term in the document, the higher the score (should be).
- We will look at a number of alternatives for this.

## Take 1: Jaccard coefficient

- Recall from Lecture 3: a commonly used measure of overlap of two sets A and B.
- jaccard(A,B) = |A ∩ B| / |A ∪ B|
- jaccard(A,A) = 1
- jaccard(A,B) = 0 if A ∩ B = ∅
- A and B don't have to be the same size.
- Always assigns a number between 0 and 1.

## Jaccard coefficient: Scoring example

- What is the query-document match score that the Jaccard coefficient computes for each of the two documents below?
- Query: ides of march
- Document 1: caesar died in march
- Document 2: the long march

## Issues with Jaccard for scoring

- It doesn't consider term frequency (how many times a term occurs in a document).
- Rare terms in a collection are more informative than frequent terms. Jaccard doesn't consider this information.
- We need a more sophisticated way of normalizing for length.
- Later in this lecture, we'll use |A ∩ B| / √|A ∪ B| instead of |A ∩ B| / |A ∪ B| (Jaccard) for length normalization.

## Recall (Lecture 1): Binary term-document incidence matrix (Sec. 6.2)

| Term | Antony and Cleopatra | Julius Caesar | The Tempest | Hamlet | Othello | Macbeth |
|---|---|---|---|---|---|---|
| Antony | 1 | 1 | 0 | 0 | 0 | 1 |
| Brutus | 1 | 1 | 0 | 1 | 0 | 0 |
| Caesar | 1 | 1 | 0 | 1 | 1 | 1 |
| Calpurnia | 0 | 1 | 0 | 0 | 0 | 0 |
| Cleopatra | 1 | 0 | 0 | 0 | 0 | 0 |
| mercy | 1 | 0 | 1 | 1 | 1 | 1 |
| worser | 1 | 0 | 1 | 1 | 1 | 0 |

## Term-document count matrices (Sec. 6.2)

- Consider the number...
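
As a quick check of the Jaccard scoring example earlier in this handout, here is a minimal Python sketch. It assumes whitespace tokenization and treats each text as a set of terms; the function name `jaccard` and the variable names are illustrative and not taken from the slides. Returning 0.0 when both sets are empty is also an assumption, since the ratio is undefined in that case.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two term collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Both sets are empty, so the ratio is undefined; 0.0 is an assumption.
        return 0.0
    return len(a & b) / len(union)


# Whitespace tokenization is an assumption made for this sketch.
query = "ides of march".split()
doc1 = "caesar died in march".split()
doc2 = "the long march".split()

score1 = jaccard(query, doc1)  # 1 shared term, 6 distinct terms in the union
score2 = jaccard(query, doc2)  # 1 shared term, 5 distinct terms in the union
print(score1, score2)

# Repeating a term does not change the score: term frequency is ignored.
repeated = jaccard(query, "march march march".split())
single = jaccard(query, ["march"])
print(repeated == single)
```

Both documents share exactly one term with the query ("march"), so Document 2 scores higher (1/5 versus 1/6) only because it is shorter. The last two calls give the same score however often "march" is repeated, which is the term-frequency issue listed above.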

## This document was uploaded on 02/26/2014.
