IR-part2

# 6.2 Term-document count matrices: consider the number...

- ... terms in a collection are more informative than frequent terms
- Jaccard doesn't consider this information
- We need a more sophisticated way of normalizing for length
- Later in this lecture, we'll use |A ∩ B| / √|A ∪ B| instead of |A ∩ B| / |A ∪ B| (Jaccard) for length normalization.

Introduction to Information Retrieval

Scoring with the Jaccard coefficient

Term frequency weighting

Recall: Binary term-document incidence matrix (Sec. 6.2)

| | Antony and Cleopatra | Julius Caesar | The Tempest | Hamlet | Othello | Macbeth |
|---|---|---|---|---|---|---|
| Antony | 1 | 1 | 0 | 0 | 0 | 1 |
| Brutus | 1 | 1 | 0 | 1 | 0 | 0 |
| Caesar | 1 | 1 | 0 | 1 | 1 | 1 |
| Calpurnia | 0 | 1 | 0 | 0 | 0 | 0 |
| Cleopatra | 1 | 0 | 0 | 0 | 0 | 0 |
| mercy | 1 | 0 | 1 | 1 | 1 | 1 |
| worser | 1 | 0 | 1 | 1 | 1 | 0 |

Each document is represented by a binary vector ∈ {0,1...
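To make the Jaccard scoring concrete, here is a minimal Python sketch that treats each play in the incidence matrix above as a set of terms and scores it against a query with |A ∩ B| / |A ∪ B|. The query `{"Brutus", "Caesar"}` and the names `term_set`, `jaccard`, and `sqrt_normalized` are hypothetical and chosen only for illustration. The `sqrt_normalized` helper assumes the length-normalized variant divides by the square root of the union size; treat that as an assumption rather than the lecture's definitive formula.

```python
from math import sqrt

# Binary term-document incidence matrix, copied from the slide.
PLAYS = ["Antony and Cleopatra", "Julius Caesar", "The Tempest",
         "Hamlet", "Othello", "Macbeth"]
INCIDENCE = {
    "Antony":    [1, 1, 0, 0, 0, 1],
    "Brutus":    [1, 1, 0, 1, 0, 0],
    "Caesar":    [1, 1, 0, 1, 1, 1],
    "Calpurnia": [0, 1, 0, 0, 0, 0],
    "Cleopatra": [1, 0, 0, 0, 0, 0],
    "mercy":     [1, 0, 1, 1, 1, 1],
    "worser":    [1, 0, 1, 1, 1, 0],
}


def term_set(play):
    """Return the set of terms whose entry for this play is 1."""
    col = PLAYS.index(play)
    return {term for term, row in INCIDENCE.items() if row[col] == 1}


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sqrt_normalized(a, b):
    """Assumed variant: |A ∩ B| / sqrt(|A ∪ B|); 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / sqrt(len(union)) if union else 0.0


# Hypothetical query, used only to illustrate the scoring.
query = {"Brutus", "Caesar"}

scores = {play: jaccard(query, term_set(play)) for play in PLAYS}
for play, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{play:22s} {score:.3f}")
```

With this query, Julius Caesar and Hamlet both score 0.5 and The Tempest scores 0. Because the sets only record presence or absence, a play that mentions Brutus once scores the same as one that mentions him a hundred times, which is the limitation the bullets above point out.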